llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	220fd33522	[X86] Add AES to KNL CPUs to match clang. I believe this was lost from KNL when AES was pushed from Westmere to Skylake recently. KNL used to inherit from IVB. llvm-svn: 345519	2018-10-29 18:17:01 +00:00
Stanislav Mekhanoshin	6b1c6548bd	[AMDGPU] Fixed return value causing warning and regression llvm-svn: 345518	2018-10-29 17:53:23 +00:00
Bryan Chan	bfd32d4377	[AArch64] Rename FP16FML instruction format (NFC) Rename SIMDThreeSameMult (etc.) to SIMDThreeSameVectorFML (etc.) to follow usual naming convention, and add some comments in the .td files. llvm-svn: 345515	2018-10-29 17:27:34 +00:00
Stanislav Mekhanoshin	79080ecd82	[AMDGPU] Match v_swap_b32 Differential Revision: https://reviews.llvm.org/D52677 llvm-svn: 345514	2018-10-29 17:26:01 +00:00
Francis Visoiu Mistrih	61c9de7565	[X86] Enable the MachineVerifier by default The machine verifier was disabled for x86 by default. There are now only 9 tests failing, compared to what previously was between 20 and 30. This is a good opportunity to file bugs for all the remaining issues, then explicitly disable the failing tests and enabling the machine verifier by default. This allows us to avoid adding new tests that break the verifier. PR27481 llvm-svn: 345513	2018-10-29 16:57:43 +00:00
Luke Cheeseman	71c989ae1f	[AArch64] Return address signing B key support - Add support to generate AUTIBSP, PACIBSP, RETAB instructions for return address signing - The key used to sign the function is controlled by the function attribute "sign-return-address-key" Differential Revision: https://reviews.llvm.org/D51427 llvm-svn: 345511	2018-10-29 16:26:58 +00:00
Craig Topper	aa5eb2fbaa	[X86] Force floating point values in constant pool decoding to print in scientific notation so they can't be confused with integers. When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction. Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that. llvm-svn: 345488	2018-10-29 04:52:04 +00:00
Craig Topper	42aa87143d	[X86] Recognize constant splats in LowerFCOPYSIGN. llvm-svn: 345484	2018-10-28 23:51:35 +00:00
Simon Pilgrim	9b77f0c291	[VectorLegalizer] Enable TargetLowering::expandFP_TO_UINT support. Add vector support to TargetLowering::expandFP_TO_UINT. This exposes an issue in X86TargetLowering::LowerVSELECT which was assuming that the select mask was the same width as the LHS/RHS ops - as long as the result is a sign splat we can easily sext/trunk this. llvm-svn: 345473	2018-10-28 13:07:25 +00:00
Roman Lebedev	a5baf86744	AMD BdVer2 (Piledriver) Initial Scheduler model Summary: # Overview This is somewhat partial. * Latencies are good {F7371125} * All of these remaining inconsistencies //appear// to be noise/noisy/flaky. * NumMicroOps are somewhat good {F7371158} * Most of the remaining inconsistencies are from `Ld` / `Ld_ReadAfterLd` classes * Actual unit occupation (pipes, `ResourceCycles`) are undiscovered lands, i did not really look there. They are basically verbatum copy from `btver2` * Many `InstRW`. And there are still inconsistencies left... To be noted: I think this is the first new schedule profile produced with the new next-gen tools like llvm-exegesis! # Benchmark I realize that isn't what was suggested, but i'll start with some "internal" public real-world benchmark i understand - [[ https://github.com/darktable-org/rawspeed \| RawSpeed raw image decoding library ]]. Diff (the exact clang from trunk without/with this patch): ``` Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Benchmark Time CPU Time Old Time New CPU Old CPU New ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_mean -0.0607 -0.0604 234 219 233 219 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_median -0.0630 -0.0626 233 219 233 219 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_stddev +0.2581 +0.2587 1 2 1 2 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_mean -0.0770 -0.0767 144 133 144 133 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_median -0.0767 -0.0763 144 133 144 133 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_stddev -0.4170 -0.4156 1 0 1 0 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_mean -0.0271 -0.0270 463 450 463 450 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_median -0.0093 -0.0093 453 449 453 449 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_stddev -0.7280 -0.7280 13 4 13 4 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_pvalue 0.0004 0.0004 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_mean -0.0065 -0.0065 569 565 569 565 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_median -0.0077 -0.0077 569 564 569 564 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_stddev +1.0077 +1.0068 2 5 2 5 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_pvalue 0.0220 0.0199 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_mean +0.0006 +0.0007 312 312 312 312 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_median +0.0031 +0.0032 311 312 311 312 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_stddev -0.7069 -0.7072 4 1 4 1 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_pvalue 0.0004 0.0004 U Test, Repetitions: 25 vs 25 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_mean -0.0015 -0.0015 141 141 141 141 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_median -0.0010 -0.0011 141 141 141 141 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_stddev -0.1486 -0.1456 0 0 0 0 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_pvalue 0.6139 0.8766 U Test, Repetitions: 25 vs 25 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_mean -0.0008 -0.0005 60 60 60 60 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_median -0.0006 -0.0002 60 60 60 60 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_stddev -0.1467 -0.1390 0 0 0 0 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_pvalue 0.0137 0.0137 U Test, Repetitions: 25 vs 25 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_mean +0.0002 +0.0002 275 275 275 275 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_median -0.0015 -0.0014 275 275 275 275 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_stddev +3.3687 +3.3587 0 2 0 2 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_pvalue 0.4041 0.3933 U Test, Repetitions: 25 vs 25 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_mean +0.0004 +0.0004 67 67 67 67 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_median -0.0000 -0.0000 67 67 67 67 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_stddev +0.1947 +0.1995 0 0 0 0 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_pvalue 0.0074 0.0001 U Test, Repetitions: 25 vs 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_mean -0.0092 +0.0074 547 542 25 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_median -0.0054 +0.0115 544 541 25 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_stddev -0.4086 -0.3486 8 5 0 0 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_pvalue 0.3320 0.0000 U Test, Repetitions: 25 vs 25 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_mean +0.0015 +0.0204 218 218 12 12 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_median +0.0001 +0.0203 218 218 12 12 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_stddev +0.2259 +0.2023 1 1 0 0 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_pvalue 0.0000 0.0001 U Test, Repetitions: 25 vs 25 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_mean -0.0209 -0.0179 96 94 90 88 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_median -0.0182 -0.0155 95 93 90 88 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_stddev -0.6164 -0.2703 2 1 2 1 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_mean -0.0098 -0.0098 176 175 176 175 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_median -0.0126 -0.0126 176 174 176 174 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_stddev +6.9789 +6.9157 0 2 0 2 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_mean -0.0237 -0.0238 474 463 474 463 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_median -0.0267 -0.0267 473 461 473 461 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_stddev +0.7179 +0.7178 3 5 3 5 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_pvalue 0.6837 0.6554 U Test, Repetitions: 25 vs 25 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_mean -0.0014 -0.0013 1375 1373 1375 1373 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_median +0.0018 +0.0019 1371 1374 1371 1374 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_stddev -0.7457 -0.7382 11 3 10 3 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_mean -0.0080 -0.0289 22 22 10 10 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_median -0.0070 -0.0287 22 22 10 10 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_stddev +1.0977 +0.6614 0 0 0 0 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_mean +0.0132 +0.0967 35 36 10 11 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_median +0.0132 +0.0956 35 36 10 11 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_stddev -0.0407 -0.1695 0 0 0 0 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_mean +0.0331 +0.1307 13 13 6 6 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_median +0.0430 +0.1373 12 13 6 6 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_stddev -0.9006 -0.8847 1 0 0 0 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_pvalue 0.0016 0.0010 U Test, Repetitions: 25 vs 25 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_mean -0.0023 -0.0024 395 394 395 394 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_median -0.0029 -0.0030 395 394 395 393 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_stddev -0.0275 -0.0375 1 1 1 1 Phase One/P65/CF027310.IIQ/threads:8/real_time_pvalue 0.0232 0.0000 U Test, Repetitions: 25 vs 25 Phase One/P65/CF027310.IIQ/threads:8/real_time_mean -0.0047 +0.0039 114 113 28 28 Phase One/P65/CF027310.IIQ/threads:8/real_time_median -0.0050 +0.0037 114 113 28 28 Phase One/P65/CF027310.IIQ/threads:8/real_time_stddev -0.0599 -0.2683 1 1 0 0 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_mean +0.0206 +0.0207 405 414 405 414 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_median +0.0204 +0.0205 405 414 405 414 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_stddev +0.2155 +0.2212 1 1 1 1 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_mean -0.0109 -0.0108 147 145 147 145 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_median -0.0104 -0.0103 147 145 147 145 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_stddev -0.4919 -0.4800 0 0 0 0 Samsung/NX3000/_3184416.SRW/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX3000/_3184416.SRW/threads:8/real_time_mean -0.0149 -0.0147 220 217 220 217 Samsung/NX3000/_3184416.SRW/threads:8/real_time_median -0.0173 -0.0169 221 217 220 217 Samsung/NX3000/_3184416.SRW/threads:8/real_time_stddev +1.0337 +1.0341 1 3 1 3 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_pvalue 0.0001 0.0001 U Test, Repetitions: 25 vs 25 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_mean -0.0019 -0.0019 194 193 194 193 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_median -0.0021 -0.0021 194 193 194 193 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_stddev -0.4441 -0.4282 0 0 0 0 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_pvalue 0.0000 0.4263 U Test, Repetitions: 25 vs 25 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_mean +0.0258 -0.0006 81 83 19 19 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_median +0.0235 -0.0011 81 82 19 19 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_stddev +0.1634 +0.1070 1 1 0 0 ``` {F7443905} If we look at the `_mean`s, the time column, the biggest win is `-7.7%` (`Canon/EOS 5D Mark II/10.canon.sraw2.cr2`), and the biggest loose is `+3.3%` (`Panasonic/DC-GH5S/P1022085.RW2`); Overall: mean `-0.7436%`, median `-0.23%`, `cbrt(sum(time^3))` = `-8.73%` Looks good so far i'd say. llvm-exegesis details: {F7371117} {F7371125} {F7371128} {F7371144} {F7371158} Reviewers: craig.topper, RKSimon, andreadb, courbet, avt77, spatel, GGanesh Reviewed By: andreadb Subscribers: javed.absar, gbedwell, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52779 llvm-svn: 345463	2018-10-27 20:46:30 +00:00
Simon Pilgrim	a365719a24	[X86][SSE] LowerVSELECT - pull out repeated getOperand(). NFCI. llvm-svn: 345458	2018-10-27 18:37:59 +00:00
Simon Pilgrim	88116e905e	Revert rL345395: [X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs Makes no difference to actual shuffle decoding yet, but merges all the existing limits in one place for when proper support is fixed. ........ Its been reported that this is causing out of trunk failures. llvm-svn: 345451	2018-10-27 07:10:48 +00:00
Sanjin Sijaric	96f2ea3dd4	[ARM64][Windows] MCLayer support for exception handling Add ARM64 unwind codes to MCLayer, as well SEH directives that will be emitted by the frame lowering patch to follow. We only emit unwind codes into object object files for now. Differential Revision: https://reviews.llvm.org/D50166 llvm-svn: 345450	2018-10-27 06:13:06 +00:00
Craig Topper	4b89647b79	[X86] Add some isel patterns for scalar_to_vector/extract_vector_element that use the avx512 extended register classes when they are available. llvm-svn: 345448	2018-10-27 05:35:20 +00:00
Alina Sbirlea	bdb16f0519	Revert r345169 [along with its llvm counterpart r345170] as it makes Halide builds timeout. llvm-svn: 345447	2018-10-27 04:51:12 +00:00
Brendon Cahoon	aa783dfd6e	[Hexagon] Add missing assignment to Itinerary in Call_nr The class definition for Call_nr has the itinerary as a parameter, but the value is never assigned to the Itinerary field for the instruction. This means the compiler is unable to schedule and packetize the instruction correctly because these instrution will not have any resource descritions. I don't have a specific test case, but the ps_call_nr.ll test failed with a proposed patch. llvm-svn: 345442	2018-10-27 00:50:29 +00:00
Reid Kleckner	98d880fbd7	[Spectre] Fix MIR verifier errors in retpoline thunks Summary: The main challenge here is that X86InstrInfo::AnalyzeBranch doesn't understand the way we're using a CALL instruction as a branch, so we can't list the CallTarget MBB as a successor of the entry block. If we don't list it as a successor, then the AsmPrinter doesn't print a label for the MBB. Fix the issue by inserting our own label at the beginning of the call target block. We can rely on the AsmPrinter to always emit it, even though the block appears to be unreachable, but address-taken. Fixes PR38391. Reviewers: thegameg, chandlerc, echristo Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D53653 llvm-svn: 345426	2018-10-26 20:26:36 +00:00
Eli Friedman	2ac1162917	[ARM] Make InstrEmitter mark CPSR defs dead for Thumb1. The "dead" markings allow existing target-independent optimizations, like MachineSink, to trigger more frequently. The CPSR defs would have eventually been marked dead by LiveVariables, so this only affects optimizations before regalloc. The ARMBaseInstrInfo.cpp change is fixing a bug which is only visible with this change: the transform adds a use to an otherwise dead def of CPSR. This is covered by existing regression tests. thumb2-tbh.ll breaks for Thumb1 due to MachineLICM changing the generated code; I'll fix it in D53452. Differential Revision: https://reviews.llvm.org/D53453 llvm-svn: 345420	2018-10-26 19:32:24 +00:00
Lei Huang	de20843f6f	[PowerPC] Improve BUILD_VECTOR of 4 i32s Currently, for this node: vector int test(int a, int b, int c, int d) { return (vector int) { a, b, c, d }; } we get this on Power9: mtvsrdd 34, 5, 3 mtvsrdd 35, 6, 4 vmrgow 2, 3, 2 and this on Power8: mtvsrwz 0, 3 mtvsrwz 1, 5 mtvsrwz 2, 4 mtvsrwz 3, 6 xxmrghd 34, 1, 0 xxmrghd 35, 3, 2 vmrgow 2, 3, 2 This can be improved to this on LE Power9: rldimi 3, 4, 32, 0 rldimi 5, 6, 32, 0 mtvsrdd 34, 5, 3 and this on LE Power8 rldimi 3, 4, 32, 0 rldimi 5, 6, 32, 0 mtvsrd 34, 3 mtvsrd 35, 5 xxpermdi 34, 35, 34, 0 This patch updates the TD pattern to generate the optimized sequence for both Power8 and Power9 on LE and BE. Differential Revision: https://reviews.llvm.org/D53494 llvm-svn: 345414	2018-10-26 18:09:36 +00:00
Craig Topper	8315d9990c	[X86] Stop promoting vector and/or/xor/andn to vXi64. These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation. This patch removes the promotion and adds isel patterns instead. The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward. Differential Revision: https://reviews.llvm.org/D53268 llvm-svn: 345408	2018-10-26 17:21:26 +00:00
Simon Pilgrim	5d1be4f8d4	[X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs Makes no difference to actual shuffle decoding yet, but merges all the existing limits in one place for when proper support is fixed. llvm-svn: 345395	2018-10-26 15:19:02 +00:00
Sanjay Patel	6b40768f5a	[x86] commute blendvb with constant condition op to allow load folding This is a narrow fix for 1 of the problems mentioned in PR27780: https://bugs.llvm.org/show_bug.cgi?id=27780 I looked at more general solutions, but it's a mess. We canonicalize shuffle masks based on the number of elements accessed from each operand, and that's not optional. If you remove that, we'll crash because we fail to match isel patterns. So I'm waiting until we're sure that we have blendvb with constant condition and then commuting based on the load potential. Other cases like blend-with-immediate are already handled elsewhere, so this is probably not a common problem anyway. I didn't use "MayFoldLoad" because that checks for one-use and in these cases, we've screwed that up by creating a temporary PSHUFB using these operands that we're counting on to be killed later. Undoing that didn't look like a simple task because it's intertwined with determining if we actually use both operands of the shuffle or not.a Differential Revision: https://reviews.llvm.org/D53737 llvm-svn: 345390	2018-10-26 14:58:13 +00:00
Simon Pilgrim	7575c6d01b	[X86] Use existing pulled out VT variables. NFCI. llvm-svn: 345388	2018-10-26 14:39:28 +00:00
Scott Linder	11ef7984b0	[AMDGPU] Add a pass to promote bitcast calls AMDGPU currently only supports direct calls, but at lower optimisation levels it fails to lower statically direct calls which appear indirect due to a bitcast. Add a pass to visit all CallSites and use CallPromotionUtils to "devirtualize" calls. Differential Revision: https://reviews.llvm.org/D52741 llvm-svn: 345382	2018-10-26 13:18:36 +00:00
Fangrui Song	065c3610ad	[SystemZ] Fix -Wcovered-switch-default as coding standard regulates llvm-svn: 345369	2018-10-26 06:59:08 +00:00
Li Jia He	f6fb752fe8	[PowerPC] Fix some missed optimization opportunities in combineSetCC For both operands are bool, short, int, long, long long, add the following optimization. 1. 0-x == y --> x+y ==0 2. 0-x != y --> x+y != 0 Review: nemanjai Differential Revision: https://reviews.llvm.org/D53360 llvm-svn: 345366	2018-10-26 06:48:53 +00:00
Nemanja Ivanovic	6a74bfba20	[PowerPC] Keep vector int to fp conversions in vector domain At present a v2i16 -> v2f64 convert is implemented by extracts to scalar, scalar converts, and merge back into a vector. Use vector converts instead, with the int data permuted into the proper position and extended if necessary. Patch by RolandF. Differential revision: https://reviews.llvm.org/D53346 llvm-svn: 345361	2018-10-26 03:19:13 +00:00
Fangrui Song	61ea8dae2e	Add dependency from SystemZAsmParser to SystemZAsmPrinter after rL345349 This fixes -DBUILD_SHARED_LIBS=on build. The dependency is similar to that of X86's. llvm-svn: 345358	2018-10-26 03:04:54 +00:00
Vlad Tsyrklevich	21beeb29ea	Revert "[AArch64] Create proper memoperand for multi-vector stores" This reverts commit r345315, it was causing test failures on sanitizer-x86_64-linux-fast. llvm-svn: 345356	2018-10-26 02:00:14 +00:00
Jonas Paulsson	dda46307c2	[SystemZ] Implement SystemZOperand::print() SystemZAsmParser can now handle -debug by printing the operands neatly to the output stream. Before this patch this lead to an llvm_unreachable(). It seems that now '-mllvm -debug' does not cause any crashes anywhere (at least not on SPEC). Review: Ulrich Weigand https://reviews.llvm.org/D53328 llvm-svn: 345349	2018-10-26 00:36:00 +00:00
Jonas Paulsson	e2c5cbc164	[SystemZ] Pass the DAG pointer from SystemZAddressingMode::dump(). In order to print the IR slot number for the memory operand, the DAG pointer must be passed to SDNode::dump(). The isel-debug.ll test updated to also check for the IR Value reference being printed correctly. Review: Ulrich Weigand https://reviews.llvm.org/D53333 llvm-svn: 345347	2018-10-26 00:02:33 +00:00
Heejin Ahn	24faf859e5	Reland "[WebAssembly] LSDA info generation" Summary: This adds support for LSDA (exception table) generation for wasm EH. Wasm EH mostly follows the structure of Itanium-style exception tables, with one exception: a call site table entry in wasm EH corresponds to not a call site but a landing pad. In wasm EH, the VM is responsible for stack unwinding. After an exception occurs and the stack is unwound, the control flow is transferred to wasm 'catch' instruction by the VM, after which the personality function is called from the compiler-generated code. (Refer to WasmEHPrepare pass for more information on this part.) This patch: - Changes wasm.landingpad.index intrinsic to take a token argument, to make this 1:1 match with a catchpad instruction - Stores landingpad index info and catch type info MachineFunction in before instruction selection - Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an exception table - Adds WasmException class with overridden methods for table generation - Adds support for LSDA section in Wasm object writer Reviewers: dschuff, sbc100, rnk Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52748 llvm-svn: 345345	2018-10-25 23:55:10 +00:00
Heejin Ahn	3103d3dcd1	[WebAssembly] Support EH instructions in InstPrinter Summary: This adds support for exception handling instructions to InstPrinter. Reviewers: dschuff, aardappel Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53634 llvm-svn: 345343	2018-10-25 23:45:48 +00:00
Bryan Chan	f0923f16f8	[AArch64] Implement FP16FML intrinsics Add LLVM intrinsics for the ARMv8.2-A FP16FML vector-form instructions. Add a DAG pattern to define the indexed-form intrinsics in terms of the vector-form ones, similarly to how the Dot Product intrinsics were implemented. Based on a patch by Gao Yiling. Differential Revision: https://reviews.llvm.org/D53632 llvm-svn: 345337	2018-10-25 23:36:41 +00:00
Heejin Ahn	1d13e6be37	Address comments - Add llvm-mc test case (and delete the old one) - Change report_fatal_error to assertions llvm-svn: 345334	2018-10-25 23:35:14 +00:00
Heejin Ahn	1147d91402	[WebAssembly] Error out when block/loop markers mismatch Summary: Currently InstPrinter ignores if there are mismatches between block/loop and end markers by skipping the case if ControlFlowStack is empty. I guess it is better to explicitly error out in this case, because this signals invalid input. Reviewers: aardappel Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53620 llvm-svn: 345333	2018-10-25 23:35:13 +00:00
Jonas Paulsson	2b280ea604	[SystemZ] NFC reformatting in SystemZTargetTransformInfo.cpp Some lines more than 80 characters long reformatted. llvm-svn: 345331	2018-10-25 22:53:27 +00:00
Jonas Paulsson	b7caa809e1	[SystemZ] Improve getMemoryOpCost() to find foldable loads that are converted. The SystemZ backend can do arithmetic of memory by loading and then extending one of the operands. Similarly, a load + truncate can be folded into an operand. This patch improves the SystemZ TTI cost function to recognize this. Review: Ulrich Weigand https://reviews.llvm.org/D52692 llvm-svn: 345327	2018-10-25 22:28:25 +00:00
Jonas Paulsson	4645711a8d	[SystemZ] Improve handling and cost estimates of vector integer div/rem Enable the DAG optimization that converts vector div/rem with constants into multiply+shifts sequences by expanding them early. This is needed since ISD::SMUL_LOHI is 'Custom' lowered on SystemZ, and will therefore not be available to BuildSDIV after legalization. Better cost values for these instructions based on how they will be implemented (a constant divisor is cheaper). Review: Ulrich Weigand https://reviews.llvm.org/D53196 llvm-svn: 345321	2018-10-25 21:47:22 +00:00
Craig Topper	813064bf4d	[X86] Change X86 backend to look for 'min-legal-vector-width' attribute instead of 'required-vector-width' when determining whether 512-bit vectors should be legal. The required-vector-width attribute was only used for backend testing and has never been generated by clang. I believe clang is now generating min-legal-vector-width for vector uses in user code. With this I believe passing -mprefer-vector-width=256 to clang should prevent use of zmm registers in the generated assembly unless the user used a 512-bit intrinsic in their source code. llvm-svn: 345317	2018-10-25 21:16:06 +00:00
David Greene	53e869da7d	[AArch64] Create proper memoperand for multi-vector stores Include all of the store's source vector operands when creating the MachineMemOperand. Previously, we were missing the first operand, making the store size seem smaller than it really is. Differential Revision: https://reviews.llvm.org/D52816 llvm-svn: 345315	2018-10-25 21:10:39 +00:00
Thomas Lively	0aad98fd07	[WebAssembly] Use target-independent saturating add Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53721 llvm-svn: 345299	2018-10-25 19:06:13 +00:00
Volkan Keles	f87473fe1c	[GISel] LegalizerInfo: Rename MemDesc::Size to SizeInBits to make the value clearer Requested in D53679. llvm-svn: 345288	2018-10-25 17:37:07 +00:00
Craig Topper	c10de9a37a	[X86] Remove ProcIntelKNL and replace with a SlowPMADDWD flag to use in the one place it was checked. llvm-svn: 345286	2018-10-25 17:29:00 +00:00
Craig Topper	5d787ac4be	[X86] Remove some uarch tuning flags from KNL that look to have been inherited from SNB/IVB incorrectly KNL is based on a modified Silvermont core so I don't think these features apply. I think the LEA flag is probably also wrong, but I'm less sure as I barely understand the 3 LEA flags we have currently. Differential Revision: https://reviews.llvm.org/D53671 llvm-svn: 345285	2018-10-25 17:28:57 +00:00
Volkan Keles	3a103b1d25	[AArch64][GlobalISel] Fix the LegalityPredicate for lowerIf for G_LOAD/G_STORE Summary: Currently, Legalizer is trying to lower G_LOAD with a vector type that has more than two elements due to the incorrect LegalityPredicate. This patch fixes the issue by removing the multiplication by 8 as `MemDesc.Size` already contains the size in bits. Reviewers: dsanders, aemerson Reviewed By: dsanders Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53679 llvm-svn: 345282	2018-10-25 17:23:25 +00:00
Evandro Menezes	b53cf99388	[AArch64] Refactor Exynos feature sets (NFC) llvm-svn: 345279	2018-10-25 16:45:46 +00:00
John Brawn	958865202d	[AArch64] Add EXT patterns for 64-bit EXT of a subvector of a 128-bit vector If we have a 64-bit EXT where one of the operands is a subvector of a 128-bit vector then in some cases we can eliminate an extract_subvector by converting to a 128-bit EXT of the 128-bit vector. Differential Revision: https://reviews.llvm.org/D53582 llvm-svn: 345275	2018-10-25 15:31:51 +00:00
Sam Parker	a16667e79b	[ARM] Use Cortex-A57 sched model for Cortex-A72 This mirrors what we already do for AArch64 as the cores are similar. As discussed in the review, enabling the machine scheduler causes more variations in performance changes so it is not enabled for now. This patch improves LNT scores by a geomean of 1.57% at -O3. Differential Revision: https://reviews.llvm.org/D53562 llvm-svn: 345272	2018-10-25 15:08:29 +00:00
John Brawn	b8e7887f33	[AArch64] Refactor definition of EXT patterns to use a multiclass Using a multiclass reduces duplication, and makes it easier to add new patterns later. This refactoring does add some new patterns, but as far as I can tell there's no IR that will end up triggering them so this is effectively NFC. Differential Revision: https://reviews.llvm.org/D53580 llvm-svn: 345271	2018-10-25 15:00:10 +00:00
John Brawn	49e61d90ca	[AArch64] Do 64-bit vector move of 0 and -1 by extracting from the 128-bit move Currently a vector move of 0 or -1 will use different instructions depending on the size of the vector. Using a single instruction (the 128-bit one) for both gives more opportunity for Machine CSE to eliminate instructions. Differential Revision: https://reviews.llvm.org/D53579 llvm-svn: 345270	2018-10-25 14:56:48 +00:00
Alexey Bataev	0f2fe4f135	[DEBUG_INFO][NVPTX]Fix processing of DBG_VALUES. Summary: If the instruction in the eliminateFrameIndex function is a DBG_VALUE instruction, it requires special processing. The frame register is set to VRFrame and the offset is based on the object offset. The code is similar to the code used in lib/CodeGen/PrologEpilogInserter.cpp. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D53657 llvm-svn: 345269	2018-10-25 14:27:27 +00:00
Amara Emerson	cbd86d8429	[GlobalISel] Use the target preferred type for G_EXTRACT_VECTOR_ELT index. Allows for better imported pattern re-use. llvm-svn: 345265	2018-10-25 14:04:54 +00:00
Simon Pilgrim	53e8e145e9	[CostModel][X86] Add realistic vXi64 uitofp vXf64 costs Match codegen improvements from D53649/rL345256 llvm-svn: 345263	2018-10-25 13:06:20 +00:00
Alex Bradbury	74d4931da2	[RISCV] Use PatFrags for variable shift patterns This follows SystemZ and I think is cleaner vs the multiclass. llvm-svn: 345262	2018-10-25 12:45:20 +00:00
Simon Pilgrim	0573b8d8b6	[CostModel][X86] Add realistic i64 uitofp f64 scalar costs llvm-svn: 345261	2018-10-25 12:42:10 +00:00
Simon Pilgrim	071e82218f	[TTI] Add generic SK_Broadcast shuffle costs I noticed while fixing PR39368 that we don't have generic shuffle costs for broadcast style shuffles. This patch adds SK_BROADCAST handling, but exposes ARM/AARCH64 lack of handling of this type, which I've added a fix for at the same time. Differential Revision: https://reviews.llvm.org/D53570 llvm-svn: 345253	2018-10-25 10:52:36 +00:00
Clement Courbet	41c8af3924	[MCSched] Bind PFM Counters to the CPUs instead of the SchedModel. Summary: The pfm counters are now in the ExegesisTarget rather than the MCSchedModel (PR39165). This also compresses the pfm counter tables (PR37068). Reviewers: RKSimon, gchatelet Subscribers: mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D52932 llvm-svn: 345243	2018-10-25 07:44:01 +00:00
Craig Topper	7ae43cad65	[X86] Don't use the OriginalDemandedBits to calculate the DemandedMask for PMULUDQ/PMULDQ inputs. Multiply a is complex operation so just because some bit of the output isn't used doesn't mean that bit of the input isn't used. We might able to bound it, but it will require some more thought. llvm-svn: 345241	2018-10-25 07:00:09 +00:00
Craig Topper	eaa1cf5b57	[X86] Fix typo in comment. NFC llvm-svn: 345236	2018-10-25 05:00:20 +00:00
Thomas Lively	325c9c5e84	[WebAssembly] Set LoadExt and TruncStore actions for SIMD types Summary: Fixes part of the problem reported in bug 39275. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton Differential Revision: https://reviews.llvm.org/D53542 llvm-svn: 345230	2018-10-25 01:46:07 +00:00
Heejin Ahn	ac764aa88e	[WebAssembly] Fix immediate of rethrow when throwing to caller Summary: Currently when assigning depths 'rethrow' does not take the whole control flow stack into accounts but only considers EH pad stacks. When assigning depth immmediates to rethrows, in normal cases it is done correctly but when a rethrow instruction throws up to a caller, i.e., we convert a pseudo RETHROW_TO_CALLER instruction to a rethrow, it mistakenly compute the whole stack depth. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53619 llvm-svn: 345223	2018-10-24 23:31:24 +00:00
Thomas Lively	ed9513472c	[WebAssembly] Retain shuffle types during custom lowering Summary: Changing the node type in lowering was violating assumptions made in the DAG combiner, so don't change the node type any more. This fixes one of the issues reported in bug 39275. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton Differential Revision: https://reviews.llvm.org/D53537 llvm-svn: 345221	2018-10-24 23:27:40 +00:00
Reid Kleckner	49a24278ba	[ELF] Fix large code model MIR verifier errors Instead of using the MOVGOT64r pseudo, use the existing MO_PIC_BASE_OFFSET support on symbol operands. Now I don't have to create a "scratch register operand" for the pseudo to use, and the register allocator can make better decisions. Fixes some X86 verifier errors tracked in PR27481. llvm-svn: 345219	2018-10-24 22:57:28 +00:00
Thomas Lively	30f1d69115	[NFC] Rename minnan and maxnan to minimum and maximum Summary: Changes all uses of minnan/maxnan to minimum/maximum globally. These names emphasize that the semantic difference between these operations is more than just NaN-propagation. Reviewers: arsenm, aheejin, dschuff, javed.absar Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53112 llvm-svn: 345218	2018-10-24 22:49:55 +00:00
Evandro Menezes	096e2497b5	[AArch64] Refactor Exynos machine model Effectively, NFC. llvm-svn: 345201	2018-10-24 21:40:43 +00:00
Reid Kleckner	9c5bda652c	[X86] Add SP to tailcall register class to fix verifier error It's possible to do a tail call to a stack argument. LLVM already calculates the right stack offset to call through. Fixes the sibcall and musttail* verifier failures tracked at PR27481. llvm-svn: 345197	2018-10-24 21:09:34 +00:00
Reid Kleckner	953bdce68d	[MC] Separate masm integer literal lexer support from inline asm Summary: This renames the IsParsingMSInlineAsm member variable of AsmLexer to LexMasmIntegers and moves it up to MCAsmLexer. This is the only behavior controlled by that variable. I added a public setter, so that it can be set from outside or from the llvm-mc command line. We may need to arrange things so that users can get this behavior from clang, but that's future work. I also put additional hex literal lexing functionality under this flag to fix PR32973. It appears that this hex literal parsing wasn't intended to be enabled in non-masm-style blocks. Now, masm integers (0b1101 and 0ABCh) work in __asm blocks from clang, but 0b label references work when using .intel_syntax in standalone .s files. However, 0b label references will not work from __asm blocks in clang. They will work from GCC inline asm blocks, which it sounds like is important for Crypto++ as mentioned in PR36144. Essentially, we only lex masm literals for inline asm blobs that use intel syntax. If the .intel_syntax directive is used inside a gnu-style inline asm statement, masm literals will not be lexed, which is compatible with gas and llvm-mc standalone .s assembly. This fixes PR36144 and PR32973. Reviewers: Gerolf, avt77 Subscribers: eraman, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D53535 llvm-svn: 345189	2018-10-24 20:23:57 +00:00
Tim Northover	1c353419ab	AArch64: add a pass to compress jump-table entries when possible. llvm-svn: 345188	2018-10-24 20:19:09 +00:00
Evandro Menezes	769d4cebad	[AArch64] Refactor Exynos machine model (NFC) llvm-svn: 345187	2018-10-24 20:03:24 +00:00
Evandro Menezes	80bc136732	[AArch64] Fix overlapping instructions Fix overlapping instruction descriptions in the machine model for Exynos M3. Effectively, NFC. llvm-svn: 345186	2018-10-24 20:03:20 +00:00
Craig Topper	7bb8c2e6e5	[X86] Explicitly list all KNL features of inheriting from IVB. NFC I'm not sure all the microarchitectural tuning flags that have been added to IVBFeatures are relevant for KNL. Separating will allow us to see and audit them. There might even be some simplification opportunities in the Sandy Bridge through Icelake inheritance line without KNL using the same chain. llvm-svn: 345183	2018-10-24 19:24:44 +00:00
Simon Pilgrim	c5bb362b13	[X86][SSE] Add SimplifyDemandedBitsForTargetNode PMULDQ/PMULUDQ handling Add X86 SimplifyDemandedBitsForTargetNode and use it to simplify PMULDQ/PMULUDQ target nodes. This enables us to repeatedly simplify the node's arguments after the previous approach had to be reverted due to PR39398. Differential Revision: https://reviews.llvm.org/D53643 llvm-svn: 345182	2018-10-24 19:11:28 +00:00
Simon Pilgrim	ac84005841	[CostModel][X86] Add vXi8 vector division by constants costs. ISD::MULHS/ISD::MULHU lowering of vXi8 types means we expand these in TargetLowering BuildSDIV/BuildUDIV. llvm-svn: 345175	2018-10-24 18:44:12 +00:00
Peter Collingbourne	4bb928c110	ARM: Use BKPT instead of TRAP to implement llvm.debugtrap. The BKPT instruction is specified to cause a software breakpoint, and at least on Linux results in a SIGTRAP. This makes it more suitable for implementing debugtrap than TRAP (aka UDF #254), which is specified to cause an undefined instruction exception and results in a SIGILL on Linux. Moreover, BKPT is not marked as a terminator, which is not only consistent with the IR instruction but allows the analyzeBlock function to correctly analyze a basic block containing the instruction, which fixes an assertion failure in the machine block placement pass previously triggered by the included test case. Because BKPT is only supported starting with ARMv5T, we continue to use UDF #254 when targeting v4T. Differential Revision: https://reviews.llvm.org/D53614 llvm-svn: 345171	2018-10-24 18:10:38 +00:00
Krzysztof Parzyszek	57b5ac1431	[Hexagon] Flip hexagon-autohvx to be true by default This will allow other generators of LLVM IR to use the auto-vectorizer without having to change that flag. Note: on its own, this patch will enable auto-vectorization on Hexagon in all cases, regardless of the -fvectorize flag. There is a companion clang patch that together with this one forms an NFC for clang users. llvm-svn: 345169	2018-10-24 17:55:13 +00:00
Craig Topper	2417273255	[X86] Bring back the MOV64r0 pseudo instruction This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register similar to what we do for the various XMM/YMM/ZMM zeroing pseudos. My main motivation is to enable the spill optimization in foldMemoryOperandImpl. As we were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here. I'll see if I can try to reduce one from the code were looking at. There's definitely some other machine CSE(and maybe other passes) behavior changes exposed by this patch. So it seems like there might be some other deficiencies in SUBREG_TO_REG handling. Differential Revision: https://reviews.llvm.org/D52757 llvm-svn: 345165	2018-10-24 17:32:09 +00:00
Simon Pilgrim	2cce074e8c	[CostModel][X86] Enable non-uniform vector division by constants costs. Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases. llvm-svn: 345164	2018-10-24 17:30:29 +00:00
Alexey Bataev	c15c853c3a	[DEBUGINFO, NVPTX] Try to pack bytes data into a single string. Summary: If the target does not support `.asciz` and `.ascii` directives, the strings are represented as bytes and each byte is placed on the new line as a separate byte directive `.b8 <data>`. NVPTX target allows to represent the vector of the data of the same type as a vector, where values are separated using `,` symbol: `.b8 <data1>,<data2>,...`. This allows to reduce the size of the final PTX file. Ptxas tool includes ptx files into the resulting binary object, so reducing the size of the PTX file is important. Reviewers: tra, jlebar, echristo Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D45822 llvm-svn: 345142	2018-10-24 14:04:00 +00:00
Tim Renouf	2a1b1d94b6	[AMDGPU] Defined gfx909 Raven Ridge 2 Differential Revision: https://reviews.llvm.org/D53418 Change-Id: Ie3d054f2e956c2768988c0f4c0ffd29a47294eef llvm-svn: 345120	2018-10-24 08:14:07 +00:00
Craig Topper	da54bbf52a	[X86] Correct a bad isel predicate. Though I don't think it can be exposed. This B/W VPTEST instructions are only available with AVX512BW. But lowering should prevent any byte or word elements from getting to isel so this can't be exposed. llvm-svn: 345112	2018-10-24 06:13:36 +00:00
Saleem Abdulrasool	4005f9a860	ARM: handle checking aliases with out-of-bounds GEPs A global alias may use indices which are not considered in bounds. In such a case, accessing the base object will fail as it only peers through inbounds accesses. This pattern is used by the swift compiler to create references to preceeding members in the type metadata. This would cause the code generation to fail when targeting a platform that used ELF as the object file format. Be conservative and fail the read-only check if we run into an alias that we cannot peer through. llvm-svn: 345107	2018-10-24 00:00:52 +00:00
Matthias Braun	4f82406c46	SelectionDAG: Reuse bigger sized constants in memset expansion. When implementing memset's today we often see this pattern: $x0 = MOV 0xXYXYXYXYXYXYXYXY store $x0, ... $w1 = MOV 0xXYXYXYXY store $w1, ... We first create a 64bit constant in a 64bit register with all bytes the same and then create a 32bit constant with all bytes the same in a 32bit register. In many targets we could just access the lower byte of the 64bit register instead. - Ideally this would be handled by the ConstantHoist pass but it runs too early when memset isn't expanded yet. - The memset expansion code already had this optimization implemented, however SelectionDAG constantfolding would constantfold the "trunc(bigconstnat)" pattern to "smallconstant". - This patch makes the memset expansion mark the constant as Opaque and stop DAGCombiner from constant folding in this situation. (Similar to how ConstantHoisting marks things as Opaque to avoid folding ADD/SUB/etc.) Differential Revision: https://reviews.llvm.org/D53181 llvm-svn: 345102	2018-10-23 23:19:23 +00:00
Simon Pilgrim	b6c57075c0	[X86][SSE] Revert rL343922 combinePMULDQ AddToWorklist (PR39398) We can't add the MULDQ node back to the worklist after the demanded bits change has been committed in case the node has been removed entirely. This will have to wait until we have SimplifyDemandedBitsForTargetNode. llvm-svn: 345070	2018-10-23 19:07:53 +00:00
Roman Lebedev	2fae985793	X86DAGToDAGISel::matchBitExtract(): lambdas can't have default arguments. As reported by ctopper. That is a gcc-only warning at the moment. llvm-svn: 345065	2018-10-23 18:27:10 +00:00
Stefan Pintilie	927e8bf316	[Power9] Add __float128 support in the backend for bitcast to a i128 Add support to allow bit-casting from f128 to i128 and then extracting 64 bits from the result. Differential Revision: https://reviews.llvm.org/D49507 llvm-svn: 345053	2018-10-23 17:11:36 +00:00
Simon Pilgrim	f04a04c2b6	[TTI][X86] Treat SK_Transpose shuffles as SK_PermuteTwoSrc - there's no difference in lowering. llvm-svn: 345048	2018-10-23 16:45:26 +00:00
Sanjay Patel	47a52a0521	[WebAssembly] use 'match' to simplify code; NFC Vector types are not possible here because this code explicitly checks for a scalar type, but this is another step towards completely removing the fake binop queries for not/neg/fneg. llvm-svn: 345043	2018-10-23 16:05:09 +00:00
Roman Lebedev	06e4db07af	Experimental re-land of [X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern This initially landed in rL345014, but was reverted in rL345017 due to sanitizer-x86_64-linux-fast buildbot failure in check-lld (ELF/relocatable-versioned.s) test. While i'm not yet quite sure what is the problem, one obvious thing here is that extra truncation roundtrip. Maybe that's it? If not, will re-revert. Differential Revision: https://reviews.llvm.org/D53521 llvm-svn: 345027	2018-10-23 13:19:31 +00:00
Simon Pilgrim	f85ee9f8b4	[X86][SSE] Update raw mask shuffle decoders to handle UNDEF mask elts Matches the approach taken in the constant pool shuffle decoders, and uses an UndefElts mask instead of uint64_t(-1) raw mask values, which doesn't work safely for i32/i64 shuffle mask sizes (as the -1 value is legal). This allows us to remove the constant pool shuffle decoders from most of the getTargetShuffleMask variable shuffle cases (X86ISD::VPERMV3 will be handled in a future commit). llvm-svn: 345018	2018-10-23 11:33:38 +00:00
Roman Lebedev	c29dbbdb10	Revert "[X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern" Seems to be breaking sanitizer-x86_64-linux-fast buildbot, the ELF/relocatable-versioned.s test: ==17758==MemorySanitizer CHECK failed: /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:191 "((kBlockMagic)) == ((((u64)addr)[0]))" (0x6a6cb03abcebc041, 0x0) #0 0x59716b in MsanCheckFailed(char const, int, char const, unsigned long long, unsigned long long) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/msan/msan.cc:393 #1 0x586635 in __sanitizer::CheckFailed(char const, int, char const, unsigned long long, unsigned long long) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_termination.cc:79 #2 0x57d5ff in __sanitizer::InternalFree(void, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >*) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:191 #3 0x7fc21b24193f (/lib/x86_64-linux-gnu/libc.so.6+0x3593f) #4 0x7fc21b241999 in exit (/lib/x86_64-linux-gnu/libc.so.6+0x35999) #5 0x7fc21b22c2e7 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e7) #6 0x57c039 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/lld+0x57c039) This reverts commit r345014. llvm-svn: 345017	2018-10-23 10:34:57 +00:00
Simon Pilgrim	816e57be35	[TTI] Add generic cost handling of SK_Reverse shuffles These can be treated as a general permute. This required a fix for missing reverse patterns on ARM llvm-svn: 345015	2018-10-23 09:42:10 +00:00
Roman Lebedev	1c95b2f779	[X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern Summary: Continuation of D52348. We also get the `c) x & (-1 >> (32 - y))` pattern here, because of the D48768. I will add extra-uses into those tests and follow-up with a patch to handle those patterns too. Reviewers: RKSimon, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53521 llvm-svn: 345014	2018-10-23 09:08:44 +00:00
Heejin Ahn	a40303aa03	[WebAssembly] Fix assembly printing of br_table Summary: In `br_table's stack version asm string, \t was missing. Reviewers: aardappel Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53516 llvm-svn: 344981	2018-10-23 00:28:14 +00:00
Saleem Abdulrasool	96cd3cc312	X86: fix a comment copy-paste issue (NFC) The comment was copy-pasted but not updated. NFC. llvm-svn: 344973	2018-10-22 23:34:24 +00:00
Craig Topper	96889b8b96	[X86] Remove unused entries from the X86ProcFamily enum. Add a note to discourage creation of new enum entries. As we've learned multiple times, a coarse grained enum like this is not scalable and we should be migrating away from it. llvm-svn: 344972	2018-10-22 23:14:55 +00:00
Matthias Braun	a0beeffeed	X86: Do not optimize branches with undef eflags inputs analyzeBranch()/insertBranch() etc. do not properly deal with an undef flag on the eflags input and used to produce invalid MIR. I don't see this ever affecting real world inputs (I don't think it is possible to produce undef flags with llvm IR), so I simply changed the code to bail out in this case. rdar://42122367 llvm-svn: 344970	2018-10-22 22:52:23 +00:00
Craig Topper	c8e183f9ee	Recommit r344877 "[X86] Stop promoting integer loads to vXi64" I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile. Original commit message: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to rem I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the lo I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344965	2018-10-22 22:14:05 +00:00
Thomas Lively	c63b5fcb2a	[WebAssembly][NFC] Remove WebAssemblyStackifier TableGen backend Summary: Replace its functionality with a TableGen InstrInfo relational instruction mapping. Although arguably more complex than the TableGen backend, the relational mapping is a smaller maintenance burden than a TableGen backend. Reviewers: aardappel, aheejin, dschuff Subscribers: mgorny, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53307 llvm-svn: 344962	2018-10-22 21:55:26 +00:00
Tim Northover	a23c12a627	X86: add alias for pushfw/popfw in Intel mode A while ago we changed pushf and popf in Intel mode to generate pushfq and popfq. Unfortunately that left us with no way to get the 16-bit encoding in Intel mode so this patch adds pushfw and popfw as aliases there. llvm-svn: 344949	2018-10-22 20:38:13 +00:00
Simon Pilgrim	3b91e9676b	Revert rL344931 from llvm/trunk: [X86][SSE] getTargetShuffleMaskIndices - allow opt-in support for whole undef shuffle mask elements We can't safely assume that certain RawMask entries are UNDEF as most variable shuffles ignore non-index bits - PSHUFB only works on i8 elts so it'd be safe to use but I'm intending to come up with an alternative approach that works for all. ........ Enable this for PSHUFB constant mask decoding and remove the ConstantPool DecodePSHUFBMask llvm-svn: 344937	2018-10-22 19:01:25 +00:00
Simon Pilgrim	794f85cd93	Revert rL344933 from llvm/trunk: [X86][SSE] Tidyup DecodeVPERMILPMask shuffle mask decoding We can't safely assume that certain RawMask entries are UNDEF as most variable shuffles ignore non-index bits. ........ Add support for UNDEF raw mask elements and remove the ConstantPool DecodeVPERMILPMask usage in X86ISelLowering.cpp llvm-svn: 344936	2018-10-22 18:58:32 +00:00
Simon Pilgrim	476c9f42fc	[X86][SSE] Tidyup DecodeVPERMILPMask shuffle mask decoding Add support for UNDEF raw mask elements and remove the ConstantPool DecodeVPERMILPMask usage in X86ISelLowering.cpp llvm-svn: 344933	2018-10-22 18:35:13 +00:00
Simon Pilgrim	3521367ff3	[X86][SSE] getTargetShuffleMaskIndices - allow opt-in support for whole undef shuffle mask elements Enable this for PSHUFB constant mask decoding and remove the ConstantPool DecodePSHUFBMask llvm-svn: 344931	2018-10-22 18:09:02 +00:00
Simon Pilgrim	5dff767c25	[X86] getTargetConstantBitsFromNode - handle extraction from larger constant pool entries First step towards removing X86ShuffleDecodeConstantPool usage from X86ISelLowering.cpp llvm-svn: 344924	2018-10-22 17:43:33 +00:00
Craig Topper	8d8dcfe690	Revert r344877 "[X86] Stop promoting integer loads to vXi64" Sam McCall reported miscompiles in some tensorflow code. Reverting while I try to figure out. llvm-svn: 344921	2018-10-22 16:59:24 +00:00
Matt Arsenault	687ec75d10	DAG: Change behavior of fminnum/fmaxnum nodes Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914	2018-10-22 16:27:27 +00:00
Simon Pilgrim	6f5cd7c67f	[X86][SSE] getTargetShuffleMask - pull out repeated shuffle mask element size. NFCI. llvm-svn: 344910	2018-10-22 15:33:30 +00:00
Roman Lebedev	898808504d	[X86] X86DAGToDAGISel: handle BZHI selection too, not just BEXTR. Summary: As discussed in D52304 / IRC, we now have pattern matching for 'bit extract' in two places - tablegen and `X86DAGToDAGISel`. There are 4 patterns. And we will have a problem with `x & (-1 >> (32 - y))` pattern. * If the mask is one-use, then it is always unfolded into `x << (32 - y) >> (32 - y)` first. Thus, the existing test coverage is already broken. * If it is not one-use, then it is not unfolded, and is matched as BZHI. * If it is not one-use, we will not match it as BEXTR. And if it is one-use, it will have been unfolded already. So we will either not handle that pattern for BEXTR, or not have test coverage for it. This is bad. As discussed with @craig.topper, let's unify this matching, and do everything in `X86DAGToDAGISel`. Then we will not have code duplication, and will have proper test coverage. This indeed does not affect any tests, and this is great. It means that for these two patterns, the `X86DAGToDAGISel` is identical to the tablegen version. Please review carefully, i'm not fully sure about that intrinsic change, and introduction of the new `X86ISD` opcode. Reviewers: craig.topper, RKSimon, spatel Reviewed By: craig.topper Subscribers: llvm-commits, craig.topper Differential Revision: https://reviews.llvm.org/D53164 llvm-svn: 344904	2018-10-22 14:12:44 +00:00
Roman Lebedev	13c5ab2e27	[X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ((1 << nbits) + (-1)) pattern Summary: Trivial continuation of D52304. While this pattern is not canonical, we do select it in the BZHI case, so this should not be any different. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52348 llvm-svn: 344902	2018-10-22 13:54:17 +00:00
Petar Avramovic	e72a743740	Test commit: change comment. llvm-svn: 344900	2018-10-22 13:27:50 +00:00
Nemanja Ivanovic	674581afbb	[PowerPC][NFC] Fix bugs in r+r to r+i conversion The D-Form VSX loads introduced in ISA 3.0 are not direct D-Form equivalent of the corresponding X-Forms since they only target the Altivec registers. Namely LXSSPX can load into any of the 64 VSX registers whereas LXSSP can only load into the upper 32 VSX registers. Similarly with the remaining affected instructions. There is currently no way that I can see to trigger the bug, but as we add other ways of exploiting these instructions, there may very well be instances that do. This is an NFC patch in practical terms since the changes it introduces can not be triggered without an MIR test. Differential revision: https://reviews.llvm.org/D53323 llvm-svn: 344894	2018-10-22 11:22:59 +00:00
Craig Topper	290c081d91	[X86] Add patterns for vector and/or/xor/andn with other types than vXi64. This makes fast isel treat all legal vector types the same way. Previously only vXi64 was in the fast-isel tables. This unfortunately prevents matching of andn by fast-isel for these types since the requires SelectionDAG. But we already had this issue for vXi64. So at least we're consistent now. Interestinly it looks like fast-isel can't handle instructions with constant vector arguments so the the not part of the andn patterns is selected with SelectionDAG. This explains why VPTERNLOG shows up in some of the tests. This is a subset of D53268. As I make progress on that, I will try to reduce the number of lines in the tablegen files. llvm-svn: 344884	2018-10-22 06:30:22 +00:00
Craig Topper	321df5b0d4	[X86] Stop promoting integer loads to vXi64 Summary: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast. I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size. I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344877	2018-10-21 21:30:26 +00:00
Craig Topper	8de07b4db1	Revert r344873 "foo" Rebase gone wrong left this in my tree. llvm-svn: 344875	2018-10-21 21:08:37 +00:00
Craig Topper	5eea94edd4	[X86] Remove SDIVREM8_SEXT_HREG/UDIVREM8_ZEXT_HREG and their associated DAG combine and target bits support. Use a post isel peephole instead. Summary: These nodes exist to overcome an isel problem where we can generate a zero extend of an AH register followed by an extract subreg, and another zero extend. The first zero extend exists to avoid a partial register update copying the AH register into the low 8-bits. The second zero extend exists if the user wanted the remainder zero extended. To make this work we had a DAG combine to morph the DIVREM opcode to a special opcode that included the extend. But then we had to add the new node to computeKnownBits and computeNumSignBits to process the extension portion. This patch instead removes all of that and adds a late peephole to detect the two extends. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53449 llvm-svn: 344874	2018-10-21 21:07:27 +00:00
Craig Topper	e367039fe5	foo llvm-svn: 344873	2018-10-21 21:07:25 +00:00
Simon Pilgrim	eb806d5f30	[X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 unary shuffle lowering llvm-svn: 344868	2018-10-21 17:07:50 +00:00
Simon Pilgrim	abc24fdb94	[X86] Only extract constant pool shuffle mask data with zero offsets D53306 exposes an issue where we sometimes use constant pool data from bigger vectors than the target shuffle mask. This should be safe to do, but we have to be certain that we're using the bottom most part of the vector as the shuffle mask decoders have no way to peek into subvectors with non-zero offsets. llvm-svn: 344867	2018-10-21 11:55:56 +00:00
Thomas Lively	5ea17d450e	[WebAssembly] Implement vector sext_inreg and tests with comparisons Summary: Depends on D53251. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53252 llvm-svn: 344826	2018-10-20 01:35:23 +00:00
Thomas Lively	55735d522d	[WebAssembly] Custom lower i64x2 constant shifts to avoid wrap Summary: Depends on D53057. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53251 llvm-svn: 344825	2018-10-20 01:31:18 +00:00
Changpeng Fang	f95f763ea5	AMDGPU: Add support pattern for SUB of one bit Summary: Add selection patterns to support one bit Sub. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D52946 llvm-svn: 344815	2018-10-19 21:09:21 +00:00
Craig Topper	5ed1099962	[X86] Remove some left over code from when MVT:i1 was a legal type for AVX512. llvm-svn: 344813	2018-10-19 20:44:33 +00:00
Craig Topper	5c81c68385	[X86] In PostprocessISelDAG, start from allnodes_end, not the root. There is no guarantee the root is at the end if isel created any nodes without morphing them. This includes the nodes created by manual isel from C++ code in X86ISelDAGToDAG. This is similar to r333415 from PowerPC which is where I originally stole the peephole loop from. I don't have a test case, but without this a future patch doesn't work which is how I found it. llvm-svn: 344808	2018-10-19 19:24:42 +00:00
Thomas Lively	11a332d08d	[WebAssembly] Handle undefined lane indices in SIMD patterns Summary: Undefined indices in shuffles can be used when not all lanes of the output vector will be used. This happens for example in the expansion of vector reduce operations. Regardless, undefs are legal as lane indices in IR and should be supported. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53057 llvm-svn: 344803	2018-10-19 19:08:06 +00:00
Krzysztof Parzyszek	6bfc6577f2	[Hexagon] Remove support for V4 llvm-svn: 344791	2018-10-19 17:31:11 +00:00
Fangrui Song	2e83b2e9ee	Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC llvm-svn: 344774	2018-10-19 06:12:02 +00:00
Eli Friedman	b09c778715	Revert r344693 ("[ARM] bottom-top mul support in ARMParallelDSP") Still causing failures on the polly-aosp buildbot; I'll follow up with a reduced testcase. llvm-svn: 344752	2018-10-18 19:34:30 +00:00
Kristina Brooks	312fcc116b	[X86] Support for the mno-tls-direct-seg-refs flag Allows to disable direct TLS segment access (%fs or %gs). GCC supports a similar flag, it can be useful in some circumstances, e.g. when a thread context block needs to be updated directly from user space. More info and specific use cases: https://bugs.llvm.org/show_bug.cgi?id=16145 There is another revision for clang as well. Related: D53102 All X86 CodeGen tests appear to pass: ``` [46/47] Running lit suite /SourceCache/llvm-trunk-8.0/test/CodeGen Testing Time: 23.17s Expected Passes : 3801 Expected Failures : 15 Unsupported Tests : 8021 ``` Reviewed by: Craig Topper. Patch by nruslan (Ruslan Nikolaev). Differential Revision: https://reviews.llvm.org/D53103 llvm-svn: 344723	2018-10-18 03:14:37 +00:00
Nicolai Haehnle	4821937d2e	AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI Summary: To workaround a hardware issue in the (base + offset) calculation when base is negative. The impact on code quality should be limited since SILoadStoreOptimizer still runs afterwards and is able to combine loads/stores based on known sign information. This fixes visible corruption in Hitman on SI (easily reproducible by running benchmark mode). Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923 Reviewers: arsenm, mareko Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53160 llvm-svn: 344698	2018-10-17 15:37:48 +00:00
Nicolai Haehnle	c4a2ff0950	AMDGPU: Divergence-driven selection of scalar buffer load intrinsics Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 344696	2018-10-17 15:37:30 +00:00
Sam Parker	2ef3c0dad6	[ARM] bottom-top mul support in ARMParallelDSP Previously reverted in rL343082. Original commit message: On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 344693	2018-10-17 13:02:48 +00:00
Nicolai Haehnle	e9b134aa31	AMDGPU: Remove dead TableGen code Summary: Change-Id: Ic1f2c1d0cf9e90a0baa9fc6bacd0d3c386069fb0 Reviewers: tpr Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53318 Change-Id: Ib4d143c898801e5cf6cb9999a495d62c91ae77fb llvm-svn: 344691	2018-10-17 12:14:26 +00:00
Petar Jovanovic	8a08412533	[MIPS GlobalISel] Legalize constants Legalize s1, s8, s16 and s64 G_CONSTANT for MIPS32. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D53077 llvm-svn: 344684	2018-10-17 10:30:03 +00:00
Sjoerd Meijer	64cfb74a61	[ARM] Do not fuse VADD and VMUL, continued (2/2) This is patch 2/2, following up on D53314, and is the functional change to prevent fusing mul + add sequences into VFMAs. Differential revision: https://reviews.llvm.org/D53315 llvm-svn: 344683	2018-10-17 10:05:44 +00:00
Sjoerd Meijer	1a213a42d6	[ARM] Follow up of rL344671, attempt to pacify a buildbot It was rightfully complaining about an unpretty logical expression. llvm-svn: 344677	2018-10-17 07:51:24 +00:00
Sjoerd Meijer	ff3ab33ec8	[ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2) This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs for performance reasons on the Cortex-M4 and Cortex-M33. This is a serie of 2 patches, that is trying to achieve the same for VFMA. The second column in the table below shows what we were generating before rL342874, the third column what changed with rL342874, and the last column what we want to achieve with these 2 patches: -------------------------------------------------------- \| Opt \| < rL342874 \| >= rL342874 \| \| \|------------------------------------------------------\| \|-O3 \| vmla \| vmul \| vmul \| \| \| \| vadd \| vadd \| \|------------------------------------------------------\| \|-Ofast \| vfma \| vfma \| vmul \| \| \| \| \| vadd \| \|------------------------------------------------------\| \|-Oz \| vmla \| vmla \| vmla \| -------------------------------------------------------- This patch 1/2, is a cleanup of the spaghetti predicate logic on the different VMLA and VFMA codegen rules, so that we can make the final functional change in patch 2/2. This also fixes a typo in the regression test added in rL342874. Differential revision: https://reviews.llvm.org/D53314 llvm-svn: 344671	2018-10-17 07:26:35 +00:00
Craig Topper	e0a992918b	[X86] Match (cmp (and (shr X, C), mask), 0) to BEXTR+TEST. Without this we match the CMP+AND to a TEST and then match the SHR separately. I'm trusting analyzeCompare to remove the TEST during the peephole pass. Otherwise we need to check the flag users to see if they only use the Z flag. This recovers a case lost by r344270. Differential Revision: https://reviews.llvm.org/D53310 llvm-svn: 344649	2018-10-16 22:29:36 +00:00
Krasimir Georgiev	547d824da6	Revert "[WebAssembly] LSDA info generation" This reverts commit r344575. Newly introduced test eh-lsda.ll.test fails with use-after-free under ASAN build. llvm-svn: 344639	2018-10-16 18:50:09 +00:00
Evandro Menezes	c98decf864	[PATCH] [NFC][AArch64] Fix refactoring of macro fusion Fix compiler error. llvm-svn: 344632	2018-10-16 17:41:45 +00:00
Evandro Menezes	46eadcff9c	[NFC][ARM] Refactor macro fusion Simplify code for wildcards. llvm-svn: 344625	2018-10-16 17:19:51 +00:00
Evandro Menezes	de655c6d3a	[NFC][AArch64] Refactor macro fusion Simplify API of checking functions. llvm-svn: 344624	2018-10-16 17:19:28 +00:00
Simon Pilgrim	7d27cfdcb2	[X86] Fix Skylake ReadAfterLd for PADDrm etc. Missed in rL343868 as due to their custom InstrRW. llvm-svn: 344600	2018-10-16 09:50:16 +00:00
Aleksandar Beserminji	a5949439ca	[mips][micromips] Fix how values in .gcc_except_table are calculated When a landing pad is calculated in a program that is compiled for micromips, it will point to an even address. Such an error will cause a segmentation fault, as the instructions in micromips are aligned on odd addresses. This patch sets the last bit of the offset where a landing pad is, to 1, which will effectively be an odd address and point to the instruction exactly. Differential Revision: https://reviews.llvm.org/D52985 llvm-svn: 344591	2018-10-16 08:27:28 +00:00
Heejin Ahn	0981eaab47	[WebAssembly] LSDA info generation Summary: This adds support for LSDA (exception table) generation for wasm EH. Wasm EH mostly follows the structure of Itanium-style exception tables, with one exception: a call site table entry in wasm EH corresponds to not a call site but a landing pad. In wasm EH, the VM is responsible for stack unwinding. After an exception occurs and the stack is unwound, the control flow is transferred to wasm 'catch' instruction by the VM, after which the personality function is called from the compiler-generated code. (Refer to WasmEHPrepare pass for more information on this part.) This patch: - Changes wasm.landingpad.index intrinsic to take a token argument, to make this 1:1 match with a catchpad instruction - Stores landingpad index info and catch type info MachineFunction in before instruction selection - Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an exception table - Adds WasmException class with overridden methods for table generation - Adds support for LSDA section in Wasm object writer Reviewers: dschuff, sbc100, rnk Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52748 llvm-svn: 344575	2018-10-16 00:09:12 +00:00
Craig Topper	e70c560b6d	[X86] Remove some isel patterns that shouldn't be possible. These included a bitcast of a load from v4f32 to v2f64, but DAG combine should have already changed the type of the load to remove the cast. llvm-svn: 344573	2018-10-15 23:34:58 +00:00
Craig Topper	2909a3d9d0	[X86] Fix a bad bitcast in the load form of vXi16 uniform shift patterns for EVEX encoded instructions. llvm-svn: 344563	2018-10-15 21:51:32 +00:00
Simon Pilgrim	095a7fe635	[AARCH64] Improve vector popcnt lowering with ADDLP AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors. This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258. Differential Revision: https://reviews.llvm.org/D53259 llvm-svn: 344554	2018-10-15 21:15:58 +00:00
Konstantin Zhuravlyov	94dfcc2eb2	AMDGPU: Generate .amdgcn_target for object code v3 Differential Revision: https://reviews.llvm.org/D53221 llvm-svn: 344552	2018-10-15 20:37:47 +00:00
Aleksandar Beserminji	81eb440772	[mips][micromips] Fix overlaping FDEs error When compiling static executable for micromips, CFI symbols are incorrectly labeled as MICROMIPS, which cause ".eh_frame_hdr refers to overlapping FDEs." error. This patch does not label CFI symbols as MICROMIPS, and FDEs do not overlap anymore. This patch also exposes another bug, which is fixed here: https://reviews.llvm.org/D52985 Differential Revision: https://reviews.llvm.org/D52987 llvm-svn: 344516	2018-10-15 14:39:12 +00:00

1 2 3 4 5 ...

49684 Commits