llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	059ebc90ee	[x86] try to make test immune to better div optimization; NFCI llvm-svn: 345639	2018-10-30 20:42:03 +00:00
Craig Topper	6958b5ffa9	[X86] In lowerVectorShuffleAsBroadcast, make peeking through CONCAT_VECTORS work correctly if we already walked through a bitcast that changed the element size. The CONCAT_VECTORS case was using the original mask element count to determine how to adjust the broadcast index. But if we looked through a bitcast the original mask size doesn't tell us anything about the concat_vectors. This patch switchs to using the concat_vectors input element count directly instead. Differential Revision: https://reviews.llvm.org/D53823 llvm-svn: 345626	2018-10-30 18:48:42 +00:00
Jonas Paulsson	611b533f1d	[SchedModel] Fix for read advance cycles with implicit pseudo operands. The SchedModel allows the addition of ReadAdvances to express that certain operands of the instructions are needed at a later point than the others. RegAlloc may add pseudo operands that are not part of the instruction descriptor, and therefore cannot have any read advance entries. This meant that in some cases the desired read advance was nullified by such a pseudo operand, which still had the original latency. This patch fixes this by making sure that such pseudo operands get a zero latency during DAG construction. Review: Matthias Braun, Ulrich Weigand. https://reviews.llvm.org/D49671 llvm-svn: 345606	2018-10-30 15:04:40 +00:00
Sanjay Patel	8b207defea	[DAGCombiner] narrow vector binops when extraction is cheap Narrowing vector binops came up in the demanded bits discussion in D52912. I don't think we're going to be able to do this transform in IR as a canonicalization because of the risk of creating unsupported widths for vector ops, but we already have a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression test diffs). This is artificially limited to not look through bitcasts because there are so many test diffs already, but that's marked with a TODO and is a small follow-up. Differential Revision: https://reviews.llvm.org/D53784 llvm-svn: 345602	2018-10-30 14:14:34 +00:00
Francis Visoiu Mistrih	85d3f1ee8f	[llc] Error out when -print-machineinstrs is used with an unknown pass We used to assert instead of reporting an error. PR39494 llvm-svn: 345589	2018-10-30 12:07:18 +00:00
Roman Lebedev	9ffca9b83c	[X86] Add extra-uses on the mask of pattern c of extract-{low,}bits.ll tests Summary: Because of the D48768, that pattern is always unfolded into pattern d, thus we had no test coverage. Reviewers: RKSimon, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53574 llvm-svn: 345583	2018-10-30 11:12:29 +00:00
Simon Pilgrim	858303b827	[SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodes Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy. This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle. Differential Revision: https://reviews.llvm.org/D53760 llvm-svn: 345578	2018-10-30 10:32:11 +00:00
David Bolvansky	dfdbb038e8	[DAGCombiner] Improve X div/rem Y fold if single bit element type Summary: Tests by @spatel, thanks Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: sdardis, atanasyan, llvm-commits, spatel Differential Revision: https://reviews.llvm.org/D52668 llvm-svn: 345575	2018-10-30 09:07:22 +00:00
Matthias Braun	c045c557b0	Relax fast register allocator related test cases; NFC - Relex hard coded registers and stack frame sizes - Some test cleanups - Change phi-dbg.ll to match on mir output after phi elimination instead of going through the whole codegen pipeline. This is in preparation for https://reviews.llvm.org/D52010 I'm committing all the test changes upfront that work before and after independently. llvm-svn: 345532	2018-10-29 20:10:42 +00:00
Simon Pilgrim	3a2f3c2c0a	[X86][SSE] getFauxShuffleMask - Fix shuffle mask adjustment for multiple inserted subvectors Part of the issue discovered in PR39483, although its not fully exposed until I reapply rL345395 (by reverting rL345451) llvm-svn: 345520	2018-10-29 18:25:48 +00:00
Francis Visoiu Mistrih	61c9de7565	[X86] Enable the MachineVerifier by default The machine verifier was disabled for x86 by default. There are now only 9 tests failing, compared to what previously was between 20 and 30. This is a good opportunity to file bugs for all the remaining issues, then explicitly disable the failing tests and enabling the machine verifier by default. This allows us to avoid adding new tests that break the verifier. PR27481 llvm-svn: 345513	2018-10-29 16:57:43 +00:00
Leonard Chan	905abe5b5d	[Intrinsic] Signed and Unsigned Saturation Subtraction Intirnsics Add an intrinsic that takes 2 integers and perform saturation subtraction on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D53783 llvm-svn: 345512	2018-10-29 16:54:37 +00:00
Francis Visoiu Mistrih	bb1bd9ed79	[X86] Remove outdated test This test breaks the X86 MachineVerifier. It looks like the MIR part is completely useless. The original author suggests that it can be removed. Differential Revision: https://reviews.llvm.org/D53767 llvm-svn: 345501	2018-10-29 13:41:46 +00:00
Craig Topper	aa5eb2fbaa	[X86] Force floating point values in constant pool decoding to print in scientific notation so they can't be confused with integers. When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction. Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that. llvm-svn: 345488	2018-10-29 04:52:04 +00:00
Craig Topper	42aa87143d	[X86] Recognize constant splats in LowerFCOPYSIGN. llvm-svn: 345484	2018-10-28 23:51:35 +00:00
Craig Topper	8164f3923e	[X86] Add test case to show failure to handle splat vectors in the constant check in LowerFCOPYSIGN. llvm-svn: 345483	2018-10-28 23:51:33 +00:00
Roman Lebedev	1c340bc7ca	[X86][NFC] sse42-schedule.ll: disable XOP for BdVer2 tests Else we are clearly testing the wrong instruction. llvm-svn: 345476	2018-10-28 13:39:10 +00:00
Roman Lebedev	3adf88b746	[X86][NFC] sse41-schedule.ll: disable XOP for BdVer2 tests Else we are clearly testing the wrong instruction. llvm-svn: 345475	2018-10-28 13:39:06 +00:00
Roman Lebedev	cc554e4456	[X86][NFC] sse2-schedule.ll: disable XOP for BdVer2 tests Else we are clearly testing the wrong instruction. llvm-svn: 345474	2018-10-28 13:39:01 +00:00
Simon Pilgrim	9b77f0c291	[VectorLegalizer] Enable TargetLowering::expandFP_TO_UINT support. Add vector support to TargetLowering::expandFP_TO_UINT. This exposes an issue in X86TargetLowering::LowerVSELECT which was assuming that the select mask was the same width as the LHS/RHS ops - as long as the result is a sign splat we can easily sext/trunk this. llvm-svn: 345473	2018-10-28 13:07:25 +00:00
Craig Topper	c4b785ae1e	[DAGCombiner] Better constant vector support for FCOPYSIGN. Enable constant folding when both operands are vectors of constants. Turn into FNEG/FABS when the RHS is a splat constant vector. llvm-svn: 345469	2018-10-28 01:32:49 +00:00
Craig Topper	f206447dcd	[X86] Add test cases showing missed opportunities for optimizing vector fcopysign when the RHS is a splat constant. llvm-svn: 345468	2018-10-28 01:32:47 +00:00
Roman Lebedev	a5baf86744	AMD BdVer2 (Piledriver) Initial Scheduler model Summary: # Overview This is somewhat partial. * Latencies are good {F7371125} * All of these remaining inconsistencies //appear// to be noise/noisy/flaky. * NumMicroOps are somewhat good {F7371158} * Most of the remaining inconsistencies are from `Ld` / `Ld_ReadAfterLd` classes * Actual unit occupation (pipes, `ResourceCycles`) are undiscovered lands, i did not really look there. They are basically verbatum copy from `btver2` * Many `InstRW`. And there are still inconsistencies left... To be noted: I think this is the first new schedule profile produced with the new next-gen tools like llvm-exegesis! # Benchmark I realize that isn't what was suggested, but i'll start with some "internal" public real-world benchmark i understand - [[ https://github.com/darktable-org/rawspeed \| RawSpeed raw image decoding library ]]. Diff (the exact clang from trunk without/with this patch): ``` Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Benchmark Time CPU Time Old Time New CPU Old CPU New ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_mean -0.0607 -0.0604 234 219 233 219 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_median -0.0630 -0.0626 233 219 233 219 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_stddev +0.2581 +0.2587 1 2 1 2 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_mean -0.0770 -0.0767 144 133 144 133 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_median -0.0767 -0.0763 144 133 144 133 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_stddev -0.4170 -0.4156 1 0 1 0 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_mean -0.0271 -0.0270 463 450 463 450 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_median -0.0093 -0.0093 453 449 453 449 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_stddev -0.7280 -0.7280 13 4 13 4 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_pvalue 0.0004 0.0004 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_mean -0.0065 -0.0065 569 565 569 565 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_median -0.0077 -0.0077 569 564 569 564 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_stddev +1.0077 +1.0068 2 5 2 5 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_pvalue 0.0220 0.0199 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_mean +0.0006 +0.0007 312 312 312 312 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_median +0.0031 +0.0032 311 312 311 312 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_stddev -0.7069 -0.7072 4 1 4 1 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_pvalue 0.0004 0.0004 U Test, Repetitions: 25 vs 25 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_mean -0.0015 -0.0015 141 141 141 141 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_median -0.0010 -0.0011 141 141 141 141 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_stddev -0.1486 -0.1456 0 0 0 0 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_pvalue 0.6139 0.8766 U Test, Repetitions: 25 vs 25 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_mean -0.0008 -0.0005 60 60 60 60 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_median -0.0006 -0.0002 60 60 60 60 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_stddev -0.1467 -0.1390 0 0 0 0 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_pvalue 0.0137 0.0137 U Test, Repetitions: 25 vs 25 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_mean +0.0002 +0.0002 275 275 275 275 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_median -0.0015 -0.0014 275 275 275 275 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_stddev +3.3687 +3.3587 0 2 0 2 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_pvalue 0.4041 0.3933 U Test, Repetitions: 25 vs 25 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_mean +0.0004 +0.0004 67 67 67 67 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_median -0.0000 -0.0000 67 67 67 67 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_stddev +0.1947 +0.1995 0 0 0 0 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_pvalue 0.0074 0.0001 U Test, Repetitions: 25 vs 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_mean -0.0092 +0.0074 547 542 25 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_median -0.0054 +0.0115 544 541 25 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_stddev -0.4086 -0.3486 8 5 0 0 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_pvalue 0.3320 0.0000 U Test, Repetitions: 25 vs 25 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_mean +0.0015 +0.0204 218 218 12 12 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_median +0.0001 +0.0203 218 218 12 12 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_stddev +0.2259 +0.2023 1 1 0 0 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_pvalue 0.0000 0.0001 U Test, Repetitions: 25 vs 25 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_mean -0.0209 -0.0179 96 94 90 88 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_median -0.0182 -0.0155 95 93 90 88 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_stddev -0.6164 -0.2703 2 1 2 1 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_mean -0.0098 -0.0098 176 175 176 175 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_median -0.0126 -0.0126 176 174 176 174 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_stddev +6.9789 +6.9157 0 2 0 2 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_mean -0.0237 -0.0238 474 463 474 463 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_median -0.0267 -0.0267 473 461 473 461 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_stddev +0.7179 +0.7178 3 5 3 5 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_pvalue 0.6837 0.6554 U Test, Repetitions: 25 vs 25 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_mean -0.0014 -0.0013 1375 1373 1375 1373 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_median +0.0018 +0.0019 1371 1374 1371 1374 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_stddev -0.7457 -0.7382 11 3 10 3 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_mean -0.0080 -0.0289 22 22 10 10 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_median -0.0070 -0.0287 22 22 10 10 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_stddev +1.0977 +0.6614 0 0 0 0 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_mean +0.0132 +0.0967 35 36 10 11 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_median +0.0132 +0.0956 35 36 10 11 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_stddev -0.0407 -0.1695 0 0 0 0 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_mean +0.0331 +0.1307 13 13 6 6 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_median +0.0430 +0.1373 12 13 6 6 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_stddev -0.9006 -0.8847 1 0 0 0 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_pvalue 0.0016 0.0010 U Test, Repetitions: 25 vs 25 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_mean -0.0023 -0.0024 395 394 395 394 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_median -0.0029 -0.0030 395 394 395 393 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_stddev -0.0275 -0.0375 1 1 1 1 Phase One/P65/CF027310.IIQ/threads:8/real_time_pvalue 0.0232 0.0000 U Test, Repetitions: 25 vs 25 Phase One/P65/CF027310.IIQ/threads:8/real_time_mean -0.0047 +0.0039 114 113 28 28 Phase One/P65/CF027310.IIQ/threads:8/real_time_median -0.0050 +0.0037 114 113 28 28 Phase One/P65/CF027310.IIQ/threads:8/real_time_stddev -0.0599 -0.2683 1 1 0 0 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_mean +0.0206 +0.0207 405 414 405 414 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_median +0.0204 +0.0205 405 414 405 414 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_stddev +0.2155 +0.2212 1 1 1 1 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_mean -0.0109 -0.0108 147 145 147 145 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_median -0.0104 -0.0103 147 145 147 145 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_stddev -0.4919 -0.4800 0 0 0 0 Samsung/NX3000/_3184416.SRW/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX3000/_3184416.SRW/threads:8/real_time_mean -0.0149 -0.0147 220 217 220 217 Samsung/NX3000/_3184416.SRW/threads:8/real_time_median -0.0173 -0.0169 221 217 220 217 Samsung/NX3000/_3184416.SRW/threads:8/real_time_stddev +1.0337 +1.0341 1 3 1 3 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_pvalue 0.0001 0.0001 U Test, Repetitions: 25 vs 25 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_mean -0.0019 -0.0019 194 193 194 193 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_median -0.0021 -0.0021 194 193 194 193 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_stddev -0.4441 -0.4282 0 0 0 0 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_pvalue 0.0000 0.4263 U Test, Repetitions: 25 vs 25 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_mean +0.0258 -0.0006 81 83 19 19 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_median +0.0235 -0.0011 81 82 19 19 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_stddev +0.1634 +0.1070 1 1 0 0 ``` {F7443905} If we look at the `_mean`s, the time column, the biggest win is `-7.7%` (`Canon/EOS 5D Mark II/10.canon.sraw2.cr2`), and the biggest loose is `+3.3%` (`Panasonic/DC-GH5S/P1022085.RW2`); Overall: mean `-0.7436%`, median `-0.23%`, `cbrt(sum(time^3))` = `-8.73%` Looks good so far i'd say. llvm-exegesis details: {F7371117} {F7371125} {F7371128} {F7371144} {F7371158} Reviewers: craig.topper, RKSimon, andreadb, courbet, avt77, spatel, GGanesh Reviewed By: andreadb Subscribers: javed.absar, gbedwell, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52779 llvm-svn: 345463	2018-10-27 20:46:30 +00:00
Roman Lebedev	a51921877a	[NFC][X86] Baseline tests for AMD BdVer2 (Piledriver) Scheduler model Adding the baseline tests in a preparatory NFC commit, so that the actual commit shows the diff. Yes, i'm aware that a few of these codegen-based sched tests are testing wrong instructions, i will fix that afterwards. For https://reviews.llvm.org/D52779 llvm-svn: 345462	2018-10-27 20:36:11 +00:00
Sanjay Patel	15aae98424	[x86] make test immune to improved extraction in D53784; NFC llvm-svn: 345455	2018-10-27 16:46:10 +00:00
Craig Topper	4b89647b79	[X86] Add some isel patterns for scalar_to_vector/extract_vector_element that use the avx512 extended register classes when they are available. llvm-svn: 345448	2018-10-27 05:35:20 +00:00
Jonas Devlieghere	c5dd2995dc	Further split cpus test On GreenDragon, CodeGen/X86/cpus-no-x86_64.ll was still timing out even after breaking up the original test. I further split off the intel and AMD cpus which hopefully resolves this. http://green.lab.llvm.org/green/job/clang-stage2-cmake-RgSan/ llvm-svn: 345438	2018-10-26 23:50:23 +00:00
Sanjay Patel	bd48629041	[x86] adjust tests to preserve behavior; NFC I'm planning a binop optimization that would subvert the domain forcing ops in these tests, so turning them into zexts. llvm-svn: 345437	2018-10-26 23:06:28 +00:00
Reid Kleckner	98d880fbd7	[Spectre] Fix MIR verifier errors in retpoline thunks Summary: The main challenge here is that X86InstrInfo::AnalyzeBranch doesn't understand the way we're using a CALL instruction as a branch, so we can't list the CallTarget MBB as a successor of the entry block. If we don't list it as a successor, then the AsmPrinter doesn't print a label for the MBB. Fix the issue by inserting our own label at the beginning of the call target block. We can rely on the AsmPrinter to always emit it, even though the block appears to be unreachable, but address-taken. Fixes PR38391. Reviewers: thegameg, chandlerc, echristo Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D53653 llvm-svn: 345426	2018-10-26 20:26:36 +00:00
Craig Topper	8315d9990c	[X86] Stop promoting vector and/or/xor/andn to vXi64. These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation. This patch removes the promotion and adds isel patterns instead. The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward. Differential Revision: https://reviews.llvm.org/D53268 llvm-svn: 345408	2018-10-26 17:21:26 +00:00
Sanjay Patel	6b40768f5a	[x86] commute blendvb with constant condition op to allow load folding This is a narrow fix for 1 of the problems mentioned in PR27780: https://bugs.llvm.org/show_bug.cgi?id=27780 I looked at more general solutions, but it's a mess. We canonicalize shuffle masks based on the number of elements accessed from each operand, and that's not optional. If you remove that, we'll crash because we fail to match isel patterns. So I'm waiting until we're sure that we have blendvb with constant condition and then commuting based on the load potential. Other cases like blend-with-immediate are already handled elsewhere, so this is probably not a common problem anyway. I didn't use "MayFoldLoad" because that checks for one-use and in these cases, we've screwed that up by creating a temporary PSHUFB using these operands that we're counting on to be killed later. Undoing that didn't look like a simple task because it's intertwined with determining if we actually use both operands of the shuffle or not.a Differential Revision: https://reviews.llvm.org/D53737 llvm-svn: 345390	2018-10-26 14:58:13 +00:00
Francis Visoiu Mistrih	08d321c9f9	[CodeGen] Remove out operands from PATCHABLE_OP The current model requires 1 out operand, but it is not used nor created. This fixed an x86 machine verifier issue. Part of PR27481. llvm-svn: 345384	2018-10-26 13:37:25 +00:00
Simon Pilgrim	11c01f402f	Regenerate test llvm-svn: 345379	2018-10-26 12:33:56 +00:00
George Rimar	088d96b43d	[Codegen] - Implement basic .debug_loclists section emission (DWARF5). .debug_loclists is the DWARF 5 version of the .debug_loc. With that patch, it will be emitted when DWARF 5 is used. Differential revision: https://reviews.llvm.org/D53365 llvm-svn: 345377	2018-10-26 11:25:12 +00:00
Sanjay Patel	c14aafdacc	[x86] add tests for missed load folding; NFC llvm-svn: 345325	2018-10-25 22:23:27 +00:00
Craig Topper	813064bf4d	[X86] Change X86 backend to look for 'min-legal-vector-width' attribute instead of 'required-vector-width' when determining whether 512-bit vectors should be legal. The required-vector-width attribute was only used for backend testing and has never been generated by clang. I believe clang is now generating min-legal-vector-width for vector uses in user code. With this I believe passing -mprefer-vector-width=256 to clang should prevent use of zmm registers in the generated assembly unless the user used a 512-bit intrinsic in their source code. llvm-svn: 345317	2018-10-25 21:16:06 +00:00
Francis Visoiu Mistrih	5be9e6de89	[CodeGen] Remove operands from FENTRY_CALL FENTRY_CALL is actually not taking any input / output operands. The machine verifier complains now because the target description says that: * It needs 1 unknown output * It needs 1 or more variable inputs llvm-svn: 345316	2018-10-25 21:12:15 +00:00
Craig Topper	4a825e7b29	[X86] Add some non-AVX512VL command lines to the *vl-vec-test-testn.ll tests. This will expose some regressions in the WIP and/or/xor promotion removal patch. llvm-svn: 345297	2018-10-25 18:23:48 +00:00
Craig Topper	ce0bc3814b	[X86] Add KNL command lines to movmsk-cmp.ll. Some of this code looks pretty bad and we should probably still be using movmskb more with avx512f. llvm-svn: 345293	2018-10-25 18:06:25 +00:00
Craig Topper	5d787ac4be	[X86] Remove some uarch tuning flags from KNL that look to have been inherited from SNB/IVB incorrectly KNL is based on a modified Silvermont core so I don't think these features apply. I think the LEA flag is probably also wrong, but I'm less sure as I barely understand the 3 LEA flags we have currently. Differential Revision: https://reviews.llvm.org/D53671 llvm-svn: 345285	2018-10-25 17:28:57 +00:00
Francis Visoiu Mistrih	7d55dd673b	[X86] Fix llc invocation on MIR test case The current state of the llc invocation is: * Running all the passes from dwarfehprepare to stack coloring (included) * It runs it from the LLVM IR included in the file * It ADDS the generated MI from ISel to the MI in the MIR file * The machine verifier doesn't like it. Differential Revision: https://reviews.llvm.org/D53698 llvm-svn: 345266	2018-10-25 14:11:07 +00:00
Simon Pilgrim	838eb24014	[TargetLowering] Improve vXi64 UINT_TO_FP vXf64 support (P38226) As suggested on D52965, this patch moves the i64 to f64 UINT_TO_FP expansion code from LegalizeDAG into TargetLowering and makes it available to LegalizeVectorOps as well. Not only does this help perform X86 lowering as a true vectorization instead of (partially vectorized) scalar conversions, it avoids the HADDPD op from the scalar code which can be slow on most targets. The AVX512F does have the vcvtusi2sdq scalar operation but we don't unroll to use it as it seems to only help for the v2f64 case - otherwise the unrolling cost will certainly be too high. My feeling is that we should leave it to the vectorizers - and if it generates the vector UINT_TO_FP we should use it. Differential Revision: https://reviews.llvm.org/D53649 llvm-svn: 345256	2018-10-25 11:15:57 +00:00
Carlos Alberto Enciso	9a24e1a7cd	[DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG. When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines. Differential Revision: https://reviews.llvm.org/D53287 llvm-svn: 345250	2018-10-25 09:58:59 +00:00
Reid Kleckner	a6c6698217	[X86] Adjust MIR test case to pacify machine verifier llvm-svn: 345227	2018-10-24 23:52:33 +00:00
Reid Kleckner	24d12c28e7	[X86] Fix pipeline tests when enabling MIR verification, NFC llvm-svn: 345226	2018-10-24 23:52:22 +00:00
Reid Kleckner	49a24278ba	[ELF] Fix large code model MIR verifier errors Instead of using the MOVGOT64r pseudo, use the existing MO_PIC_BASE_OFFSET support on symbol operands. Now I don't have to create a "scratch register operand" for the pseudo to use, and the register allocator can make better decisions. Fixes some X86 verifier errors tracked in PR27481. llvm-svn: 345219	2018-10-24 22:57:28 +00:00
Reid Kleckner	9c5bda652c	[X86] Add SP to tailcall register class to fix verifier error It's possible to do a tail call to a stack argument. LLVM already calculates the right stack offset to call through. Fixes the sibcall and musttail* verifier failures tracked at PR27481. llvm-svn: 345197	2018-10-24 21:09:34 +00:00
Simon Pilgrim	c5bb362b13	[X86][SSE] Add SimplifyDemandedBitsForTargetNode PMULDQ/PMULUDQ handling Add X86 SimplifyDemandedBitsForTargetNode and use it to simplify PMULDQ/PMULUDQ target nodes. This enables us to repeatedly simplify the node's arguments after the previous approach had to be reverted due to PR39398. Differential Revision: https://reviews.llvm.org/D53643 llvm-svn: 345182	2018-10-24 19:11:28 +00:00
Craig Topper	2417273255	[X86] Bring back the MOV64r0 pseudo instruction This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register similar to what we do for the various XMM/YMM/ZMM zeroing pseudos. My main motivation is to enable the spill optimization in foldMemoryOperandImpl. As we were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here. I'll see if I can try to reduce one from the code were looking at. There's definitely some other machine CSE(and maybe other passes) behavior changes exposed by this patch. So it seems like there might be some other deficiencies in SUBREG_TO_REG handling. Differential Revision: https://reviews.llvm.org/D52757 llvm-svn: 345165	2018-10-24 17:32:09 +00:00
Robert Lougher	18bfb3a5ec	[CodeGen] skip lifetime end marker in isInTailCallPosition A lifetime end intrinsic between a tail call and the return should not prevent the call from being tail call optimized. Differential Revision: https://reviews.llvm.org/D53519 llvm-svn: 345163	2018-10-24 17:03:19 +00:00

1 2 3 4 5 ...

12719 Commits