llvm-project

Author	SHA1	Message	Date
Michael Berg	bf90d1f263	Utilize new SDNode flag functionality to expand current support for fsub Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fsub. Reviewers: spatel, hfinkel, wristow, arsenm Reviewed By: spatel Subscribers: wdng Differential Revision: https://reviews.llvm.org/D47910 llvm-svn: 334306	2018-06-08 17:39:50 +00:00
Simon Pilgrim	89deac6694	[X86][BtVer2] Add support for all SUB/XOR 32/64 scalar instructions that should match the dependency-breaking 'zero-idiom' As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), these instructions are dependency breaking and fast-path zero the destination register (and appropriate EFLAGS bits). llvm-svn: 334303	2018-06-08 17:00:45 +00:00
Simon Pilgrim	59e915c691	[X86] Fix schedule-x86_64.s tests to use different registers in reg-reg cases Same fix as rL334110: I noticed while working on zero-idiom + dependency-breaking support (PR36671) that most of our binary instruction schedule tests were reusing the same src registers, which would cause the tests to fail once we enable scalar zero-idiom support on btver2. llvm-svn: 334302	2018-06-08 16:40:15 +00:00
Simon Pilgrim	eab9d20424	[X86][SSE] Add SSE2/AVX2 vector rotate tests Now that we're custom lowering vector rotates for SSE in general we should be testing the combines with them as well. llvm-svn: 334290	2018-06-08 14:07:21 +00:00
Simon Pilgrim	a6afa310c9	[X86][SSE] Simplify combineVectorTruncationWithPACKUS to reduce code duplication Simplify combineVectorTruncationWithPACKUS to mask the upper bits followed by calling truncateVectorWithPACK instead of duplicating with similar code. This results in the codegen using (V)PACKUSDW on SSE41+ targets for vXi64/vXi32 inputs where before it always used PACKUSWB (along with a lot more bitcasting). I've raised PR37749 as until we avoid unnecessary concats back to 256-bit for bitwise ops, we can't avoid splitting the input value into 128-bit subvectors for masking. llvm-svn: 334289	2018-06-08 13:59:11 +00:00
Sanjay Patel	ab4ca0603c	[x86] restore test comment; NFC The description got deleted along with the FIXME note in rL334268. llvm-svn: 334288	2018-06-08 13:53:13 +00:00
Simon Pilgrim	ad45efc445	[X86][SSE] Consistently prefer lowering to PACKUS over PACKSS We have some combines/lowerings that attempt to use PACKSS-then-PACKUS and others that use PACKUS-then-PACKSS. PACKUS is much easier to combine with if we know the upper bits are zero as ComputeKnownBits can easily see through BITCASTs etc. especially now that rL333995 and rL334007 have landed. It also effectively works at byte level which further simplifies shuffle combines. The only (minor) annoyances are that ComputeKnownBits can sometimes take longer as it doesn't fail as quickly as ComputeNumSignBits (but I'm not seeing any actual regressions in tests) and PACKUSDW only became available after SSE41 so we have more codegen diffs between targets. llvm-svn: 334276	2018-06-08 10:29:00 +00:00
Sam Parker	16f963ba0d	[DAGCombine] Fix for PR37667 While trying to propagate AND masks back to loads, we currently allow one non-load node to be included as a leaf in chain. This fix now limits that node to produce only a single data value. Differential Revision: https://reviews.llvm.org/D47878 llvm-svn: 334268	2018-06-08 07:49:04 +00:00
Michael Berg	77b5be7ec6	propagate fast math flags via IR on fma and sub expressions Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode. Reviewers: spatel, arsenm, hfinkel, javed.absar Reviewed By: spatel Subscribers: nemanjai, wdng Differential Revision: https://reviews.llvm.org/D47388 llvm-svn: 334242	2018-06-07 22:49:09 +00:00
Sanjay Patel	898fbd7c47	[x86] add tests for backwards propagate mask bug (PR37060, PR37667); NFC llvm-svn: 334199	2018-06-07 14:11:18 +00:00
Simon Pilgrim	09953d8412	[X86][SSE] Simplify combineVectorTruncationWithPACKSS to reduce code duplication Simplify combineVectorTruncationWithPACKSS to just a SIGN_EXTEND_INREG followed by using the existing truncateVectorWithPACK instead of duplicating code. llvm-svn: 334193	2018-06-07 13:01:42 +00:00
Simon Pilgrim	0e29d8d81f	[X86][SSE] Add extra trunc(shl) test cases The existing trunc_shl_17_v8i16_v8i32 test case should (but doesn't) fold to zero, I've added 2 new test cases: - trunc_shl_16_v8i16_v8i32 which folds to zero (this is actually testing the target faux shuffle combine) - trunc_shl_15_v8i16_v8i32 which should perform the full shl + truncate llvm-svn: 334188	2018-06-07 11:22:52 +00:00
Simon Pilgrim	cc92897be9	[X86] Regenerate rotate tests Add 32-bit tests to show missed SHLD/SHRD cases llvm-svn: 334183	2018-06-07 10:13:09 +00:00
Tomasz Krupa	f8c7637027	[X86] Block UndefRegUpdate Summary: Prevent folding of operations with memory loads when one of the sources has undefined register update. Reviewers: craig.topper Subscribers: llvm-commits, mike.dvoretsky, ashlykov Differential Revision: https://reviews.llvm.org/D47621 llvm-svn: 334175	2018-06-07 08:48:45 +00:00
Karl-Johan Karlsson	abb11f805f	[BranchFolding] Fix live-in's when hoisting code Summary: When the branch folder hoists code into a predecessor, it adjusts the live-ins in the blocks it hoists code from. However, it fails to handle hoisted code that contains a defined register that was originally live-in in the block through a super register. This is fixed by replacing the live-in handling code with calls to utility functions in LivePhysRegs. Reviewers: kparzysz, gberry, MatzeB, uweigand, aprantl Reviewed By: kparzysz Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47529 llvm-svn: 334163	2018-06-07 07:20:33 +00:00
Roman Lebedev	488d28d4e5	[X86] Emit BZHI when mask is ~(-1 << nbits)) Summary: In D47428, i propose to choose the `~(-(1 << nbits))` as the canonical form of low-bit-mask formation. As it is seen from these tests, there is a reason for that. AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!). The other way around for X86. It would be much better to canonicalize. This patch is completely monkey-typing. I don't really understand how this works :) I have based it on `// x & (-1 >> (32 - y))` pattern. Also, when we only have `BMI`, i wonder if we could use `BEXTR` with `start=0` ? Related links: https://bugs.llvm.org/show_bug.cgi?id=36419 https://bugs.llvm.org/show_bug.cgi?id=37603 https://bugs.llvm.org/show_bug.cgi?id=37610 https://rise4fun.com/Alive/idM Reviewers: craig.topper, spatel, RKSimon, javed.absar Reviewed By: craig.topper Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D47453 llvm-svn: 334125	2018-06-06 19:38:16 +00:00
Roman Lebedev	cb56f7a550	[NFC][X86][AArch64] Reorganize/cleanup BZHI test patterns Summary: In D47428, i propose to choose the `~(-(1 << nbits))` as the canonical form of low-bit-mask formation. As it is seen from these tests, there is a reason for that. AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!). The other way around for X86. It would be much better to canonicalize. It would seem that there is too much tests, but this is most of all the auto-generated possible variants of C code that one would expect for BZHI to be formed, and then manually cleaned up a bit. So this should be pretty representable, which somewhat good coverage... Related links: https://bugs.llvm.org/show_bug.cgi?id=36419 https://bugs.llvm.org/show_bug.cgi?id=37603 https://bugs.llvm.org/show_bug.cgi?id=37610 https://rise4fun.com/Alive/idM Reviewers: javed.absar, craig.topper, RKSimon, spatel Reviewed By: RKSimon Subscribers: kristof.beyls, llvm-commits, RKSimon, craig.topper, spatel Differential Revision: https://reviews.llvm.org/D47452 llvm-svn: 334124	2018-06-06 19:38:10 +00:00
Michael Berg	cc1c4b6912	guard fsqrt with fmf sub flags Summary: This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483. It contains only context for fsqrt. Reviewers: spatel, hfinkel, arsenm Reviewed By: spatel Subscribers: hfinkel, wdng, andrew.w.kaylor, wristow, efriedma, nemanjai Differential Revision: https://reviews.llvm.org/D47749 llvm-svn: 334113	2018-06-06 18:47:55 +00:00
Simon Pilgrim	3d14158891	[X86][BMI][TBM] Only demand bottom 16-bits of the BEXTR control op (PR34042) Only the bottom 16-bits of BEXTR's control op are required (7:0 INDEX, 15:8 LENGTH). Differential Revision: https://reviews.llvm.org/D47690 llvm-svn: 334083	2018-06-06 10:52:10 +00:00
Sanjay Patel	59313be8d3	[CodeGen] assume max/default throughput for unspecified instructions This is a fix for the problem arising in D47374 (PR37678): https://bugs.llvm.org/show_bug.cgi?id=37678 We may not have throughput info because it's not specified in the model or it's not available with variant scheduling, so assume that those instructions can execute/complete at max-issue-width. Differential Revision: https://reviews.llvm.org/D47723 llvm-svn: 334055	2018-06-05 23:34:45 +00:00
Guozhi Wei	c4c6b548c5	[CodeGenPrepare] Move Extension Instructions Through Logical And Shift Instructions The CodeGenPrepare pass moves extension instructions close to load instructions in a different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in the current implementation. This patch enables this enhancement, so we can eliminate more extension instructions. Differential Revision: https://reviews.llvm.org/D45537 This is a re-commit of r331783, which was reverted by r333305. The performance regression was caused by some unlucky alignment, not a code generation problem. llvm-svn: 334049	2018-06-05 21:03:52 +00:00
Simon Pilgrim	f2f043acbb	[X86][SSE] Use multiplication scale factors for v8i16 SHL on pre-AVX2 targets. Similar to v4i32 SHL, convert v8i16 shift amounts to scale factors instead to improve performance and reduce instruction count. We were already doing this for constant shifts, this adds variable shift support. Reduces the serial nature of the codegen, which relies on chains of plendvb/pand+pandn+por shifts. This is a step towards adding support for vXi16 vector rotates. Differential Revision: https://reviews.llvm.org/D47546 llvm-svn: 334023	2018-06-05 15:17:39 +00:00
Simon Pilgrim	fef9b6eea6	[X86][SSE] Add target shuffle support to X86TargetLowering::computeKnownBitsForTargetNode Ideally we'd use resolveTargetShuffleInputs to handle faux shuffles as well but: (a) that code path doesn't handle general/pre-legalized ops/types very well. (b) I'm concerned about the compute time as they recurse to calls to computeKnownBits/ComputeNumSignBits which would need depth limiting somehow. llvm-svn: 334007	2018-06-05 10:52:29 +00:00
Simon Pilgrim	7bbe7a2920	[X86][SSE] Add basic PACKUS support to X86TargetLowering::computeKnownBitsForTargetNode Helps improve analysis of saturation ops llvm-svn: 333995	2018-06-05 09:45:03 +00:00
Alexander Ivchenko	964b27fa21	[X86][CET] Shadow stack fix for setjmp/longjmp This is the new version of D46181, allowing setjmp/longjmp to work correctly with the Intel CET shadow stack by storing SSP on setjmp and fixing it on longjmp. The patch has been updated to use the cf-protection-return module flag instead of HasSHSTK, and the bug that caused D46181 to be reverted has been fixed with the test expanded to track that fix. patch by mike.dvoretsky Differential Revision: https://reviews.llvm.org/D47311 llvm-svn: 333990	2018-06-05 09:22:30 +00:00
Craig Topper	f17b33d6c6	[X86] Make all instructions that operate on MMX types, but were added after the initial MMX support via one of the SSE features flags make them require the MMX feature as well. Passing -mattr=-mmx needs to disable these instructions since the MMX register class won't have been set up. But we don't want -mattr=-mmx to disable SSE so we have to do it separately. llvm-svn: 333984	2018-06-05 06:20:06 +00:00
Vedant Kumar	800255f9f1	[Debugify] Don't insert debug values after terminating deopts As is the case with musttail calls, the IR does not allow for instructions inserted after a terminating deopt. llvm-svn: 333976	2018-06-05 00:56:07 +00:00
Francis Visoiu Mistrih	ca69b3bf6d	[ShrinkWrap] Add optimization remarks to the shrink-wrapping pass Start by emitting remarks for very basic unsupported cases such as irreducible CFGs and EHFunclets. The end goal is to be able to cover all the cases where we give up with an explanation. llvm-svn: 333972	2018-06-05 00:27:24 +00:00
Amaury Sechet	800ac42573	Remove various uses of undef in the X86 test suite as patterns involving undef can collapse them. NFC llvm-svn: 333961	2018-06-04 22:09:26 +00:00
Amaury Sechet	e2729faf52	Revert "Regenerate expected test results for test/CodeGen/X86/pr23103.ll . NFC" This reverts commit cf25dfc503c861845947f3e6a9d308811ebb9da3. llvm-svn: 333960	2018-06-04 21:49:23 +00:00
Amaury Sechet	f5db3a15bf	Revert "Remove various uses of undef in the X86 test suite as patterns involving undef can collapse them. NFC" This reverts commit f0e85c194ae5e87476bc767304470dec85b6774f. llvm-svn: 333953	2018-06-04 21:20:45 +00:00
Alexander Ivchenko	2f038c4094	[X86][ELF][CET] Adding the .note.gnu.property ELF section in X86 In preparation for the proposed linker ABI changes (https://github.com/hjl-tools/linux-abi/wiki/linux-abi-draft.pdf, https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-cet.pdf), this patch enables emission of the .note.gnu.property section to ELF object files when building CET-enabled modules. patch by mike.dvoretsky Differential Revision: https://reviews.llvm.org/D47145 llvm-svn: 333951	2018-06-04 21:07:35 +00:00
Amaury Sechet	87f1a240ba	Remove various uses of undef in the X86 test suite as patterns involving undef can collapse them. NFC llvm-svn: 333950	2018-06-04 20:57:27 +00:00
Amaury Sechet	1910090328	Regenerate expected test results for test/CodeGen/X86/pr23103.ll . NFC llvm-svn: 333949	2018-06-04 20:47:00 +00:00
Amaury Sechet	da661e9236	[DAGcombine] Teach the combiner about -a = ~a + 1 Summary: This include variant for add, uaddo and addcarry. usubo and subcarry require the carry to be flipped to preserve semantic, but we chose to do the transform anyway in that case as to push the transform down the carry chain. Reviewers: efriedma, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46505 llvm-svn: 333943	2018-06-04 19:23:22 +00:00
Andrea Di Biagio	39e5a5695f	[RFC][patch 3/3] Add support for variant scheduling classes in llvm-mca. This patch is the last of a sequence of three patches related to LLVM-dev RFC "MC support for variant scheduling classes". http://lists.llvm.org/pipermail/llvm-dev/2018-May/123181.html This fixes PR36672. The main goal of this patch is to teach llvm-mca how to solve variant scheduling classes. This patch does that, plus it adds new variant scheduling classes to the BtVer2 scheduling model to identify so-called zero-idioms (i.e. so-called dependency breaking instructions that are known to generate zero, and that are optimized out in hardware at register renaming stage). Without the BtVer2 change, this patch would not have had any meaningful tests. This patch is effectively the union of two changes: 1) a change that teaches llvm-mca how to resolve variant scheduling classes. 2) a change to the BtVer2 scheduling model that allows us to special-case packed XOR zero-idioms (this partially fixes PR36671). Differential Revision: https://reviews.llvm.org/D47374 llvm-svn: 333909	2018-06-04 15:43:09 +00:00
Craig Topper	9923eac358	[X86] Remove and autoupgrade masked avx512vnni intrinsics using the unmasked intrinsics and select instructions. llvm-svn: 333857	2018-06-03 23:24:17 +00:00
Vedant Kumar	77f4d4d8aa	[Debugify] Skip dbg.value placement for EH pads, musttail Placing meta-instructions into EH pads breaks certain IR invariants, as does placing instructions after a musttail call. llvm-svn: 333856	2018-06-03 22:50:22 +00:00
Simon Pilgrim	7c4446ce0c	[X86][TBM] Use realistic BEXTR control bits Avoid constant values that are guaranteed to give zero Found while investigating BEXTR optimizations for PR34042. llvm-svn: 333849	2018-06-03 18:15:06 +00:00
Simon Pilgrim	1f60e2b41b	[X86][AVX512] Cleanup intrinsics tests Ensure we test on 32-bit and 64-bit targets, and strip -mcpu usage. Part of ongoing work to ensure we test all intrinsic style tests on 32 and 64 bit targets where possible. llvm-svn: 333843	2018-06-03 14:56:04 +00:00
Simon Pilgrim	7d717fed0b	[X86][AVX512BW] Regenerate arithmetic tests using update_llc_test_checks.py script Require manual stripping of existing CHECKs as update_llc_test_checks doesn't remove them if they're outside the function llvm-svn: 333842	2018-06-03 14:31:30 +00:00
Simon Pilgrim	e370ade180	[X86][BMI1] Test i32 intrinsics on 32/64 bits + branch off i64 tests Further refactoring will wait until D47452 has landed. Part of ongoing work to ensure we test all intrinsic style tests on 32 and 64 bit targets where possible. llvm-svn: 333841	2018-06-03 14:11:34 +00:00
Simon Pilgrim	8dc43621ec	[X86][BMI] Remove CTTZ tests - this is fully covered in clz.ll llvm-svn: 333840	2018-06-03 13:55:17 +00:00
Simon Pilgrim	d4ef869e28	[X86][TBM] Branch off i32 intrinsics and test on 32/64 bits Part of ongoing work to ensure we test all intrinsic style tests on 32 and 64 bit targets where possible. llvm-svn: 333839	2018-06-03 13:38:52 +00:00
Simon Pilgrim	2b55e751ce	[X86][SSE] Cleanup AVX1 intrinsics tests Ensure we cover 32/64-bit targets for SSE/AVX/AVX512 cases as necessary, strip -mcpu usage. llvm-svn: 333834	2018-06-02 21:35:48 +00:00
Simon Pilgrim	58ff2ecc4b	[X86][SSE] Cleanup SSE1 intrinsics tests Ensure we cover 32/64-bit targets for SSE/AVX/AVX512 cases as necessary llvm-svn: 333833	2018-06-02 20:25:56 +00:00
Simon Pilgrim	8790844848	[X86][SSE] Cleanup SSE2 intrinsics tests Ensure we cover 32/64-bit targets for SSE/AVX/AVX512 cases as necessary llvm-svn: 333832	2018-06-02 19:43:14 +00:00
Simon Pilgrim	8c5b33a085	[X86][SSE] Cleanup SSE3/SSSE3 intrinsics tests Ensure we cover 32/64-bit targets for SSE/AVX/AVX512 cases as necessary llvm-svn: 333831	2018-06-02 18:41:46 +00:00
Simon Pilgrim	1c0fa05397	[X86][SSE4] Tweak rL333828 sse41/sse42 cleanup to recover SKX/EVEX2VEX testing Just testing for avx512f was missing the tests for EVEX TO VEX Compression encoding etc. llvm-svn: 333830	2018-06-02 18:01:09 +00:00
Simon Pilgrim	dda8daec73	[X86][SSE] Cleanup SSE4A/SSE41/SSE42 intrinsics tests Ensure we cover 32/64-bit targets for SSE/AVX/AVX512 cases as necessary Added some missing encoding checks to SSE4A tests llvm-svn: 333828	2018-06-02 17:33:26 +00:00
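
Several of the commits above rest on small bit-level identities: the two low-bit-mask forms discussed for BZHI (`~(-1 << nbits)` and `(1 << nbits) - 1`), the `-a = ~a + 1` rewrite taught to the DAG combiner, and the observation that BEXTR only reads the bottom 16 bits of its control operand. The following is a minimal Python sketch that models these on 32-bit values so the identities can be checked by hand. It is not LLVM code; the helper names (`low_mask_not_shl`, `low_mask_sub`, `neg32`, `not_plus_one32`, `bextr32`) are hypothetical and exist only for illustration, and the BEXTR model assumes the usual layout of start index in bits 7:0 and length in bits 15:8.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register by masking Python's unbounded ints


def low_mask_not_shl(nbits: int) -> int:
    """Low-bit mask written as ~(-1 << nbits), truncated to 32 bits."""
    return ~(-1 << nbits) & MASK32


def low_mask_sub(nbits: int) -> int:
    """Low-bit mask written as (1 << nbits) - 1, truncated to 32 bits."""
    return ((1 << nbits) - 1) & MASK32


def neg32(a: int) -> int:
    """Two's-complement negation of a 32-bit value."""
    return (-a) & MASK32


def not_plus_one32(a: int) -> int:
    """The same negation expressed as ~a + 1."""
    return ((~a) + 1) & MASK32


def bextr32(src: int, control: int) -> int:
    """Illustrative model of a 32-bit bit-field extract.

    Only control bits 7:0 (start index) and 15:8 (length) are read,
    so any higher control bits have no effect on the result.
    """
    start = control & 0xFF
    length = (control >> 8) & 0xFF
    return ((src & MASK32) >> start) & ((1 << length) - 1)


if __name__ == "__main__":
    # Both mask forms agree for every shift amount below the register width.
    assert all(low_mask_not_shl(n) == low_mask_sub(n) for n in range(32))
    # Negation and ~a + 1 agree on a few sample values.
    assert all(neg32(a) == not_plus_one32(a) for a in (0, 1, 2, 0x7FFFFFFF, 0x80000000, MASK32))
    # Upper control bits are ignored by the extract model.
    assert bextr32(0xABCD1234, 0x0804) == bextr32(0xABCD1234, 0xFFFF0804)
```

Because the two mask forms are equal for every in-range shift amount, which one a backend matches is purely a pattern-matching question, which is why the BZHI commits argue for a single canonical form.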

11923 Commits