Attempt to simplify all-of/any-of style patterns that concatenate 2 smaller integers together, turning them into an and(x,y)/or(x,y) + icmp against 0/-1 instead.
This is mainly to help some bool predicate reduction patterns where we end up concatenating bool vectors that have been bitcasted to integers.
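As a rough scalar illustration (a standalone sketch, not the DAG combine itself; the 32/64-bit widths and function names are just for the example), the identities being exploited are that comparing the concatenation against 0 or -1 is equivalent to comparing the or/and of the two halves:

  #include <cassert>
  #include <cstdint>

  bool allZero(uint32_t Hi, uint32_t Lo) {
    uint64_t Concat = (uint64_t(Hi) << 32) | Lo; // concat(hi, lo)
    bool Wide = Concat == 0;                     // icmp eq concat, 0
    bool Narrow = (Hi | Lo) == 0;                // or(x,y) + icmp eq 0
    assert(Wide == Narrow);
    return Narrow;
  }

  bool allOnes(uint32_t Hi, uint32_t Lo) {
    uint64_t Concat = (uint64_t(Hi) << 32) | Lo;
    bool Wide = Concat == ~uint64_t(0);          // icmp eq concat, -1
    bool Narrow = (Hi & Lo) == ~uint32_t(0);     // and(x,y) + icmp eq -1
    assert(Wide == Narrow);
    return Narrow;
  }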
Differential Revision: https://reviews.llvm.org/D93599
We can't store garbage in the unused bits. It is possible that something like a zextload from i1/i2/i4 is created to read the memory. Those zextloads would be legalized assuming the extra bits are 0.
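As a hand-written illustration of the issue (ordinary C++, not the lowering code; the helper names are made up), a v2i1 store writes a whole byte, and a later i2 zextload of that byte assumes the 6 unused bits are 0 and does no masking:

  #include <cassert>
  #include <cstdint>

  uint8_t storeV2i1(bool Elt0, bool Elt1) {
    // Pack the two mask bits; the upper 6 bits must be written as 0, because
    // garbage there would leak straight into the zextload result below.
    return uint8_t(Elt1) << 1 | uint8_t(Elt0);
  }

  uint8_t zextLoadI2(uint8_t Byte) {
    return Byte; // legalized zextload: no masking of the extra bits
  }

  int main() {
    assert(zextLoadI2(storeV2i1(true, true)) == 3);
    return 0;
  }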
I'm not sure that the code in lowerStore is executed for the v1i1/v2i1/v4i1 case. It looks like the DAG combine in combineStore may have converted them to v8i1 first. And I think we're missing some cases to avoid going to the stack in the first place. But I don't have time to investigate those things at the moment so I wanted to focus on the correctness issue.
Should fix PR48147.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D91294
The current compact unwind scheme does not work when the prologue is not at the
start (the instructions before the prologue cannot be described). (Technically
this is fixable, but it requires multiple compact unwind descriptors for one
function.)
rL255175 chose to not perform shrink-wrapping for no-frame-pointer functions not
marked as nounwind to work around PR25614. This is overly limited, as platforms
not supporting compact unwind (all non-Darwin) do not need the workaround.
This patch restricts the limitation to compact unwind platforms.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D89930
KMOVWkr already produces VK16, so there's no reason to copy it to VK16 again.
Test changes are presumably because we were scheduling based on
the COPY that is no longer there.
I think this mostly looks ok. The only weird thing I noticed was that
a couple of vXi8 rotate tests picked up an extra logic op where we have
(and (or (and), (andn)), X). Previously we matched the (or (and), (andn))
to vpternlog, but now we match the (and (or), X) and leave the and/andn
unmatched.
We consider v32i16/v64i8 to be legal types on avx512f, but we
don't have most operations until avx512bw. But we can use
and/or/xor operations. So try those before splitting.
This is especially helpful since we turn some ands with constant
masks into shuffles in early DAG combines. So we should make sure
we recover those back to AND.
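As a scalar illustration of why this is safe (standalone C++, not the lowering code): bitwise ops produce the same bytes regardless of how the 512 bits are split into elements, so a v64i8 AND can be performed with the v16i32 form instead of being split:

  #include <array>
  #include <cstdint>
  #include <cstring>

  using Bytes64 = std::array<uint8_t, 64>;

  Bytes64 andAsBytes(Bytes64 A, const Bytes64 &B) {
    for (unsigned I = 0; I != 64; ++I)
      A[I] &= B[I]; // the "v64i8" view
    return A;
  }

  Bytes64 andAsDwords(const Bytes64 &A, const Bytes64 &B) {
    uint32_t WA[16], WB[16];
    std::memcpy(WA, A.data(), 64);
    std::memcpy(WB, B.data(), 64);
    for (unsigned I = 0; I != 16; ++I)
      WA[I] &= WB[I]; // the "v16i32" view; byte-for-byte identical result
    Bytes64 Out;
    std::memcpy(Out.data(), WA, 64);
    return Out;
  }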
If the shuffle is a blend and one input is a 0 vector, we should prefer AND over PSHUFB since it's available on more execution ports.
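As a scalar model of the observation (illustrative code, not the shuffle lowering itself), blending X with a zero vector selects, per byte, either the original byte or 0, which is exactly an AND with a 0x00/0xFF byte mask:

  #include <array>
  #include <cstdint>

  std::array<uint8_t, 16> blendWithZero(const std::array<uint8_t, 16> &X,
                                        const std::array<bool, 16> &KeepLane) {
    std::array<uint8_t, 16> Result{};
    for (unsigned I = 0; I != 16; ++I)
      Result[I] = X[I] & (KeepLane[I] ? 0xFF : 0x00); // plain AND, no PSHUFB
    return Result;
  }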
Differential Revision: https://reviews.llvm.org/D82798
This moves v32i16/v64i8 to a model consistent with how we
treat integer types with avx1.
This does change the ABI for types vXi16/vXi8 vectors larger than
512 bits to pass in multiple zmms instead of multiple ymms. We'd
already hacked some code to make v64i8/v32i16 pass in zmm.
Cost model is still a bit of a mess. In some places I tried to
match existing behavior. But really we need to account for
splitting and concatenating costs. Cost model for shuffles is
especially pessimistic.
Differential Revision: https://reviews.llvm.org/D76212
Summary:
Follow up from D75377. If the subvector is byte sized and the
index is aligned to the subvector size, we can shrink the load.
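Roughly (a sketch with made-up names, not the actual combine), the legality check and offset computation are:

  #include <cstdint>

  // Returns the byte offset for the narrowed load, or -1 if we can't shrink.
  int64_t narrowedLoadOffset(unsigned EltBits, unsigned SubNumElts,
                             unsigned Index) {
    unsigned SubBits = EltBits * SubNumElts;
    if (SubBits % 8 != 0)        // subvector must be byte sized
      return -1;
    if (Index % SubNumElts != 0) // index must be aligned to the subvector size
      return -1;
    return int64_t(Index) * EltBits / 8;
  }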
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: dbabokin, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75434
The address calculation for the offset assumes that you can calculate the offset by multiplying the index by the store size of the element. But that only works if the element's store size is exactly its real size since we store vectors tightly packed in memory. There are improvements we could make to this like special casing extracting element 0. I think we could also handle cases where the extracted VT is byte sized and the index is aligned with the extract element count.
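To illustrate the broken assumption (plain C++, names made up):

  // Only valid when the element's store size equals its real size; vectors
  // are stored tightly packed, so e.g. for a v8i1 in memory element 2 lives
  // at bit 2 of byte 0, not at byte offset 2 * 1 = 2.
  unsigned offsetBySize(unsigned Index, unsigned EltStoreBytes) {
    return Index * EltStoreBytes;
  }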
Differential Revision: https://reviews.llvm.org/D75377
We only need to split after type legalization. If we're before
we can just use a wide store and type legalization will split it.
Add a v128i1 test to exercise it post type legalization.
Summary:
After the bugfix for the undef value case here, we used more operations to implement inserting a vXi1 subvector into a vXi1 vector. This patch optimizes it to use fewer operations.
The history/background is at https://reviews.llvm.org/D68311.
Reviewers: craig.topper, LuoYuanke, yubing, annita.zhang, pengfei, LiuChen3, RKSimon
Reviewed By: craig.topper
Subscribers: hiraditya, llvm-commits
Patch by Xiang Zhang (xiangzhangllvm)
Differential Revision: https://reviews.llvm.org/D71917
Summary:
Instructions with 2 commutable source operands encode them in the
VEX.VVVV field and in the r/m field of the MODRM byte plus the VEX.B
field.
The VEX.B field is missing from the 2-byte VEX encoding. If the
VEX.VVVV source is 0-7 and the other register is 8-15 we can
swap them to avoid needing the VEX.B field. This works as long as
the VEX.W, VEX.mmmmm, and VEX.X fields are also not needed.
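A simplified model of the decision (not the actual encoder or commuting code; the parameter names are made up). Registers 8-15 need the VEX.B/VEX.X extension bits, which only exist in the 3-byte prefix, while VEX.vvvv can name all 16 registers in either form:

  bool commuteFor2ByteVEX(unsigned VvvvReg, unsigned RMReg, bool NeedsVexW,
                          bool NeedsLongerOpcodeMap, bool NeedsVexX) {
    // If any of these is required we need the 3-byte prefix anyway.
    if (NeedsVexW || NeedsLongerOpcodeMap || NeedsVexX)
      return false;
    // Swap so the high register is named by vvvv; the r/m register then stays
    // in 0-7 and the VEX.B bit (hence the 3-byte prefix) isn't needed.
    return VvvvReg < 8 && RMReg >= 8;
  }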
Fixes PR36706.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68550
The immediate form of VPCMP can represent these completely. The
vpcmpgt/eq are just shorter encodings.
This patch removes the isel patterns and just swaps the opcodes
and removes the immediate in MCInstLower. This matches where we do
some other encoding tricks.
Removes over 10K bytes from the isel table.
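A rough sketch of the swap (illustrative enum and function names, not the real MCInstLower code; here 0 and 6 stand in for the EQ and GT predicate immediates):

  enum class CmpImm { EQ = 0, LT = 1, LE = 2, AlwaysFalse = 3,
                      NE = 4, GE = 5, GT = 6, AlwaysTrue = 7 };

  // Returns the shorter-encoding mnemonic if one exists; the immediate is
  // dropped when the swap happens, otherwise the VPCMP immediate form stays.
  const char *shorterEncoding(CmpImm Imm) {
    switch (Imm) {
    case CmpImm::EQ: return "VPCMPEQ";
    case CmpImm::GT: return "VPCMPGT";
    default:         return nullptr;
    }
  }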
Differential Revision: https://reviews.llvm.org/D68446
llvm-svn: 373766
The previous code tried to do a trick where we would extract the subvector from the location we were inserting. Then xor that with the new value. Take the xored value and clear out the bits above the subvector size. Then shift that xored subvector to the insert location. And finally xor that with the original vector. Since the old subvector was used in both xors, this would leave just the new subvector at the inserted location. Since the surrounding bits had been zeroed no other bits of the original vector would be modified.
Unfortunately, if the old subvector came from undef we might aggressively propagate the undef. Then we end up with the XORs not cancelling because they aren't using the same value for the two uses of the old subvector. @bkramer gave me a case that demonstrated this, but we haven't reduced it enough to make it easily readable to see what's happening.
This patch uses a safer, but more costly approach. It isolates the bits above the insertion point and the bits below the insert point and ORs those together, leaving 0 at the insertion location. Then it widens the subvector with 0s in the upper bits and shifts it into position with 0s in the lower bits. Then we do another OR.
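Here is a scalar model of the two approaches, with a plain integer standing in for the bitcasted bool vector (names and widths are only illustrative, assuming Pos + SubBits <= 64):

  #include <cstdint>

  // Old approach: the xor trick. Correct only when both uses of the old
  // subvector see the same value, which breaks once it folds to undef.
  uint64_t insertXorTrick(uint64_t Vec, uint64_t Sub, unsigned Pos,
                          unsigned SubBits) {
    uint64_t Mask = SubBits == 64 ? ~0ULL : (1ULL << SubBits) - 1;
    uint64_t Old = (Vec >> Pos) & Mask; // extract from the insert location
    uint64_t Diff = (Old ^ Sub) & Mask; // xor with the new value, clear high bits
    return Vec ^ (Diff << Pos);         // the two xors cancel, leaving Sub at Pos
  }

  // New approach: clear the insertion window, then OR in the zero-extended,
  // shifted subvector. No reliance on the old value at the insert location.
  uint64_t insertWithOr(uint64_t Vec, uint64_t Sub, unsigned Pos,
                        unsigned SubBits) {
    uint64_t Mask = SubBits == 64 ? ~0ULL : (1ULL << SubBits) - 1;
    uint64_t Low = Pos == 0 ? 0 : Vec & ((1ULL << Pos) - 1); // bits below
    unsigned HighStart = Pos + SubBits;
    uint64_t High =
        HighStart >= 64 ? 0 : (Vec >> HighStart) << HighStart; // bits above
    return (Low | High) | ((Sub & Mask) << Pos); // window was 0, OR it in
  }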
Differential Revision: https://reviews.llvm.org/D68311
llvm-svn: 373495
Previously we tried to split them into narrower v64i1 or v16i1
pieces that each got promoted to vXi8 and then passed in a zmm
or xmm register. But this crashes when you need to pass more
pieces than available registers reserved for argument passing.
The scalarizing done here generates much longer and slower code,
but is consistent with the behavior of avx2 and earlier targets
for these types.
Fixes PR43323.
llvm-svn: 372069
gcc and icc pass these types in zmm registers.
This patch implements a quick hack to override the register
type before calling convention handling to one that is legal.
Longer term we might want to do something similar to 256-bit
integer registers on AVX1 where we just split all the operations.
Fixes PR42957
Differential Revision: https://reviews.llvm.org/D66708
llvm-svn: 370495
The assert that caused this to be reverted should be fixed now.
Original commit message:
This patch changes our default legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the element widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.
This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.
Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.
llvm-svn: 368183
This patch changes our default legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the element widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.
This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.
Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.
llvm-svn: 367901
As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations || isOperationLegal cases.
We'll improve X86 legality in future commits.
llvm-svn: 364290
This patch enables the use of lowerShuffleAsBitMask for 512-bit blends before
falling back to move immediate, GPR to k-register, and masked op.
I had to make some changes to support v8i64 when i64 is not a legal type. And to
support floating point types.
This trades a load for the move immediate and GPR move which is higher latency.
But it's probably better for register pressure not having to hop through other
register classes. The load+and should play better with LICM and
rematerialization I think.
Differential Revision: https://reviews.llvm.org/D59479
llvm-svn: 356618
We still use i8 for the load/store type. So we need to convert to/from i16 around the mask type.
By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned.
llvm-svn: 350989
We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions.
Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector.
This fixes some of the regressions from r350800.
llvm-svn: 350918
Summary:
This pass replaces GR8/GR16/GR32/GR64 with their equivalent sized mask register classes. But VK32/VK64 aren't legal without AVX512BW. This mostly appears to work if the register coalescer is able to remove the VK32/VK64 register class reference. Or if we don't ever spill it. But there's no guarantee of that.
Another Intel employee managed to trigger a crash due to this with ISPC. Unfortunately, I've lost the test case he sent me at the time. I'm trying to get him to reproduce it for me. I'd like to get this in before 8.0 branches since it's a little scary.
The regressions here are unfortunate, but I think we can make some improvements to DAG combine, load folding, etc. to fix them. Just not sure if we can get that done for 8.0.
Fixes PR39741
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56460
llvm-svn: 350800
Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register.
Differential Revision: https://reviews.llvm.org/D56246
llvm-svn: 350374
The test cases are constructed to avoid folding the AND into a masked compare operation.
Currently we emit a KAND and a KORTEST for these cases.
llvm-svn: 350287
The AVX512 diffs are neutral, but the bswap test shows a clear overreach in
hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can
increase the instruction count.
This could also fight with transforms trying to go in the opposite direction
and possibly blow up/infinite loop. This might be enough to solve the bug
noted here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html
I did not add the hasOneUse() checks to all opcodes because I see a perf
regression for at least one opcode. We may decide that's irrelevant in the
face of potential compiler crashing, but I'll see if I can salvage that first.
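To illustrate the count problem in plain C++ (otherUse is a placeholder for some additional consumer): hoisting and(bswap(A), bswap(B)) -> bswap(and(A, B)) only wins when both inner bswaps die, and when they have other uses the hoisted form adds an op:

  #include <cstdint>

  uint32_t otherUse(uint32_t, uint32_t); // extra consumers of both bswaps

  uint32_t beforeHoist(uint32_t A, uint32_t B) {
    uint32_t SA = __builtin_bswap32(A);  // also feeds otherUse
    uint32_t SB = __builtin_bswap32(B);  // also feeds otherUse
    return (SA & SB) ^ otherUse(SA, SB); // 2 bswaps + 1 and
  }

  uint32_t afterHoist(uint32_t A, uint32_t B) {
    uint32_t SA = __builtin_bswap32(A);          // still needed
    uint32_t SB = __builtin_bswap32(B);          // still needed
    uint32_t Hoisted = __builtin_bswap32(A & B); // a third bswap appears
    return Hoisted ^ otherUse(SA, SB);           // 3 bswaps + 1 and
  }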
llvm-svn: 348508
This makes X86ISD::VSEXT more similar to ISD::SIGN_EXTEND and ISD::ZERO_EXTEND.
I'm hoping to replace X86ISD::VSEXT/VZEXT with target independent nodes. Making the target specific nodes similar to the target independent nodes helps minimize test diffs in that patch.
llvm-svn: 346539
These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation.
This patch removes the promotion and adds isel patterns instead.
The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward.
Differential Revision: https://reviews.llvm.org/D53268
llvm-svn: 345408
Enable enableMultipleCopyHints() on X86.
Original Patch by @jonpa:
While enabling the mischeduler for SystemZ, it was discovered that for some reason a test needed one extra seemingly needless COPY (test/CodeGen/SystemZ/call-03.ll). The handling for that resulted in this patch, which improves the register coalescing by providing not just one copy hint, but a sorted list of copy hints. On SystemZ, this gives ~12500 fewer register moves on SPEC, as well as marginally less spilling.
Instead of improving just the SystemZ backend, the improvement has been implemented in common-code (calculateSpillWeightAndHint()). This gives a lot of test failures, but since this should be a general improvement I hope that the involved targets will help and review the test updates.
Differential Revision: https://reviews.llvm.org/D38128
llvm-svn: 342578
I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior; it just uses a different encoding for the condition code.
I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering.
Differential Revision: https://reviews.llvm.org/D43608
llvm-svn: 335173
Previously if !LegalOperations we would blindly call getBitcast and hope that getNode would constant fold it. But if the conversion is between a vector and a scalar, getNode has no simplification.
This means we would just get back the original N. We would then return that N which would make the caller of visitBITCAST think that we used CombineTo and did our own worklist management. This prevents target specific optimizations from being called for vector/scalar bitcasts until after legal operations.
llvm-svn: 331896
We were previously preferring ZEXTLOAD over EXTLOAD if it is legal. This triggers during X86's promotion of i16->i32. Not sure about other targets.
Using ZEXTLOAD can prevent folding it to SEXTLOAD later if we were to promote a sign extended operand like we would need for SRA. However, X86 doesn't currently promote i16 SRA. I was looking into doing that which is how I found this issue.
This is also blocking our ability to fold 4 byte aligned EXTLOADs with "loadi32". This is what caused most of the test changes here.
Differential Revision: https://reviews.llvm.org/D45585#inline-402825
llvm-svn: 330781
Loading a constant into a k-register in AVX512 requires a bitcast from a scalar constant. In the test case here we have a k-register store that gets split into multiple parts on KNL. MergeConsecutiveStores sees each of these pieces as a consecutive store and looks through the bitcast to find the underlying scalar constant. But when we went to create the combined store we didn't look through the same bitcast.
llvm-svn: 326677
We were previously doing this with isel patterns. Moving it to op legalization gives us chance to see the required bitcast earlier. And it lets us remove some isel patterns.
llvm-svn: 326669
This is split off from D42948 and includes just the cases that constant fold to true or false. It also includes some refactoring to keep predicate checks together.
This supports things like
(setcc uge X, 0) -> true
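In plain C++ terms (a trivial standalone illustration, not the DAG code):

  #include <cstdint>

  bool ugeZero(uint32_t X) { return X >= 0u; } // unsigned >= 0: always true, folds to 1
  bool ultZero(uint32_t X) { return X < 0u; }  // unsigned < 0: always false, folds to 0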
Differential Revision: https://reviews.llvm.org/D43489
llvm-svn: 325627