llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	a1dca3553e	[SelectionDAG] simplify select FP with undef condition llvm-svn: 347212	2018-11-19 14:42:28 +00:00
Sanjay Patel	7a51bdcf3b	[x86] add test for select FP with undef condition; NFC llvm-svn: 347211	2018-11-19 14:39:57 +00:00
Simon Pilgrim	f6c2fbdd1a	[X86] Add codegen tests for slow-shld scalar funnel shifts llvm-svn: 347195	2018-11-19 12:29:41 +00:00
Craig Topper	8b22bcd39f	[X86] Use a pcmpgt with 0 instead of psrad 31, to fill elements with the sign bit in v4i32 MULH lowering. The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate. llvm-svn: 347185	2018-11-19 07:22:26 +00:00
Craig Topper	3616891046	[X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1 Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter. llvm-svn: 347181	2018-11-19 04:33:20 +00:00
Craig Topper	053f1eea96	[X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization. Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts. llvm-svn: 347180	2018-11-19 00:33:16 +00:00
Simon Pilgrim	7f92efa5a9	[X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions. llvm-svn: 347177	2018-11-18 22:13:31 +00:00
Craig Topper	0468c860b7	[X86] Add custom type legalization for extending v4i8/v4i16->v4i64. Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing to two sign extends but only using half of the v4i32 intermediate result. When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32 then split and interleave with the sign bits using an punpackldq and punpackhdq. llvm-svn: 347176	2018-11-18 21:28:50 +00:00
Craig Topper	950f3842cc	[X86] Add a 32-bit command line with only sse2 to vector-sext.ll and vector-sext.ll to show some of the scalarized load sequences without 64-bit scalar support. Some of these sequeces look pretty bad since we have to copy the sign bit from a 32 bit register to a 64 bit register to finish a sign extend. llvm-svn: 347175	2018-11-18 21:28:47 +00:00
Simon Pilgrim	b31bdbd2e9	[X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts. SSE vector shifts only use the bottom 64-bits of the shift amount vector. llvm-svn: 347173	2018-11-18 20:21:52 +00:00
Craig Topper	11d50948e2	[X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends. If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats. This patch disables combineToExtendVectorInReg when we are using widening. I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346. I've also enable custom legalization of v8i64 and v16i32 operations with with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then do the a split to two 128 pieces. Extend each half to 256 and then concat the result. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend. llvm-svn: 347172	2018-11-18 18:11:25 +00:00
Craig Topper	bc8148f7b0	[X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction. Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54671 llvm-svn: 347171	2018-11-18 17:59:28 +00:00
Sanjay Patel	8c0cd77bff	[DAG] add undef simplifications for select nodes Sadly, this duplicates (twice) the logic from InstSimplify. There might be some way to at least share the DAG versions of the code, but copying the folds seems to be the standard method to ensure that we don't miss these folds. Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no way to ensure that we do these kinds of simplifications unless the code is repeated at node creation time and during combines. There were other tests that would become worthless with this improvement that I changed as pre-commits: rL347161 rL347164 rL347165 rL347166 rL347167 I'm not sure how to salvage the remaining tests (diffs in this patch). So the x86 tests verify that the new code is working as intended. The AMDGPU test is actually similar to my motivating case: we have some undef value that has survived to machine IR in an x86 test, and then it gets folded in some weird way, or we crash if we don't transfer the undef flag. But we would have been better off never getting to that point by doing these simplifications. This will lead back to PR32023 someday... https://bugs.llvm.org/show_bug.cgi?id=32023 llvm-svn: 347170	2018-11-18 17:36:23 +00:00
Sanjay Patel	bc23408fe5	[x86] regenerate full checks; NFC llvm-svn: 347167	2018-11-18 16:56:17 +00:00
Simon Pilgrim	fec9f8657b	[X86][SSE] Relax IsSplatValue - remove the 'variable shift' limit on subtracts. Means we don't use the per-lane-shifts as much when we can cheaply use the older splat-variable-shifts. llvm-svn: 347162	2018-11-18 15:52:08 +00:00
Sanjay Patel	40509997eb	[x86] make tests immune to improvements in undef handling llvm-svn: 347161	2018-11-18 15:27:19 +00:00
Simon Pilgrim	7fdbae3224	[X86][SSE] Add some generic masked gather codegen tests llvm-svn: 347159	2018-11-18 14:35:57 +00:00
Simon Pilgrim	cc1f5d2407	[X86][SSE] Use raw shuffle mask decode in SimplifyDemandedVectorEltsForTargetNode (PR39549) We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away, this requires an extra bit of logic to locally normalize undef inputs. llvm-svn: 347158	2018-11-18 13:34:53 +00:00
Craig Topper	f56a57518d	[X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends from i8 or smaller without SSE4.1. Prefer to shrink the mul instead. The zero extend will require two stages of unpacks to implement. So its better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack. llvm-svn: 347149	2018-11-18 05:53:21 +00:00
Craig Topper	0438d791fa	[X86] Add support for matching PACKUSWB from a v64i8 shuffle. llvm-svn: 347143	2018-11-17 18:54:43 +00:00
Craig Topper	c6c760f07f	[X86] Add test case to show missed opportunity to use PACKUSWB in v64i8 shuffle lowering. llvm-svn: 347142	2018-11-17 18:54:41 +00:00
Simon Pilgrim	0e1a9d5ee6	[X86][SSE] Add shuffle demanded elts test case for PR39549 llvm-svn: 347139	2018-11-17 14:06:03 +00:00
Craig Topper	dd61f11642	[X86] Don't extend v32i8 multiplies to v32i16 with avx512bw and prefer-vector-width=256. llvm-svn: 347131	2018-11-17 02:36:07 +00:00
Craig Topper	d8da95bbe3	[X86] Add test cases to show incorrect use of a 512 bit vector in v32i8 multiply lowering with prefer-vector-width=256. On the min-legal-vector-width test this actually causes some of the v32i16 operations we emitted to be scalarized. llvm-svn: 347130	2018-11-17 02:36:02 +00:00
Stanislav Mekhanoshin	0ff7c8309d	DAG combiner: fold (select, C, X, undef) -> X Differential Revision: https://reviews.llvm.org/D54646 llvm-svn: 347110	2018-11-16 23:13:38 +00:00
Craig Topper	ee0333b4a9	[X86] Add custom promotion of narrow fp_to_uint/fp_to_sint operations under -x86-experimental-vector-widening-legalization. This tries to force the result type to vXi32 followed by a truncate. This can help avoid scalarization that would otherwise occur. There's some annoying examples of an avx512 truncate instruction followed by a packus where we should really be able to just use one truncate. But overall this is still a net improvement. llvm-svn: 347105	2018-11-16 22:53:00 +00:00
Rong Xu	3a38175723	[X86] Disable Condbr_merge pass Disable Condbr_merge pass for now due to PR39658. Will reenable the pass once the bug is fixed. llvm-svn: 347079	2018-11-16 19:35:00 +00:00
Simon Pilgrim	96f7924fe2	[X86] Add codegen tests for scalar funnel shifts llvm-svn: 347066	2018-11-16 17:48:52 +00:00
Sanjay Patel	8da76a6581	[x86] regenerate complete checks for test; NFC llvm-svn: 347051	2018-11-16 14:44:20 +00:00
Roman Lebedev	90c5b3f78e	[X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from `X` Summary: As discussed in previous review, and noted in the FIXME, if `X` is actually an `lshr Y, Z` (logical!), we can fold the `Z` into 'control`, and let the `BEXTR` do this too. We could just insert those 8 bits of shift amount into control, but it is better to instead zero-extend them, and 'or' them in place. We can only do this for `lshr`, not `ashr`, because we do not know that the mask cover only the bits of `Y`, and not any of the sign-extended bits. The obvious question is, is this actually legal to do? I believe it is. Relevant quotes, from `Intel® 64 and IA-32 Architectures Software Developer’s Manual`, `BEXTR — Bit Field Extract`: * `Bit 7:0 of the second source operand specifies the starting bit position of bit extraction.` * `A START value exceeding the operand size will not extract any bits from the second source operand.` * `Only bit positions up to (OperandSize -1) of the first source operand are extracted.` * `All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed.` * `The destination register is cleared if no bits are extracted.` FIXME: if we can do this, i wonder if we should prefer `BEXTR` over `BZHI` in such cases. Reviewers: RKSimon, craig.topper, spatel, andreadb Reviewed By: RKSimon, craig.topper, andreadb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54095 llvm-svn: 347048	2018-11-16 13:04:54 +00:00
Craig Topper	079c37da58	[X86] Add custom type legalization for v2i8/v4i8/v8i8 mul under -x86-experimental-vector-widening. By early promoting the multiply to use an i16 element type we can avoid op legalization emit a second multiply for the 8 upper elements of the v16i8 type we would otherwise get. llvm-svn: 347032	2018-11-16 06:15:21 +00:00
Craig Topper	dc957d49f9	[X86] Add some test cases for vector multiplies on vectors shorter than 128 bits with -x86-experimental-vector-widening-legalization. llvm-svn: 347031	2018-11-16 06:15:20 +00:00
Craig Topper	c93ae2b0a2	Revert r347014 "[X86] Add some test cases for vector multiplies on vectors shorter than 128 bits with -x86-experimental-vector-widening-legalization." Apparently I failed to update this after turnign sign extend to any extend. llvm-svn: 347015	2018-11-16 01:57:55 +00:00
Craig Topper	36920b44f7	[X86] Add some test cases for vector multiplies on vectors shorter than 128 bits with -x86-experimental-vector-widening-legalization. llvm-svn: 347014	2018-11-16 01:52:32 +00:00
Craig Topper	5802b82b40	[X86] Use ANY_EXTEND instead of SIGN_EXTEND in the AVX2 and later path for legalizing vXi8 multiply. We aren't going to use the upper bits of the multiply result that the extend would effect. So we don't need a specific type of extend. This makes some reduction test cases shorter because we were previously trying to sign_extend a truncate which we can't eliminate. llvm-svn: 347011	2018-11-16 01:16:59 +00:00
Craig Topper	73bb04ab6f	[X86] Add -x86-experimental-vector-widening support to reduceVMULWidth and combineMulToPMADDWD In reduceVMULWidth, we no longer need to worry about extending the vector to 128 bits first. Regular widening of extends, muls and shuffles will take care of that for us. In combineMulToPMADDWD, we can handle v2i32 multiplies and allow the VPMADDWD to be widened to v4i32 during type legalization by adding custom widening like we do have for AVG/ADDUS/SUBUS. I had to modify that code a little to allow different and output VTs. Differential Revision: https://reviews.llvm.org/D54512 llvm-svn: 346980	2018-11-15 18:59:31 +00:00
Simon Pilgrim	0db8cb0147	[X86] Fix MCNullStreamer support for modules with a CodeView flag This fixes -filetype=null support when compiling for a Win32 target and the module has a CodeView flag. The only places changed are the uses of getTargetStreamer function - this patch guards both of them with null checks. Committed on behalf of @eush (Eugene Sharygin) Differential Revision: https://reviews.llvm.org/D54008 llvm-svn: 346962	2018-11-15 15:17:15 +00:00
Craig Topper	553ac560aa	[X86] Add some custom type legalization rules for truncate with -x86-experimental-vector-widening-legalization. This avoids some nasty shuffles when we have avx512. It will also prevent using zmm truncate instructions when a ymm instruction that zeroes part of an xmm register will do. Also avoid using avx512 truncate instructions when the input is 128 bits or less. These instructions are 2 uops on skx so we can probably find a better single uop shuffle like pshufb. llvm-svn: 346936	2018-11-15 08:23:40 +00:00
Craig Topper	926dbdd601	[X86] Add -x86-experimental-vector-widening-legalization versions of shuffle-vs-trunc tests. llvm-svn: 346935	2018-11-15 08:23:37 +00:00
Craig Topper	ea6ced9d1a	[X86] Don't mark SEXTLOADS with narrow types as Custom with -x86-experimental-vector-widening-legalization. The narrow types end up requesting widening, but generic legalization will end up scalaring and using a build_vector to do the widening. llvm-svn: 346916	2018-11-15 00:21:41 +00:00
Craig Topper	0b2089da4b	[X86] Support v2i32/v4i16/v8i8 load/store using f64 on 32-bit targets under -x86-experimental-vector-widening-legalization. On 64-bit targets the type legalizer will use i64 to legalize these. But when i64 isn't legal, the type legalizer won't try an FP type. So do it manually instead. There are a few regressions in here due to some v2i32 operations like mul and div now being reassembled into a full vector just to store instead of storing the pieces. But this was already occuring in 64-bit mode so its not a new issue. llvm-svn: 346908	2018-11-14 23:02:09 +00:00
Simon Pilgrim	e8cc5e4e03	[X86] Update masked expandload/compressstore test names llvm-svn: 346903	2018-11-14 22:44:08 +00:00
Simon Pilgrim	9d9353aef5	[X86][SSE] Add SSE2/SSE42 masked load/store tests Now that the load/store tests are split the impact of running the tests on multiple (illegal) targets is a lot less impactful llvm-svn: 346896	2018-11-14 21:31:50 +00:00
Nirav Dave	1241dcb3cf	Bias physical register immediate assignments The machine scheduler currently biases register copies to/from physical registers to be closer to their point of use / def to minimize their live ranges. This change extends this to also physical register assignments from immediate values. This causes a reduction in reduction in overall register pressure and minor reduction in spills and indirectly fixes an out-of-registers assertion (PR39391). Most test changes are from minor instruction reorderings and register name selection changes and direct consequences of that. Reviewers: MatzeB, qcolombet, myatsina, pcc Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya, javed.absar, arphaman, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54218 llvm-svn: 346894	2018-11-14 21:11:53 +00:00
Simon Pilgrim	be527b545f	[X86] Split masked load/store test files llvm-svn: 346889	2018-11-14 20:44:59 +00:00
Simon Pilgrim	7f15568c40	[X86] Update masked load/store test names llvm-svn: 346887	2018-11-14 20:25:50 +00:00
Craig Topper	6c94264b1f	[X86] Allow pmulh to be formed from narrow vXi16 vectors under -x86-experimental-vector-widening-legalization Narrower vectors will be widened to 128 bits without changing the element size. And generic type legalization can already handle widening mulhu/mulhs. Differential Revision: https://reviews.llvm.org/D54513 llvm-svn: 346879	2018-11-14 18:16:21 +00:00
Simon Pilgrim	7501780ec6	[X86][AVX512] Remove constant pool shuffle decoding from SelectionDAG This patch removes the last use of the constant pool shuffle decode helper and consistently uses the 'getTargetShuffleMaskIndices' versions instead. The constant pool versions are now purely used for assembly comments. The avx512vbmi intrinsic upgrades had to be altered as they were being decoded as broadcasts, similar to what I fixed in rL346032. I don't think the change is critical - although its annoying that we lose the {k}{z} instruction test coverage as they are tricky to generate.... Differential Revision: https://reviews.llvm.org/D54083 llvm-svn: 346850	2018-11-14 11:26:35 +00:00
Craig Topper	789cc8170d	[X86] Add -x86-experimental-vector-widening command lines to pmulh.ll I've only added sse2 and sse4.1 variants as I'm only interested in the two v4i16 tests and I don't expect that to different with AVX other than a v prefix. llvm-svn: 346834	2018-11-14 07:51:26 +00:00
Cameron McInally	cbde0d9c7b	[IR] Add a dedicated FNeg IR Instruction The IEEE-754 Standard makes it clear that fneg(x) and fsub(-0.0, x) are two different operations. The former is a bitwise operation, while the latter is an arithmetic operation. This patch creates a dedicated FNeg IR Instruction to model that behavior. Differential Revision: https://reviews.llvm.org/D53877 llvm-svn: 346774	2018-11-13 18:15:47 +00:00

1 2 3 4 5 ...

12823 Commits