llvm-project

Commit Graph

Author	SHA1	Message	Date
Diana Picus	69ce1c1319	[GlobalISel] Allow multiple VRegs in ArgInfo. NFC Allow CallLowering::ArgInfo to contain more than one virtual register. This is useful when passes split aggregates into several virtual registers, but need to also provide information about the original type to the call lowering. Used in follow-up patches. Differential Revision: https://reviews.llvm.org/D63548 llvm-svn: 364509	2019-06-27 08:50:53 +00:00
Mikael Holmen	7b81b61368	Silence gcc warning after r364458 Without the fix gcc 7.4.0 complains with ../lib/Target/X86/X86ISelLowering.cpp: In function 'bool getFauxShuffleMask(llvm::SDValue, llvm::SmallVectorImpl<int>&, llvm::SmallVectorImpl<llvm::SDValue>&, llvm::SelectionDAG&)': ../lib/Target/X86/X86ISelLowering.cpp:6690:36: error: enumeral and non-enumeral type in conditional expression [-Werror=extra] int Idx = (ZeroMask[j] ? SM_SentinelZero : (i + j + Ofs)); ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1plus: all warnings being treated as errors llvm-svn: 364507	2019-06-27 08:16:18 +00:00
Craig Topper	9153501f07	[X86] Remove (vzext_movl (scalar_to_vector (load))) matching code from selectScalarSSELoad. I think this will be turning into vzext_load during DAG combine. llvm-svn: 364499	2019-06-27 05:52:00 +00:00
Craig Topper	9ea5a32251	[X86] Teach selectScalarSSELoad to not narrow volatile loads. llvm-svn: 364498	2019-06-27 05:51:56 +00:00
Craig Topper	3d12971e1c	[X86] Rework the logic in LowerBuildVectorv16i8 to make better use of any_extend and break false dependencies. Other improvements This patch rewrites the loop iteration to only visit every other element starting with element 0. And we work on the "even" element and "next" element at the same time. The "First" logic has been moved to the bottom of the loop and doesn't run on every element. I believe it could create dangling nodes previously since we didn't check if we were going to use SCALAR_TO_VECTOR for the first insertion. I got rid of the "First" variable and just do a null check on V which should be equivalent. We also no longer use undef as the starting V for vectors with no zeroes to avoid false dependencies. This matches v8i16. I've changed all the extends and OR operations to use MVT::i32 since that's what they'll be promoted to anyway. I've tried to use zero_extend only when necessary and use any_extend otherwise. This resulted in some improvements in tests where we are now able to promote aligned (i32 (extload i8)) to a 32-bit load. Differential Revision: https://reviews.llvm.org/D63702 llvm-svn: 364469	2019-06-26 20:16:19 +00:00
Craig Topper	afa58b6ba1	[X86] Remove isTypePromotionOfi1ZeroUpBits and its helpers. This was trying to optimize concat_vectors with zero of setcc or kand instructions. But I think it produced the same code we produce for a concat_vectors with 0 even it it doesn't come from one of those operations. llvm-svn: 364463	2019-06-26 19:45:48 +00:00
Simon Pilgrim	dfe079ffbf	[X86][SSE] getFauxShuffleMask - handle OR(x,y) where x and y have no overlapping bits Create a per-byte shuffle mask based on the computeKnownBits from each operand - if for each byte we have a known zero (or both) then it can be safely blended. Fixes PR41545 llvm-svn: 364458	2019-06-26 18:21:26 +00:00
Simon Pilgrim	435ee9fb1f	[X86][SSE] X86TargetLowering::isCommutativeBinOp - add PMULDQ Allows narrowInsertExtractVectorBinOp to reduce vector size instead of the more restricted SimplifyDemandedVectorEltsForTargetNode llvm-svn: 364434	2019-06-26 14:58:11 +00:00
Simon Pilgrim	6b687bf681	[X86][SSE] X86TargetLowering::isCommutativeBinOp - add PCMPEQ Allows narrowInsertExtractVectorBinOp to reduce vector size llvm-svn: 364432	2019-06-26 14:40:49 +00:00
Simon Pilgrim	b13c6f1a9d	[X86][SSE] X86TargetLowering::isBinOp - add PCMPGT Allows narrowInsertExtractVectorBinOp to reduce vector size llvm-svn: 364431	2019-06-26 14:34:41 +00:00
Simon Pilgrim	24f96a0eee	[X86] shouldScalarizeBinop - never scalarize target opcodes. We have (almost) no target opcodes that have scalar/vector equivalents - for now assume we can't scalarize them (we can add exceptions if we need to). llvm-svn: 364429	2019-06-26 14:21:29 +00:00
Roman Lebedev	13889145f0	[X86][Codegen] X86DAGToDAGISel::matchBitExtract(): consistently capture lambdas by value llvm-svn: 364420	2019-06-26 12:19:52 +00:00
Roman Lebedev	fbb2e40d5c	[X86] X86DAGToDAGISel::matchBitExtract(): pattern c: truncation awareness Summary: The one thing of note here is that the 'bitwidth' constant (32/64) was previously pessimistic. Given `x & (-1 >> (C - z))`, we were taking `C` to be `bitwidth(x)`, but in reality we want `(-1 >> (C - z))` pattern to mean "low z bits must be all-ones". And for that, `C` should be `bitwidth(-1 >> (C - z))`, i.e. of the shift operation itself. Last pattern D does not seem to exhibit any of these truncation issues. Although it has the opposite problem - if we extract low bits (no shift) from i64, and then truncate to i32, then we fail to shrink this 64-bit extraction into 32-bit extraction. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62806 llvm-svn: 364419	2019-06-26 12:19:47 +00:00
Roman Lebedev	b0ecc1cc6b	[X86] X86DAGToDAGISel::matchBitExtract(): pattern b: truncation awareness Summary: (Not so) boringly identical to pattern a (D62786) Not yet sure how do deal with the last pattern c. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62793 llvm-svn: 364418	2019-06-26 12:19:39 +00:00
Roman Lebedev	8b9a03973a	[X86] X86DAGToDAGISel::matchBitExtract(): pattern a: truncation awareness Summary: Finally tying up loose ends here. The problem is quite simple: If we have pattern `(x >> start) & (1 << nbits) - 1`, and then truncate the result, that truncation will be propagated upwards, into the `and`. And that isn't currently handled. I'm only fixing pattern `a` here, the same fix will be needed for patterns `b`/`c` too. I think this isn't missing any extra legality checks, since we only look past truncations. Similary, i don't think we can get any other truncation there other than i64->i32. Reviewers: craig.topper, RKSimon, spatel Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62786 llvm-svn: 364417	2019-06-26 12:19:11 +00:00
Hans Wennborg	6876de90e8	Fix the build after r364401 It was failing with: /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:18772:66: error: call of overloaded 'makeArrayRef(<brace-enclosed initializer list>)' is ambiguous scaleShuffleMask<int>(Scale, makeArrayRef<int>({ 0, 2, 1, 3 }), Mask); ^ /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:18772:66: note: candidates are: In file included from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/CodeGen/MachineFunction.h:20:0, from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/CodeGen/CallingConvLower.h:19, from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.h:17, from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:14: /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/ADT/ArrayRef.h:480:15: note: llvm::ArrayRef<T> llvm::makeArrayRef(const std::vector<_RealType>&) [with T = int] ArrayRef<T> makeArrayRef(const std::vector<T> &Vec) { ^ /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/ADT/ArrayRef.h:485:37: note: llvm::ArrayRef<T> llvm::makeArrayRef(const llvm::ArrayRef<T>&) [with T = int] template <typename T> ArrayRef<T> makeArrayRef(const ArrayRef<T> &Vec) { ^ llvm-svn: 364414	2019-06-26 11:56:38 +00:00
Simon Pilgrim	c0711af7f9	[X86][AVX] combineExtractSubvector - 'little to big' extract_subvector(bitcast()) support Ideally this needs to be a generic combine in DAGCombiner::visitEXTRACT_SUBVECTOR but there's some nasty regressions in aarch64 due to neon shuffles not handling bitcasts at all..... llvm-svn: 364407	2019-06-26 11:21:09 +00:00
Simon Pilgrim	3845a4f849	[X86][AVX] truncateVectorWithPACK - avoid bitcasted shuffles truncateVectorWithPACK is often used in conjunction with ComputeNumSignBits which struggles when peeking through bitcasts. This fix tries to avoid bitcast(shuffle(bitcast())) patterns in the 256-bit 64-bit sublane shuffles so we can still see through at least until lowering when the shuffles will need to be bitcasted to widen the shuffle type. llvm-svn: 364401	2019-06-26 09:50:11 +00:00
Clement Courbet	be98e0ab78	[ExpandMemCmp] Honor prefer-vector-width. Reviewers: gchatelet, echristo, spatel, atdt Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63769 llvm-svn: 364384	2019-06-26 07:06:49 +00:00
Matt Arsenault	8fcc70f141	Don't look for the TargetFrameLowering in the implementation The same oddity was apparently copy-pasted between multiple targets. llvm-svn: 364349	2019-06-25 20:53:35 +00:00
Craig Topper	4577b8c17c	[X86] Remove isel patterns that look for (vzext_movl (scalar_to_vector (load))) I believe these all get canonicalized to vzext_movl. The only case where that wasn't true was when the load was loadi32 and the load was an extload aligned to 32 bits. But that was fixed in r364207. Differential Revision: https://reviews.llvm.org/D63701 llvm-svn: 364337	2019-06-25 17:31:52 +00:00
Craig Topper	14ea14ae85	[X86] Add a DAG combine to turn vzmovl+load into vzload if the load isn't volatile. Remove isel patterns for vzmovl+load We currently have some isel patterns for treating vzmovl+load the same as vzload, but that shrinks the load which we shouldn't do if the load is volatile. Rather than adding isel checks for volatile. This patch removes the patterns and teachs DAG combine to merge them into vzload when its legal to do so. Differential Revision: https://reviews.llvm.org/D63665 llvm-svn: 364333	2019-06-25 17:08:26 +00:00
Simon Pilgrim	aae4b68703	[X86] lowerShuffleAsSpecificZeroOrAnyExtend - add ANY_EXTEND TODO. lowerShuffleAsSpecificZeroOrAnyExtend should be able to lower to ANY_EXTEND_VECTOR_INREG as well as ZER_EXTEND_VECTOR_INREG. llvm-svn: 364313	2019-06-25 13:36:53 +00:00
Clement Courbet	3bc5ad551a	[ExpandMemCmp] Move all options to TargetTransformInfo. Split off from D60318. llvm-svn: 364281	2019-06-25 08:04:13 +00:00
Craig Topper	7fccb2ac5e	[X86] Don't a vzext_movl in LowerBuildVectorv16i8/LowerBuildVectorv8i16 if there are no zeroes in the vector we're building. In LowerBuildVectorv16i8 we took care to use an any_extend if the first pair is in the lower 16-bits of the vector and no elements are 0. So bits [31:16] will be undefined. But we still emitted a vzext_movl to ensure that bits [127:32] are 0. If we don't need any zeroes we should be consistent and make all of 127:16 undefined. In LowerBuildVectorv8i16 we can just delete the vzext_movl code because we only use the scalar_to_vector when there are no zeroes. So the vzext_movl is always unnecessary. Found while investigating whether (vzext_movl (scalar_to_vector (loadi32)) patterns are necessary. At least one of the cases where they were necessary was where the loadi32 matched 32-bit aligned 16-bit extload. Seemed weird that we required vzext_movl for that case. Differential Revision: https://reviews.llvm.org/D63700 llvm-svn: 364207	2019-06-24 17:28:41 +00:00
Craig Topper	033774e144	[X86] Cleanups and safety checks around the isFNEG This patch does a few things to start cleaning up the isFNEG function. -Remove the Op0/Op1 peekThroughBitcast calls that seem unnecessary. getTargetConstantBitsFromNode has its own peekThroughBitcast inside. And we have a separate peekThroughBitcast on the return value. -Add a check of the scalar size after the first peekThroughBitcast to ensure we haven't changed the element size and just did something like f32->i32 or f64->i64. -Remove an unnecessary check that Op1's type is floating point after the peekThroughBitcast. We're just going to look for a bit pattern from a constant. We don't care about its type. -Add VT checks on several places that consume the return value of isFNEG. Due to the peekThroughBitcasts inside, the type of the return value isn't guaranteed. So its not safe to use it to build other nodes without ensuring the type matches the type being used to build the node. We might be able to replace these checks with bitcasts instead, but I don't have a test case so a bail out check seemed better for now. Differential Revision: https://reviews.llvm.org/D63683 llvm-svn: 364206	2019-06-24 17:28:26 +00:00
Matt Arsenault	faeaedf8e9	GlobalISel: Remove unsigned variant of SrcOp Force using Register. One downside is the generated register enums require explicit conversion. llvm-svn: 364194	2019-06-24 16:16:12 +00:00
Matt Arsenault	e3a676e9ad	CodeGen: Introduce a class for registers Avoids using a plain unsigned for registers throughoug codegen. Doesn't attempt to change every register use, just something a little more than the set needed to build after changing the return type of MachineOperand::getReg(). llvm-svn: 364191	2019-06-24 15:50:29 +00:00
Craig Topper	e8da65c698	[X86] Turn v16i16->v16i8 truncate+store into a any_extend+truncstore if we avx512f, but not avx512bw. Ideally we'd be able to represent this truncate as a any_extend to v16i32 and a truncate, but SelectionDAG doens't know how to not fold those together. We have isel patterns to use a vpmovzxwd+vpdmovdb for the truncate, but we aren't able to simultaneously fold the load and the store from the isel pattern. By pulling the truncate into the store we can successfully hide it from the DAG combiner. Then we can isel pattern match the truncstore and load+any_extend separately. llvm-svn: 364163	2019-06-23 23:51:21 +00:00
Craig Topper	c8d94e7889	[X86] Fix isel pattern that was looking for a bitcasted load. Remove what appears to be a copy/paste mistake. DAG combine should ensure bitcasts of loads don't exist. Also remove 3 patterns that are identical to the block above them. llvm-svn: 364158	2019-06-23 19:17:50 +00:00
Craig Topper	cadd826d0a	[X86][SelectionDAG] Cleanup and simplify masked_load/masked_store in tablegen. Use more precise PatFrags for scalar masked load/store. Rename masked_load/masked_store to masked_ld/masked_st to discourage their direct use. We need to check truncating/extending and compressing/expanding before using them. This revealed that our scalar masked load/store patterns were misusing these. With those out of the way, renamed masked_load_unaligned and masked_store_unaligned to remove the "_unaligned". We didn't check the alignment anyway so the name was somewhat misleading. Make the aligned versions inherit from masked_load/store instead from a separate identical version. Merge the 3 different alignments PatFrags into a single version that uses the VT from the SDNode to determine the size that the alignment needs to match. llvm-svn: 364150	2019-06-23 06:06:04 +00:00
Simon Pilgrim	a962c1bc0f	[X86][SSE] Fold extract_subvector(vselect(x,y,z),0) -> vselect(extract_subvector(x,0),extract_subvector(y,0),extract_subvector(z,0)) llvm-svn: 364136	2019-06-22 17:57:01 +00:00
Craig Topper	4649a051bf	[X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into (insert_subvector allzeros, (vzmovl X), 0) 128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg. This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeroes. We can then match the insertsubvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns. Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros) but I fear that would have bad implications to shuffle combining. I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload. Differential Revision: https://reviews.llvm.org/D63512 llvm-svn: 364095	2019-06-21 19:10:21 +00:00
Craig Topper	4569cdbcf5	[X86] Don't mark v64i8/v32i16 ISD::SELECT as custom unless they are legal types. We don't have any Custom handling during type legalization. Only operation legalization. Fixes PR42355 llvm-svn: 364093	2019-06-21 18:50:00 +00:00
Craig Topper	ce6c06dfdd	[X86] Add a debug print of the node in the default case for unhandled opcodes in ReplaceNodeResults. This should be unreachable, but bugs can make it reachable. This adds a debug print so we can see the bad node in the output when the llvm_unreachable triggers. llvm-svn: 364091	2019-06-21 18:49:21 +00:00
Simon Pilgrim	5dba4ed208	[X86][AVX] Combine INSERT_SUBVECTOR(SRC0, EXTRACT_SUBVECTOR(SRC1)) as shuffle Subvector shuffling often ends up as insert/extract subvector. llvm-svn: 364090	2019-06-21 18:35:04 +00:00
Craig Topper	6af1be9664	[X86] Use vmovq for v4i64/v4f64/v8i64/v8f64 vzmovl. We already use vmovq for v2i64/v2f64 vzmovl. But we were using a blendpd+xorpd for v4i64/v4f64/v8i64/v8f64 under opt speed. Or movsd+xorpd under optsize. I think the blend with 0 or movss/d is only needed for vXi32 where we don't have an instruction that can move 32 bits from one xmm to another while zeroing upper bits. movq is no worse than blendpd on any known CPUs. llvm-svn: 364079	2019-06-21 17:24:21 +00:00
Simon Pilgrim	96e77ce626	[X86] isBinOp - move commutative ops to isCommutativeBinOp. NFCI. TargetLoweringBase::isBinOp checks isCommutativeBinOp as a fallback, so don't duplicate. llvm-svn: 364072	2019-06-21 16:23:28 +00:00
Simon Pilgrim	36a999ffb8	[X86] X86ISD::ANDNP is a (non-commutative) binop The sat add/sub tests still have unnecessary extract_subvector((vandnps ymm, ymm), 0) uses that should be split to (vandnps (extract_subvector(ymm, 0), extract_subvector(ymm, 0)), but its getting better. llvm-svn: 364038	2019-06-21 12:42:39 +00:00
Simon Pilgrim	9184b009cf	[X86] createMMXBuildVector - call with BuildVectorSDNode directly. NFCI. llvm-svn: 364030	2019-06-21 11:25:06 +00:00
Simon Pilgrim	c26b8f2afc	[X86] combineAndnp - use isNOT instead of manually checking for (XOR x, -1) llvm-svn: 364026	2019-06-21 11:13:15 +00:00
Simon Pilgrim	b5733581c4	[X86] foldVectorXorShiftIntoCmp - use isConstOrConstSplat. NFCI. Use the isConstOrConstSplat helper instead of inspecting the build vector manually. llvm-svn: 364024	2019-06-21 10:54:30 +00:00
Simon Pilgrim	771c33e375	[X86][AVX] isNOT - handle concat_vectors(xor X, -1, xor Y, -1) pattern llvm-svn: 364022	2019-06-21 10:44:15 +00:00
Fangrui Song	dc8de6037c	Simplify std::lower_bound with llvm::{bsearch,lower_bound}. NFC llvm-svn: 364006	2019-06-21 05:40:31 +00:00
Craig Topper	9e1665f2d6	[X86] Add BLSI to isUseDefConvertible. Summary: BLSI sets the C flag is the input is not zero. So if its followed by a TEST of the input where only the Z flag is consumed, we can replace it with the opposite check of the C flag. We should be able to do the same for BLSMSK and BLSR, but the naive test case for those is being optimized to a subo by CodeGenPrepare. Reviewers: spatel, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63589 llvm-svn: 363957	2019-06-20 17:52:53 +00:00
Simon Pilgrim	a4d705e0ef	[X86] LowerAVXExtend - handle ANY_EXTEND_VECTOR_INREG lowering as well. llvm-svn: 363922	2019-06-20 11:31:54 +00:00
Craig Topper	b4ea64570c	[X86] Remove memory instructions form isUseDefConvertible. The caller of this is looking for comparisons of the input to these instructions with 0. But the memory instructions input is an addess not a value input in a register. llvm-svn: 363907	2019-06-20 04:58:40 +00:00
Craig Topper	451f7feb64	[X86] Add v64i8/v32i16 to several places in X86CallingConv.td where they seemed obviously missing. llvm-svn: 363906	2019-06-20 04:29:00 +00:00
Sanjay Patel	b5640b6fe8	[x86] avoid vector load narrowing with extracted store uses (PR42305) This is an exception to the rule that we should prefer xmm ops to ymm ops. As shown in PR42305: https://bugs.llvm.org/show_bug.cgi?id=42305 ...the store folding opportunity with vextractf128 may result in better perf by reducing the instruction count. Differential Revision: https://reviews.llvm.org/D63517 llvm-svn: 363853	2019-06-19 18:13:47 +00:00
Simon Pilgrim	0018b78ef6	[X86][SSE] combineToExtendVectorInReg - add ANY_EXTEND support TODO. NFCI. So I don't forget - there's a load of yak shaving to do first. llvm-svn: 363847	2019-06-19 17:42:37 +00:00

1 2 3 4 5 ...

19003 Commits