llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	b23c942ce4	[VectorLegalizer] ExpandANY_EXTEND_VECTOR_INREG/ExpandZERO_EXTEND_VECTOR_INREG - widen source vector The *_EXTEND_VECTOR_INREG opcodes were relaxed back around rL346784 to support source vector widths that are smaller than the output - it looks like the legalizers were never updated to account for this. This patch inserts the smaller source vector into an undef vector of the same width of the result before performing the shuffle+bitcast to correctly handle this. Part of the yak shaving to solve the crashes from rL364264 and rL364272 llvm-svn: 364295	2019-06-25 11:31:37 +00:00
Simon Tatham	4cf18c2849	[ARM] Explicit lowering of half <-> double conversions. If an FP_EXTEND or FP_ROUND isel dag node converts directly between f16 and f64 when the target CPU has no instruction to do it in one go, it has to be done in two steps instead, going via f32. Previously, this was done implicitly, because all such CPUs had the storage-only implementation of f16 (i.e. the only thing you can do with one at all is to convert it to/from f32). So isel would legalize the f16 into an f32 as soon as it saw it, by inserting an fp16_to_fp node (or vice versa), and then the fp_extend would already be f32->f64 rather than f16->f64. But that technique can't support a target CPU which has full f16 support but _not_ f64, such as some variants of Arm v8.1-M. So now we provide custom lowering for FP_EXTEND and FP_ROUND, which checks support for f16 and f64 and decides on the best thing to do given the combination of flags it gets back. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60692 llvm-svn: 364294	2019-06-25 11:24:50 +00:00
Simon Tatham	86b7a1e660	[ARM] Add remaining miscellaneous MVE instructions. This final batch includes the tail-predicated versions of the low-overhead loop instructions (LETP); the VPSEL instruction to select between two vector registers based on the predicate mask without having to open a VPT block; and VPNOT which complements the predicate mask in place. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62681 llvm-svn: 364292	2019-06-25 11:24:33 +00:00
Simon Tatham	e6824160dd	[ARM] Add MVE vector load/store instructions. This adds the rest of the vector memory access instructions. It includes contiguous loads/stores, with an ordinary addressing mode such as [r0,#offset] (plus writeback variants); gather loads and scatter stores with a scalar base address register and a vector of offsets from it (written [r0,q1] or similar); and gather/scatters with a vector of base addresses (written [q0,#offset], again with writeback). Additionally, some of the loads can widen each loaded value into a larger vector lane, and the corresponding stores narrow them again. To implement these, we also have to add the addressing modes they need. Also, in AsmParser, the `isMem` query function now has subqueries `isGPRMem` and `isMVEMem`, according to which kind of base register is used by a given memory access operand. I've also had to add an extra check in `checkTargetMatchPredicate` in the AsmParser, without which our last-minute check of `rGPR` register operands against SP and PC was failing an assertion because Tablegen had inserted an immediate 0 in place of one of a pair of tied register operands. (This matches the way the corresponding check for `MCK_rGPR` in `validateTargetOperandClass` is guarded.) Apparently the MVE load instructions were the first to have ever triggered this assertion, but I think only because they were the first to have a combination of the usual Arm pre/post writeback system and the `rGPR` class in particular. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62680 llvm-svn: 364291	2019-06-25 11:24:18 +00:00
Simon Pilgrim	49b3778e32	[TargetLowering] SimplifyDemandedBits - legal checks for SIGN/ZERO_EXTEND -> ZERO/ANY_EXTEND As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations || isOperationLegal cases. We'll improve X86 legality in future commits. llvm-svn: 364290	2019-06-25 10:51:15 +00:00
Nemanja Ivanovic	47b7d13459	[PowerPC] Emit XXSEL for vec_sel and code that has the same pattern As pointed out in https://bugs.llvm.org/show_bug.cgi?id=41777 we do not emit a vector select even when the code pretty much asks for one. This patch changes that. Differential revision: https://reviews.llvm.org/D61658 llvm-svn: 364289	2019-06-25 10:46:13 +00:00
Sam Parker	a6fd919cb3	[ARM] DLS/LE low-overhead loop code generation Introduce three pseudo instructions to be used during DAG ISel to represent v8.1-m low-overhead loops. One maps to set_loop_iterations while loop_decrement_reg is lowered to two, so that we can separate the decrement and branching operations. The pseudo instructions are expanded pre-emission, where we can still decide whether we actually want to generate a low-overhead loop, in a new pass: ARMLowOverheadLoops. The pass currently bails, reverting to a sub, icmp and br, in the cases where a call or stack spill/restore happens between the decrement and branching instructions, or if the loop is too large. Differential Revision: https://reviews.llvm.org/D63476 llvm-svn: 364288	2019-06-25 10:45:51 +00:00
Roman Lebedev	cdd43eac4f	[Codegen] TargetLowering::SimplifySetCC(): omit urem when possible Summary: This addresses the regression that is being exposed by D50222 in `test/CodeGen/X86/jump_sign.ll` The missing fold, at least partially, looks trivial: https://rise4fun.com/Alive/Zsln i.e. if we are comparing with zero, and comparing the `urem`-by-non-power-of-two, and the `urem` is of something that may at most have a single bit set (or no bits set at all), the `urem` is not needed. Reviewers: RKSimon, craig.topper, xbolva00, spatel Reviewed By: xbolva00, spatel Subscribers: xbolva00, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63390 llvm-svn: 364286	2019-06-25 10:01:42 +00:00
Clement Courbet	3bc5ad551a	[ExpandMemCmp] Move all options to TargetTransformInfo. Split off from D60318. llvm-svn: 364281	2019-06-25 08:04:13 +00:00
Craig Topper	079924b0b7	Revert r363802, r363850, and r363856 "[TargetLowering] SimplifyDemandedBits..." This reverts the following patches. "[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG" "[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG" "[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support" We can end up with an any_extend_vector_inreg with a 256 bit result type and a 128 bit input type. This is allowed by the ISD opcode, but the generic operation legalizer is only able to expand cases where the total vector width is the same. The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg. The SimplifyDemandedBits changes are allowing those nodes to become aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom handling and never lets them get to the generic legalizer. We need to do the same for aext_vec_inreg. llvm-svn: 364264	2019-06-25 01:32:42 +00:00
Matt Arsenault	25bc27965a	AMDGPU/GlobalISel: Fix regbankselect for amdgcn.class llvm-svn: 364262	2019-06-25 01:07:22 +00:00
Huihui Zhang	4626613ffe	[InstCombine] Fold icmp eq/ne (and %x, C), 0 iff (-C) is power of two -> %x u</u>= (-C) earlier. Summary: To generate simplified IR, make sure fold (X & ~C) ==/!= 0 --> X u</u>= C+1 is scheduled before fold ((X << Y) & C) == 0 -> (X & (C >> Y)) == 0. https://rise4fun.com/Alive/7ZN Reviewers: lebedev.ri, efriedma, spatel, craig.topper Reviewed By: lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63505 llvm-svn: 364255	2019-06-25 00:09:10 +00:00
David Blaikie	f895e1bded	DataExtractor: use decodeSLEB128 to implement getSLEB128 Should've been NFC, but turns out DataExtractor had better test coverage for decoding SLEB128 than the decodeSLEB128 did - revealing a couple of bugs (one in the error handling, another in sign extension). So fixed those to get the DataExtractor tests passing again. llvm-svn: 364253	2019-06-24 23:45:18 +00:00
Sanjay Patel	2675b0c8ab	[InstCombine] squash is-not-power-of-2 using ctpop This is the Demorgan'd 'not' of the pattern handled in: D63660 / rL364153 This is another intermediate IR step towards solving PR42314: https://bugs.llvm.org/show_bug.cgi?id=42314 We can test if a value is not a power-of-2 using ctpop(X) > 1, so combining that with an is-zero check of the input is the same as testing if not exactly 1 bit is set: (X == 0) || (ctpop(X) u> 1) --> ctpop(X) != 1 llvm-svn: 364246	2019-06-24 22:35:26 +00:00
Vasileios Porpodas	3081f78776	[SLP] NFC: Fixed typo in comment llvm-svn: 364237	2019-06-24 21:40:48 +00:00
Matt Arsenault	8025842599	InstCombine: Preserve nuw when reassociating nuw ops [3/3] Alive says this is OK. llvm-svn: 364235	2019-06-24 21:37:03 +00:00
Matt Arsenault	5d82ecd5d9	InstCombine: Preserve nuw when reassociating nuw ops [2/3] Alive says this is OK. llvm-svn: 364234	2019-06-24 21:37:02 +00:00
Matt Arsenault	5a89ba7343	InstCombine: Preserve nuw when reassociating nuw ops [1/3] Alive says this is OK. llvm-svn: 364233	2019-06-24 21:36:59 +00:00
David Blaikie	8242f35d50	NFC: DataExtractor: use decodeULEB128 to implement getULEB128 llvm-svn: 364230	2019-06-24 20:43:36 +00:00
Nikita Popov	f1ffc4305d	[CVP] Reenable nowrap flag inference Inference of nowrap flags in CVP has been disabled, because it triggered a bug in LFTR (https://bugs.llvm.org/show_bug.cgi?id=31181). This issue has been fixed in D60935, so we should be able to reenable nowrap flag inference now. Differential Revision: https://reviews.llvm.org/D62776 llvm-svn: 364228	2019-06-24 20:13:13 +00:00
Peter Collingbourne	9c8282a9b3	llvm-symbolizer: Add a FRAME command. This command prints a description of the referenced function's stack frame. For each formal parameter and local variable, the tool prints: - function name - variable name - file/line of declaration - FP-relative variable location (if available) - size in bytes - HWASAN tag offset This information will be used by the HWASAN runtime to identify local variables in UAR reports. Differential Revision: https://reviews.llvm.org/D63468 llvm-svn: 364225	2019-06-24 20:03:23 +00:00
Roland Froese	ea08248b2b	[CodeGen] Add missing vector type legalization for ctlz_zero_undef Widen vector result type for ctlz_zero_undef and cttz_zero_undef the same as ctlz and cttz. Differential Revision: https://reviews.llvm.org/D63463 llvm-svn: 364221	2019-06-24 19:27:07 +00:00
Cameron McInally	fe3f15cf90	[SLP] Support unary FNeg vectorization Differential Revision: https://reviews.llvm.org/D63609 llvm-svn: 364219	2019-06-24 19:24:23 +00:00
Matt Arsenault	dbb6c03175	AMDGPU/GlobalISel: Select G_TRUNC llvm-svn: 364215	2019-06-24 18:02:18 +00:00
Matt Arsenault	14d0b646b7	AMDGPU/GlobalISel: RegBankSelect for amdgcn.class llvm-svn: 364214	2019-06-24 18:00:47 +00:00
Matt Arsenault	8fcd5ade3e	AMDGPU/GlobalISel: Split VALU s64 G_ZEXT/G_SEXT in RegBankSelect Scalar extends to s64 can use S_BFE_{I64|U64}, but vector extends need to extend to the 32-bit half, and then to 64. I'm not sure what the line should be between what RegBankSelect handles, and what instruction select does, but for now I'm erring on the side of RegBankSelect for future post-RBS combines. llvm-svn: 364212	2019-06-24 17:54:12 +00:00
Tim Renouf	d2fdb956e0	[AMDGPU] Allow any value in unused src0 field in v_nop Summary: The LLVM disassembler assumes that the unused src0 operand of v_nop is zero. Other tools can put another value in that field, which is still valid. This commit fixes the LLVM disassembler to recognize such an encoding as v_nop, in the same way as we already do for s_getpc. Differential Revision: https://reviews.llvm.org/D63724 Change-Id: Iaf0363eae26ff92fc4ebc716216476adbff37a6f llvm-svn: 364208	2019-06-24 17:35:20 +00:00
Craig Topper	7fccb2ac5e	[X86] Don't emit a vzext_movl in LowerBuildVectorv16i8/LowerBuildVectorv8i16 if there are no zeroes in the vector we're building. In LowerBuildVectorv16i8 we took care to use an any_extend if the first pair is in the lower 16-bits of the vector and no elements are 0. So bits [31:16] will be undefined. But we still emitted a vzext_movl to ensure that bits [127:32] are 0. If we don't need any zeroes we should be consistent and make all of 127:16 undefined. In LowerBuildVectorv8i16 we can just delete the vzext_movl code because we only use the scalar_to_vector when there are no zeroes. So the vzext_movl is always unnecessary. Found while investigating whether (vzext_movl (scalar_to_vector (loadi32))) patterns are necessary. At least one of the cases where they were necessary was where the loadi32 matched 32-bit aligned 16-bit extload. Seemed weird that we required vzext_movl for that case. Differential Revision: https://reviews.llvm.org/D63700 llvm-svn: 364207	2019-06-24 17:28:41 +00:00
Craig Topper	033774e144	[X86] Cleanups and safety checks around the isFNEG This patch does a few things to start cleaning up the isFNEG function. -Remove the Op0/Op1 peekThroughBitcast calls that seem unnecessary. getTargetConstantBitsFromNode has its own peekThroughBitcast inside. And we have a separate peekThroughBitcast on the return value. -Add a check of the scalar size after the first peekThroughBitcast to ensure we haven't changed the element size and just did something like f32->i32 or f64->i64. -Remove an unnecessary check that Op1's type is floating point after the peekThroughBitcast. We're just going to look for a bit pattern from a constant. We don't care about its type. -Add VT checks on several places that consume the return value of isFNEG. Due to the peekThroughBitcasts inside, the type of the return value isn't guaranteed. So it's not safe to use it to build other nodes without ensuring the type matches the type being used to build the node. We might be able to replace these checks with bitcasts instead, but I don't have a test case so a bail out check seemed better for now. Differential Revision: https://reviews.llvm.org/D63683 llvm-svn: 364206	2019-06-24 17:28:26 +00:00
Matt Arsenault	f8a841b88e	AMDGPU/GlobalISel: Fix selecting G_IMPLICIT_DEF for s1 Try to fail for scc, since I don't think that should ever be produced. llvm-svn: 364199	2019-06-24 16:24:03 +00:00
Matt Arsenault	ae171f1e9f	Hexagon: Rename another copy of Register class For some reason clang is happy with the conflict, but MSVC is not. llvm-svn: 364196	2019-06-24 16:16:19 +00:00
Matt Arsenault	f8f1ace5bb	ARC: Fix -Wimplicit-fallthrough llvm-svn: 364195	2019-06-24 16:16:16 +00:00
Matt Arsenault	faeaedf8e9	GlobalISel: Remove unsigned variant of SrcOp Force using Register. One downside is the generated register enums require explicit conversion. llvm-svn: 364194	2019-06-24 16:16:12 +00:00
Matt Arsenault	e3a676e9ad	CodeGen: Introduce a class for registers Avoids using a plain unsigned for registers throughout codegen. Doesn't attempt to change every register use, just something a little more than the set needed to build after changing the return type of MachineOperand::getReg(). llvm-svn: 364191	2019-06-24 15:50:29 +00:00
Bjorn Pettersson	3260ef16bb	[AMDGPU] Remove unused variable AllSGPRSpilledToVGPRs. NFC Summary: Removing the unused variable AllSGPRSpilledToVGPRs in SIFrameLowering::processFunctionBeforeFrameFinalized to avoid error: variable 'AllSGPRSpilledToVGPRs' set but not used [-Werror=unused-but-set-variable] Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63721 llvm-svn: 364190	2019-06-24 15:50:18 +00:00
Matt Arsenault	2bc35b7938	Hexagon: Rename Register class This avoids a naming conflict in a future patch. llvm-svn: 364188	2019-06-24 15:27:29 +00:00
Sanjay Patel	89efefb170	[InstCombine] reduce funnel-shift i16 X, X, 8 to bswap X Prefer the more exact intrinsic to remove a use of the input value and possibly make further transforms easier (we will still need to match patterns with funnel-shift of wider types as pieces of bswap, especially if we want to canonicalize to funnel-shift with constant shift amount). Discussed in D46760. llvm-svn: 364187	2019-06-24 15:20:49 +00:00
Matt Arsenault	5dbd9228c4	AMDGPU/GlobalISel: Fix RegBankSelect for s1 sext/zext/anyext This needs different handling if the source is known to be a valid condition or not. Handle turning it into shifts or a select during regbankselect. llvm-svn: 364186	2019-06-24 14:53:58 +00:00
Matt Arsenault	60957cb74c	AMDGPU: Fold frame index into MUBUF This matters for byval uses outside of the entry block, which appear as copies. Previously, the only folding done was during selection, which could not see the underlying frame index. For any uses outside the entry block, the frame index was materialized in the entry block relative to the global scratch wave offset. This may produce worse code in cases where the offset ends up not fitting in the MUBUF offset field. A better heuristic would be helpful for extreme frames. llvm-svn: 364185	2019-06-24 14:53:56 +00:00
Matt Arsenault	942404d01b	AMDGPU: Cleanup checking when spills need emergency slots Address fixme, which should no longer be a problem since r363757. llvm-svn: 364182	2019-06-24 14:34:40 +00:00
Simon Pilgrim	b617b0808d	[InstCombine] SliceUpIllegalIntegerPHI - bail on out of range shifts trunc(lshr) handling - if the shift is out of range (undefined) then bail like we do for non-constant shifts. Fixes OSS Fuzz #15217 llvm-svn: 364181	2019-06-24 13:13:36 +00:00
Simon Pilgrim	69144a925e	[DAGCombine] visitMUL - allow shift by zero in MulByConstant. This can occur under certain circumstances when undefs are created later on in the constant multipliers (e.g. in this case due to SimplifyDemandedVectorElts). It's better to let the shift by zero occur and perform any cleanup afterward. Fixes OSS Fuzz #15429 llvm-svn: 364179	2019-06-24 12:47:17 +00:00
Bjorn Pettersson	485a421876	[ConstantFolding] Use hasVectorInstrinsicScalarOpd. NFC Summary: Use the hasVectorInstrinsicScalarOpd helper function in ConstantFoldVectorCall. Reviewers: rengolin, RKSimon, dblaikie Reviewed By: rengolin, RKSimon Subscribers: tschuett, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63705 llvm-svn: 364178	2019-06-24 12:07:17 +00:00
Bjorn Pettersson	512b118779	[Scalarizer] Add scalarizer support for smul.fix.sat Summary: Handle smul.fix.sat in the scalarizer. This is done by adding smul.fix.sat to the set of "isTriviallyVectorizable" intrinsics. The addition of smul.fix.sat in isTriviallyVectorizable and hasVectorInstrinsicScalarOpd can also be seen as a preparation to be able to use hasVectorInstrinsicScalarOpd in ConstantFolding. Reviewers: rengolin, RKSimon, dblaikie Reviewed By: rengolin Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63704 llvm-svn: 364177	2019-06-24 12:07:11 +00:00
Simon Tatham	fe8017621e	[ARM] Add MVE interleaving load/store family. This adds the family of loads and stores with names like VLD20.8 and VST42.32, which load and store parts of multiple q-registers in such a way that executing both VLD20 and VLD21, or all four of VLD40..VLD43, will distribute 2 or 4 vectors' worth of memory data across the lanes of the same number of registers but in a transposed order. In addition to the Tablegen descriptions of the instructions themselves, this patch also adds encode and decode support for the QQPR and QQQQPR register classes (representing the range of loaded or stored vector registers), and tweaks to the parsing system for lists of vector registers to make it return the right format in this case (since, unlike NEON, MVE regards q-registers as primitive, and not just an alias for two d-registers). llvm-svn: 364172	2019-06-24 10:00:39 +00:00
Pavel Labath	bb6d0b8e7b	[Support] Fix error handling in DataExtractor::get[US]LEB128 Summary: These functions are documented as not modifying the offset argument if the extraction fails (just like other DataExtractor functions). However, while reviewing D63591 we discovered that this is not the case -- if the function reaches the end of the data buffer, it will just return the value parsed until that point and set offset to point to the end of the buffer. This fixes the functions to act as advertised, and adds a regression test. Reviewers: dblaikie, probinson, bkramer Subscribers: kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63645 llvm-svn: 364169	2019-06-24 09:11:24 +00:00
Craig Topper	e8da65c698	[X86] Turn v16i16->v16i8 truncate+store into an any_extend+truncstore if we have avx512f, but not avx512bw. Ideally we'd be able to represent this truncate as an any_extend to v16i32 and a truncate, but SelectionDAG doesn't know how to not fold those together. We have isel patterns to use a vpmovzxwd+vpmovdb for the truncate, but we aren't able to simultaneously fold the load and the store from the isel pattern. By pulling the truncate into the store we can successfully hide it from the DAG combiner. Then we can isel pattern match the truncstore and load+any_extend separately. llvm-svn: 364163	2019-06-23 23:51:21 +00:00
Sanjoy Das	e2291f5af9	Fix typo in comment; NFC llvm-svn: 364159	2019-06-23 19:22:13 +00:00
Craig Topper	c8d94e7889	[X86] Fix isel pattern that was looking for a bitcasted load. Remove what appears to be a copy/paste mistake. DAG combine should ensure bitcasts of loads don't exist. Also remove 3 patterns that are identical to the block above them. llvm-svn: 364158	2019-06-23 19:17:50 +00:00
Philip Reames	d22a2a9a72	[IndVars] Remove dead instructions after folding trivial loop exit In rL364135, I taught IndVars to fold exiting branches in loops with a zero backedge taken count (i.e. loops that only run one iteration). This extends that to eliminate the dead comparison left around. llvm-svn: 364155	2019-06-23 17:06:57 +00:00
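
Two of the InstCombine folds listed above (2675b0c8ab and 89efefb170) rest on small bit-level identities that can be checked exhaustively for 16-bit values. The sketch below is only an illustration in Python, not LLVM code: the helper names `ctpop`, `fshl16`, `bswap16` and `check_all_i16` are hypothetical stand-ins that model the corresponding intrinsics on unsigned 16-bit integers.

```python
def ctpop(x: int) -> int:
    """Population count: number of set bits in a non-negative integer."""
    return bin(x).count("1")


def fshl16(a: int, b: int, s: int) -> int:
    """Model of a 16-bit funnel shift left: concatenate a:b (a in the high
    half), shift left by s modulo 16, and keep the high 16 bits."""
    s %= 16
    concat = (a << 16) | b
    return ((concat << s) >> 16) & 0xFFFF


def bswap16(x: int) -> int:
    """Swap the two bytes of a 16-bit value."""
    return ((x & 0xFF) << 8) | (x >> 8)


def check_all_i16() -> bool:
    """Exhaustively check both identities over every 16-bit value."""
    for x in range(1 << 16):
        # (X == 0) || (ctpop(X) u> 1)  -->  ctpop(X) != 1
        assert ((x == 0) or (ctpop(x) > 1)) == (ctpop(x) != 1)
        # funnel-shift i16 X, X, 8  -->  bswap X
        assert fshl16(x, x, 8) == bswap16(x)
    return True
```

Calling `check_all_i16()` walks all 65536 inputs; it only demonstrates the identities for i16 and says nothing about how InstCombine implements or orders the folds.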


124005 Commits