llvm-project

Commit Graph

Author	SHA1	Message	Date
Jessica Paquette	8a4d9f04b5	[AArch64][GlobalISel] Support -tailcallopt This adds support for `-tailcallopt` tail calls to CallLowering. This piggy-backs off the changes from D67577, since doing it without a bit of refactoring gets extremely ugly. Support is basically ported from AArch64ISelLowering. The main difference here is that tail calls in `-tailcallopt` change the ABI, so there's some extra bookkeeping for the stack. Show that we are correctly lowering these by updating tail-call.ll. Also show that we don't do anything strange in general by updating fastcc-reserved.ll, which passes `-tailcallopt`, but doesn't emit any tail calls. Differential Revision: https://reviews.llvm.org/D67580 llvm-svn: 372177	2019-09-17 20:24:23 +00:00
Craig Topper	f9a89b6788	[X86] Simplify b2b KSHIFTL+KSHIFTR using demanded elts. llvm-svn: 372155	2019-09-17 18:02:56 +00:00
Craig Topper	f1ba94ade0	[X86] Call SimplifyDemandedVectorElts on KSHIFTL/KSHIFTR nodes during DAG combine. llvm-svn: 372154	2019-09-17 18:02:52 +00:00
Nemanja Ivanovic	1461fb6e78	[PowerPC] Exploit single instruction load-and-splat for word and doubleword We currently produce a load, followed by (possibly a move for integers and) a splat as separate instructions. VSX has always had a splatting load for doublewords, but as of Power9, we have it for words as well. This patch just exploits these instructions. Differential revision: https://reviews.llvm.org/D63624 llvm-svn: 372139	2019-09-17 16:45:20 +00:00
David Green	91724b8530	[ARM] Add a SelectTAddrModeImm7 for MVE narrow loads and stores We were previously using the SelectT2AddrModeImm7 for both normal and narrowing MVE loads/stores. As the narrowing instructions do not accept sp as a register, it makes little sense to optimise a FrameIndex into the load, only to have to recover that later on. This adds a SelectTAddrModeImm7 which does not do that folding, and uses it for narrowing load/store patterns. Differential Revision: https://reviews.llvm.org/D67489 llvm-svn: 372134	2019-09-17 15:32:28 +00:00
David Green	c42ca16cfa	[ARM] Fixup pipeline test. NFC llvm-svn: 372133	2019-09-17 15:25:24 +00:00
David Green	22a2209433	[ARM] Reserve an emergency spill slot for fp16 addressing modes that need it Similar to D67327, but this time for the FP16 VLDR and VSTR instructions that use the AddrMode5FP16 addressing mode. We need to reserve an emergency spill slot for instructions that will be out of range to use sp directly. AddrMode5FP16 is 8 bits with a scale of 2. Differential Revision: https://reviews.llvm.org/D67483 llvm-svn: 372132	2019-09-17 15:23:09 +00:00
Sam Parker	1d9ba08543	[ARM] Fix for buildbots Remove setPreservesCFG from ARMConstantIslandPass and add a couple of -verify-machine-dom-info instances into the existing codegen tests. llvm-svn: 372126	2019-09-17 14:21:36 +00:00
Sam Parker	f1d069e54d	[ARM] Fix for buildbots Add -verify-machineinstrs and update the remaining low overhead loop tests. llvm-svn: 372121	2019-09-17 13:46:26 +00:00
David Green	1ff9553057	[ARM] Fix for MVE load/store stack accesses MVE loads and stores have a 7 bit immediate range, scaled by the length of the type. This needs to be taught to the stack estimation code to ensure that an emergency spill slot is reserved in case we run out of registers when materialising stack indices. Also the narrowing loads/stores can be created with frame indices even though they do not accept SP as a register. We need in those cases to make sure we have an emergency register to use as the frame base, as SP can never be used. Differential Revision: https://reviews.llvm.org/D67327 llvm-svn: 372114	2019-09-17 12:58:51 +00:00
Sam Parker	36c922278e	[ARM][LowOverheadLoops] Add LR def safety check Converting the *LoopStart pseudo instructions into DLS/WLS results in LR being defined. These instructions were inserted on the assumption that LR would already contain the loop counter because a mov is introduced during ISel as the consumers in the loop can only use LR. That assumption proved wrong! So perform a safety check, finding an appropriate place to insert the DLS/WLS instructions or revert if this isn't possible. Differential Revision: https://reviews.llvm.org/D67539 llvm-svn: 372111	2019-09-17 12:19:32 +00:00
Luis Marques	3d0fbafd0b	[RISCV] Switch to the Machine Scheduler Most of the test changes are trivial instruction reorderings and differing register allocations, without any obvious performance impact. Differential Revision: https://reviews.llvm.org/D66973 llvm-svn: 372106	2019-09-17 11:15:35 +00:00
Luis Marques	2d550d19b3	Revert Patch from Phabricator This reverts r372092 (git commit `e38695a025`) llvm-svn: 372104	2019-09-17 10:52:09 +00:00
Luis Marques	e38695a025	Patch from Phabricator llvm-svn: 372092	2019-09-17 09:43:08 +00:00
David Bolvansky	e80fcf0340	[SimplifyLibCalls] Mark known arguments with nonnull Reviewers: efriedma, jdoerfert Reviewed By: jdoerfert Subscribers: ychen, rsmith, joerg, aaron.ballman, lebedev.ri, uenoku, jdoerfert, hfinkel, javed.absar, spatel, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D53342 llvm-svn: 372091	2019-09-17 09:32:52 +00:00
Alexander Timofeev	6524a7a2b9	[AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed Differential Revision: https://reviews.llvm.org/D67101 Reviewers: rampitec, vpykhtin llvm-svn: 372086	2019-09-17 09:08:58 +00:00
Sam Parker	95b28a4c72	[ARM] LE support in ConstantIslands The low-overhead branch extension provides a loop-end 'LE' instruction that performs no decrement nor compare, it just jumps backwards. This patch modifies the constant islands pass to try to insert LE instructions in place of a Thumb2 conditional branch, instead of shrinking it. This only happens if a cmp can be converted to a cbn/z and used to exit the loop. Differential Revision: https://reviews.llvm.org/D67404 llvm-svn: 372085	2019-09-17 09:08:05 +00:00
Craig Topper	95aea74494	[X86] Split oversized vXi1 vector arguments and return values into scalars on avx512 targets. Previously we tried to split them into narrower v64i1 or v16i1 pieces that each got promoted to vXi8 and then passed in a zmm or xmm register. But this crashes when you need to pass more pieces than available registers reserved for argument passing. The scalarizing done here generates much longer and slower code, but is consistent with the behavior of avx2 and earlier targets for these types. Fixes PR43323. llvm-svn: 372069	2019-09-17 04:41:14 +00:00
Craig Topper	769dd59a27	[X86] Allow masked VBROADCAST instructions to be turned into BLENDM with a broadcast load to avoid a copy. The BLENDM instructions allow 2 sources and an independent destination while masked VBROADCAST has the destination tied to the source. llvm-svn: 372068	2019-09-17 04:41:10 +00:00
Craig Topper	2cc57bedd5	[X86] Add support for commuting EVEX VCMP instructions with any immediate value. Previously we limited to the EQ/NE/TRUE/FALSE/ORD/UNORD immediates. llvm-svn: 372067	2019-09-17 04:41:05 +00:00
Craig Topper	d51576a3f0	[X86] Add test case for missed opportunity to commute a VCMP instruction after unfolding one load in order to fold another load. llvm-svn: 372066	2019-09-17 04:41:01 +00:00
Craig Topper	359918dadf	[X86] Enable commuting of EVEX VCMP for all immediate values during isel. llvm-svn: 372065	2019-09-17 04:40:58 +00:00
Amara Emerson	9d64721ca5	[GlobalISel] Partially revert r371901. r371901 was overeager and widenScalarDst() and the like in the legalizer attempt to increment the insert point given in order to add new instructions after the currently legalizing inst. In cases where the insertion point is not exactly the current instruction, then callers need to de-compensate for the behaviour by decrementing the insertion iterator before calling them. It's not a nice state of affairs, for now just undo the problematic parts of the change. llvm-svn: 372050	2019-09-16 23:46:03 +00:00
Lei Huang	bfb197d7a3	[PowerPC] Custom lower fpext v2f32 to v2f64 from extract_subvector v4f32 This is a follow up patch from https://reviews.llvm.org/D57857 to handle extract_subvector v4f32. For cases where we fpext v2f32 to v2f64 from extract_subvector we currently generate on P9 the following: lxv 0, 0(3) xxsldwi 1, 0, 0, 1 xscvspdpn 2, 0 xxsldwi 3, 0, 0, 3 xxswapd 0, 0 xscvspdpn 1, 1 xscvspdpn 3, 3 xscvspdpn 0, 0 xxmrghd 0, 0, 3 xxmrghd 1, 2, 1 stxv 0, 0(4) stxv 1, 0(5) This patch custom lowers it to the following sequence: lxv 0, 0(3) # load the v4f32 <w0, w1, w2, w3> xxmrghw 2, 0, 0 # Produce the following vector <w0, w0, w1, w1> xxmrglw 3, 0, 0 # Produce the following vector <w2, w2, w3, w3> xvcvspdp 2, 2 # FP-extend to <d0, d1> xvcvspdp 3, 3 # FP-extend to <d2, d3> stxv 2, 0(5) # Store <d0, d1> (%vecinit11) stxv 3, 0(4) # Store <d2, d3> (%vecinit4) Differential Revision: https://reviews.llvm.org/D61961 llvm-svn: 372029	2019-09-16 20:04:15 +00:00
Roman Lebedev	69911b8d01	[ARM][Codegen] Autogenerate arm-cgp-casts.ll test. Apparently it got broken by r372009 while I thought it was r372012. llvm-svn: 372019	2019-09-16 18:28:22 +00:00
Simon Pilgrim	3df0daddfd	[X86][AVX] matchShuffleWithSHUFPD - add support for zeroable operands Determine if all of the uses of LHS/RHS operands can be replaced with a zero vector. llvm-svn: 372013	2019-09-16 17:30:33 +00:00
David Green	8d21460dc5	[ARM] A predicate cast of a predicate cast is a predicate cast This adds some very basic folding of PREDICATE_CASTS, removing cases when they are chained together. These would already be removed eventually, as these are lowered to copies. This just allows it to happen earlier, which can help other simplifications. Differential Revision: https://reviews.llvm.org/D67591 llvm-svn: 372012	2019-09-16 17:29:07 +00:00
Oliver Cruickshank	ee6fbebbaf	[ARM] Add patterns for BSWAP intrinsic on MVE BSWAP can use the VREV instruction on MVE to produce better results than expanding. llvm-svn: 372002	2019-09-16 15:20:10 +00:00
Oliver Cruickshank	e9510a6cad	[ARM] Add patterns for bitreverse intrinsic on MVE BITREVERSE can use the VBRSR which will reverse and right shift. Shifting right by 0 will just reverse the bits. llvm-svn: 372001	2019-09-16 15:20:03 +00:00
Oliver Cruickshank	5f799ef162	[ARM] Lower CTTZ on MVE Lower CTTZ on MVE using VBRSR and VCLS which will reverse the bits and count the leading zeros, equivalent to a count trailing zeros (CTTZ). llvm-svn: 372000	2019-09-16 15:19:56 +00:00
Oliver Cruickshank	cd1a0b9271	[ARM] Add patterns for CTLZ on MVE CTLZ intrinsic can use the VCLS instruction on MVE, which produces better results than expanding. llvm-svn: 371999	2019-09-16 15:19:49 +00:00
Matt Arsenault	07b8597656	AMDGPU/GlobalISel: Fix some broken run lines llvm-svn: 371992	2019-09-16 14:14:40 +00:00
Matt Arsenault	1fc07d6648	AMDGPU/GlobalISel: Fix RegBankSelect for G_FRINT and G_FCEIL llvm-svn: 371991	2019-09-16 14:14:37 +00:00
Matt Arsenault	bf7524db35	AMDGPU/GlobalISel: Remove another illegal select test llvm-svn: 371990	2019-09-16 14:14:31 +00:00
David Green	ce7328cb61	[ARM] Fold VCMP into VPT MVE has VPT instructions, which perform the duties of both a VCMP and a VPST in a single instruction, performing the compare and starting the VPT block in one. This teaches the MVEVPTBlockPass to fold them, searching back through the basicblock for a valid VCMP and creating the VPT from its operands. There are some changes to the VPT instructions to accommodate this, altering the order of the operands to match the VCMP better, and changing P0 register defs to be VPR defs, as is used in other places. Differential Revision: https://reviews.llvm.org/D66577 llvm-svn: 371982	2019-09-16 13:02:41 +00:00
Kerry McLaughlin	e55b3bf40e	[SVE][Inline-Asm] Add constraints for SVE predicate registers Summary: Adds the following inline asm constraints for SVE: - Upl: One of the low eight SVE predicate registers, P0 to P7 inclusive - Upa: SVE predicate register with full range, P0 to P15 Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, cameron.mcinally, greened, rengolin Reviewed By: rovka Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66524 llvm-svn: 371967	2019-09-16 09:45:27 +00:00
Sjoerd Meijer	b1e1a26e8e	[AArch64] Some more FP16 FMA pattern matching After our previous machinecombiner exercises (rL371321, rL371818, rL371833), we were still missing a few FP16 FMA patterns. Differential Revision: https://reviews.llvm.org/D67576 llvm-svn: 371960	2019-09-16 07:32:13 +00:00
Matt Arsenault	255d157672	AMDGPU/GlobalISel: Remove illegal select tests These fail in a release build. llvm-svn: 371955	2019-09-16 04:21:10 +00:00
Matt Arsenault	bc8de8a8da	AMDGPU/GlobalISel: Select SMRD loads for more types llvm-svn: 371954	2019-09-16 00:54:07 +00:00
Matt Arsenault	48b158acae	AMDGPU/GlobalISel: RegBankSelect for kill llvm-svn: 371953	2019-09-16 00:48:37 +00:00
Matt Arsenault	01c7f40de3	AMDGPU/GlobalISel: Legalize s1 source G_[SU]ITOFP llvm-svn: 371952	2019-09-16 00:37:10 +00:00
Matt Arsenault	60169ed613	AMDGPU/GlobalISel: Set type on vgpr live in special arguments Fixes assertion with workitem ID intrinsics used in non-kernel functions. llvm-svn: 371951	2019-09-16 00:33:00 +00:00
Matt Arsenault	9f52c1ea58	AMDGPU/GlobalISel: Select S16->S32 fptoint llvm-svn: 371950	2019-09-16 00:32:56 +00:00
Matt Arsenault	0a6123595f	AMDGPU/GlobalISel: Select s32->s16 G_[US]ITOFP llvm-svn: 371949	2019-09-16 00:29:12 +00:00
Matt Arsenault	f5d5cd205e	AMDGPU/GlobalISel: Fix VALU s16 fneg llvm-svn: 371948	2019-09-16 00:20:54 +00:00
Jinsong Ji	07d824a7c3	[PowerPC][NFC] Add a testcase for fdiv expansion. Pre-commit for following patch. llvm-svn: 371938	2019-09-15 20:02:25 +00:00
David Green	b325c05732	[ARM] Masked loads and stores Masked loads and stores fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc, and so is currently behind an option. The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the zero masked lanes. In MVE the instructions write 0 to the zero predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load. Differential Revision: https://reviews.llvm.org/D67186 llvm-svn: 371932	2019-09-15 14:14:47 +00:00
David Green	06b309d527	[ARM] Simplify and update vmla test. NFC llvm-svn: 371930	2019-09-15 11:53:05 +00:00
Simon Pilgrim	b743e94cdc	[TargetLowering] SimplifyDemandedBits - add EXTRACT_SUBVECTOR support. Call SimplifyDemandedBits on the source vector. llvm-svn: 371923	2019-09-14 16:38:26 +00:00
Thomas Lively	ae530c5c80	[WebAssembly] Narrowing and widening SIMD ops Summary: Implements target-specific LLVM intrinsics and clang builtins for these new SIMD operations, as described at https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#integer-to-integer-narrowing. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D67425 llvm-svn: 371906	2019-09-13 22:54:41 +00:00

30684 Commits