llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	53eb0f8c07	[AMDGPU] Attempt to reschedule without clustering We want to have more load/store clustering but we also want to maintain low register pressure, which are opposing goals. Allow the scheduler to reschedule regions without mutations applied if we hit a register limit. Differential Revision: https://reviews.llvm.org/D73386	2020-01-27 10:27:16 -08:00
Matt Arsenault	97711228fd	AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.load.format	2020-01-27 13:23:35 -05:00
Matt Arsenault	ce7ca2caf2	AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.load	2020-01-27 13:05:55 -05:00
Matt Arsenault	198624c39d	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load.format	2020-01-27 13:02:19 -05:00
Matt Arsenault	fc90222a91	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load Use intermediate instructions, unlike with buffer stores. This is necessary because of the need to have an internal way to distinguish between signed and unsigned extloads. This introduces some duplication and near duplication with the buffer store selection path. The store handling should maybe be moved into legalization to match and eliminate the duplication.	2020-01-27 12:49:23 -05:00
Matt Arsenault	e60d658260	AMDGPU/GlobalISel: Handle VOP3NoMods	2020-01-27 09:03:44 -08:00
Matt Arsenault	0968234590	AMDGPU/GlobalISel: Minor refactor of MUBUF complex patterns This will make it easier to support the small variants in the complex patterns for atomics.	2020-01-27 09:00:00 -08:00
Matt Arsenault	bef27175c7	AMDGPU: Fix not using f16 fsin/fcos I noticed this because this accidentally started working for GlobalISel.	2020-01-27 08:59:59 -08:00
Simon Pilgrim	2d5e281b0f	[X86][AVX] Add a more aggressive SimplifyMultipleUseDemandedBits to simplify masked store masks. Fixes a poor codegen issue noticed in PR11210.	2020-01-27 16:44:25 +00:00
Matt Arsenault	a1d33ce73a	AMDGPU/GlobalISel: Custom legalize v2s16 G_SHUFFLE_VECTOR Try to keep simple v2s16 cases as-is. This will more naturally map to how the VOP3P op_sel modifiers work compared to the expansion involving bitcasts and bitshifts. This could maybe try harder with wider source vector types, although that could be handled with a pre-legalize combine.	2020-01-27 08:28:05 -08:00
Matt Arsenault	4e69df091d	Revert "AMDGPU: Temporary drop s_mul_hi_i/u32 patterns" This reverts commit `fe23ed2c68`. It was never really clear this was responsible for the performance regressions that caused this to be reverted. It's been a long time, and we need to have scalar patterns for this to get GlobalISel working.	2020-01-27 08:07:21 -08:00
Matt Arsenault	bc3d900fa5	AMDGPU/GlobalISel: Fix not using global atomics on gfx9+ For some reason the flat/global atomics end up in the generated matcher table in a different order from SelectionDAG. Use AddedComplexity to prefer checking for global atomics first.	2020-01-27 07:42:42 -08:00
Matt Arsenault	ac0b9b4ccf	AMDGPU/GlobalISel: Select more MUBUF global addressing modes The handling of the high bits of the resource descriptor seems weird to me, where the 3rd dword changes based on the instruction.	2020-01-27 07:28:36 -08:00
Matt Arsenault	fdaad485e6	AMDGPU/GlobalISel: Initial selection of MUBUF addr64 load/store Fixes the main reason for compile failures on SI, but doesn't really try to use the addressing modes yet.	2020-01-27 07:13:56 -08:00
Matt Arsenault	2214bc81d0	AMDGPU: Allow i16 shader arguments Not allowing this just creates unnecessary complications when writing simple tests.	2020-01-27 06:55:32 -08:00
Jay Foad	1bf00219fc	[AMDGPU] Handle multiple base operands in areMemAccessesTriviallyDisjoint Summary: This is in preparation for getMemOperandsWithOffset returning more base operands. Depends on D73455. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73456	2020-01-27 14:45:21 +00:00
Jay Foad	6461eadf8f	[AMDGPU] Handle multiple base operands in shouldClusterMemOps Summary: This is in preparation for getMemOperandsWithOffset returning more base operands. Depends on D73454. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73455	2020-01-27 14:45:21 +00:00
Jay Foad	fcf5254fa7	[AMDGPU] Handle frame index base operands in memOpsHaveSameBasePtr Summary: This is in preparation for getMemOperandsWithOffset returning more base operands. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, arphaman, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73454	2020-01-27 14:45:21 +00:00
vpykhtin	4332f1a4c8	[AMDGPU] Fix GCN regpressure trackers for INLINEASM instructions. Differential revision: https://reviews.llvm.org/D73338	2020-01-27 17:25:25 +03:00
Matt Arsenault	2a160ba5b0	GlobalISel: Reimplement widenScalar for G_UNMERGE_VALUES results Only use shifts if the requested type exactly matches the source type, and create sub-unmerges otherwise.	2020-01-27 06:18:26 -08:00
David Green	8a6b948eb5	[MVE] Fixup order of gather writeback intrinsic outputs The MVE_VLDRWU32_qi_pre gather loads, like the other _pre/_post MVE loads, return the writeback as result 0 and the value as result 1. The LLVM IR intrinsic seems to have this the other way around though, and so when lowering from one to the other we need to switch the first two outputs. I've also fixed up the types of _pre/_post on normal MVE loads. There we were already getting the values the right way around, just not for the types. I don't believe this was causing anything to go wrong, but it was very confusing to read in the debug output. Differential Revision: https://reviews.llvm.org/D73370	2020-01-27 14:08:06 +00:00
Sjoerd Meijer	b567ff2fa0	[ARM][MVE] Tail-predication: support constant trip count We had support for runtime trip count values, but not constants, and this adds support for that. Also added a minor optimisation while I was at it: don't invoke Cleanup when there's nothing to clean up. Differential Revision: https://reviews.llvm.org/D73198	2020-01-27 11:05:26 +00:00
Sam Parker	6c2df5d14f	[ARM][LowOverheadLoops] Don't ignore VCTP When expanding the LoopStart, we try to remove the iteration count calculation. However, if part of the calculation was also used to calculate the number of elements, we could end up deleting instructions that were required to feed DLSTP/WLSTP. Differential Revision: https://reviews.llvm.org/D73275	2020-01-27 10:59:12 +00:00
Petar Avramovic	cbf03aee6d	[MIPS GlobalISel] Select population count (popcount) G_CTPOP is generated from llvm.ctpop.<type> intrinsics, clang generates these intrinsics from __builtin_popcount and __builtin_popcountll. Add lower and narrow scalar for G_CTPOP. Lower G_CTPOP for MIPS32. Differential Revision: https://reviews.llvm.org/D73216	2020-01-27 09:59:50 +01:00
Petar Avramovic	8bc7ba5b9e	[MIPS GlobalISel] Select count trailing zeros The llvm.cttz.<type> intrinsic has an additional i1 argument, is_zero_undef, which tells whether zero as the first argument produces a defined result. G_CTTZ is generated from llvm.cttz.<type> (<type> <src>, i1 false) intrinsics; clang generates these intrinsics from __builtin_ctz and __builtin_ctzll. G_CTTZ_ZERO_UNDEF comes from llvm.cttz.<type> (<type> <src>, i1 true). Clang generates such intrinsics as part of the expansion of __builtin_ffs and __builtin_ffsll. It is also traditionally part of many algorithms that are now predicated on avoiding zero-value inputs. Add narrow scalar (algorithm uses G_CTTZ_ZERO_UNDEF) for G_CTTZ. Lower G_CTTZ and G_CTTZ_ZERO_UNDEF for MIPS32. Differential Revision: https://reviews.llvm.org/D73215	2020-01-27 09:51:06 +01:00
Petar Avramovic	2b66d32f3f	[MIPS GlobalISel] Select count leading zeros The llvm.ctlz.<type> intrinsic has an additional i1 argument, is_zero_undef, which tells whether zero as the first argument produces a defined result. The MIPS clz instruction returns 32 for zero input. G_CTLZ is generated from llvm.ctlz.<type> (<type> <src>, i1 false) intrinsics; clang generates these intrinsics from __builtin_clz and __builtin_clzll. G_CTLZ_ZERO_UNDEF can also be generated from llvm.ctlz with true as the second argument. It is also traditionally part of many algorithms that are now predicated on avoiding zero-value inputs. Add narrow scalar for G_CTLZ (algorithm uses G_CTLZ_ZERO_UNDEF). Lower G_CTLZ_ZERO_UNDEF and select G_CTLZ for MIPS32. Differential Revision: https://reviews.llvm.org/D73214	2020-01-27 09:43:38 +01:00
Roman Lebedev	76fcf900d5	[X86][BdVer2] Polish LEA instruction scheduling info Based on exhaustive llvm-exegesis measurements. There may still be some imperfections for LEA16r/LEA32r. Much like what was observed in D68646, I'm also measuring some outliers with some specific registers.	2020-01-26 22:17:27 +03:00
Simon Pilgrim	fa19d67a2a	[X86][AVX] Extend combineCommutableSHUFP to handle v8f32 and v16f32 commutable shufps patterns	2020-01-26 19:04:12 +00:00
Simon Pilgrim	1a81b296cd	[X86][SSE] combineCommutableSHUFP - permilps(shufps(load(),x)) --> permilps(shufps(x,load())) Pull out combineTargetShuffle code added in rG3fd5d1c6e7db into a helper function and extend it to handle shufps(shufps(load(),x),y) and shufps(y,shufps(load(),x)) cases as well.	2020-01-26 14:36:23 +00:00
Maheaha Shivamallappa	66f93071cd	AMDGPU/GlobalISel: Clean-up code around ISel for Intrinsics. Summary: A minor code clean-up around ISel for intrinsic llvm.amdgcn.end.cf() Reviewers: arsenm, mshivama Reviewed By: arsenm Tags: #llvm Differential Revision: https://reviews.llvm.org/D73358	2020-01-26 14:09:31 +05:30
Craig Topper	3fdd435a4b	[X86] Use a macro to convert X86ISD names to strings in getTargetNodeName. Every case in the switch had a string version of itself. Two of them had a typo that used : instead of ::. By using a macro we can automate the string creation and avoid the possibility of typos like this. This is similar to what is done on the AMDGPU target.	2020-01-25 18:27:29 -08:00
Tom Stellard	cb297050bb	AMDGPU/SILoadStoreOptimizer: Fix uninitialized variable error This was introduced by `86c944d790` and caught by the sanitizer-x86_64-linux-fast bot.	2020-01-24 21:53:05 -08:00
Tom Stellard	86c944d790	AMDGPU/SILoadStoreOptimizer: Improve merging of out of order offsets Summary: This improves merging of sequences like: store a, ptr + 4; store b, ptr + 8; store c, ptr + 12; store d, ptr + 16; store e, ptr + 20; store f, ptr. Prior to this patch the basic block was scanned in order to find instructions to merge and the above sequence would be transformed to: store4 <a, b, c, d>, ptr + 4; store e, ptr + 20; store f, ptr. With this change, we now sort all the candidate merge instructions by their offset, so instructions are visited in offset order rather than in the order they appear in the basic block. We now transform this sequence into: store4 <f, a, b, c>, ptr; store2 <d, e>, ptr + 16. Another benefit of this change is that since we have sorted the mergeable lists by offset, we can easily check if an instruction is mergeable by checking the offset of the instruction that comes before or after it in the sorted list. Once we determine an instruction is not mergeable we can remove it from the list and avoid having to do the more expensive mergeability checks. Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Reviewed By: arsenm, nhaehnle Subscribers: kerbowa, merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65966	2020-01-24 19:45:56 -08:00
@justice_adams (Justice Adams)	daee63f974	[SelectionDag] Updated FoldConstantArithmetic method signature in preparation for merge with FoldConstantVectorArithmetic Updated FoldConstantArithmetic method signature to match that of FoldConstantVectorArithmetic in preparation for merging the two functions together https://bugs.llvm.org/show_bug.cgi?id=36544 This is the first step in combining the various FoldConstantArithmetic and FoldConstantVectorArithmetic functions into one FoldConstantArithmetic function. Differential Revision: https://reviews.llvm.org/D72870	2020-01-24 18:00:58 -05:00
Craig Topper	2c1decc040	[X86] Break the loop in LowerReturn into 2 loops. NFCI I believe for STRICT_FP I need to use a STRICT_FP_EXTEND for the extending to f80 for returning f32/f64 in 32-bit mode when SSE is enabled. The STRICT_FP_EXTEND node requires a Chain. I need to get that node onto the chain before any CopyToRegs are emitted. This is because all the CopyToRegs are glued and chained together. So I can't put a STRICT_FP_EXTEND on the chain between the glued nodes without also glueing the STRICT_FP_EXTEND. This patch moves all the extend creation to a first pass and then creates the copytoregs and fills out RetOps in a second pass. Differential Revision: https://reviews.llvm.org/D72665	2020-01-24 14:44:38 -08:00
Roman Lebedev	70cbf8c71c	[X86] Make `llc --help` output readable again Long `cl::value_desc()` is added right after the flag name, before `cl::desc()` column. And thus the `cl::desc()` column, for all flags, is padded to the right, which makes the output unreadable.	2020-01-25 01:43:52 +03:00
Heejin Ahn	65eb11306e	[WebAssembly] Update bleeding-edge CPU features Summary: This adds bulk memory and tail call to "bleeding-edge" CPU, since their implementation in LLVM/clang seems mostly complete. Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D73322	2020-01-24 14:27:35 -08:00
Heejin Ahn	764f4089e8	[WebAssembly] Add reference types target feature Summary: This adds the reference types target feature. This does not enable any more functionality in LLVM/clang for now, but this is necessary to embed the info in the target features section, which is used by Binaryen and Emscripten. It turned out that after D69832 `-fwasm-exceptions` crashed because we didn't have the reference types target feature. Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73320	2020-01-24 14:26:27 -08:00
Matt Arsenault	3b93945587	AMDGPU/GlobalISel: Select wqm, softwqm and wwm intrinsics	2020-01-24 13:06:44 -08:00
Matt Arsenault	87c46a3129	AMDGPU: Don't error on ds.ordered intrinsic in function These should be assumed to be called from a compute context. Also don't use a 2 entry switch over constants.	2020-01-24 13:06:44 -08:00
Stanislav Mekhanoshin	be8e38cbd9	Correct NumLoads in clustering The scheduler passes a NumLoads argument to shouldClusterMemOps() that is one less than the actual cluster length. So for 2 instructions it will pass just 1. Correct this number. This is NFC for in-tree targets. Differential Revision: https://reviews.llvm.org/D73292	2020-01-24 12:45:28 -08:00
Matt Arsenault	84e035d8f1	AMDGPU: Don't check constant address space for atomic stores We define a separate list for storable address spaces. This saves an entry in the matcher table address space list.	2020-01-24 12:15:09 -08:00
Stanislav Mekhanoshin	555d8f4ef5	[AMDGPU] Bundle loads before post-RA scheduler We are relying on artificial DAG edges inserted by the MemOpClusterMutation to keep loads and stores together in the post-RA scheduler. This does not work all the time since it allows a completely independent instruction to be scheduled in the middle of the cluster. Removed the DAG mutation and added a pass to bundle already clustered instructions. These bundles are unpacked before the memory legalizer because it does not work with bundles but also because this allows waitcounts to be inserted in the middle of a store cluster. Removing artificial edges also allows more relaxed scheduling. Differential Revision: https://reviews.llvm.org/D72737	2020-01-24 11:33:38 -08:00
Stanislav Mekhanoshin	44b865fa7f	[AMDGPU] Allow narrowing multi-dword loads Currently BE allows only a little load narrowing because of the fear it will produce sub-dword ext loads. However, we can always allow narrowing if we are shrinking one multi-dword load to another multi-dword load. In particular we were unable to reduce s_load_dwordx8 into s_load_dwordx4 if an identity shuffle was used to extract the low 4 dwords. Differential Revision: https://reviews.llvm.org/D73133	2020-01-24 11:03:41 -08:00
Austin Kerbow	c226646337	Resubmit: [DA][TTI][AMDGPU] Add option to select GPUDA with TTI Summary: Enable the new divergence analysis by default for AMDGPU. Resubmit with test updates since GPUDA was causing failures on Windows. Reviewers: rampitec, nhaehnle, arsenm, thakis Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73315	2020-01-24 10:39:40 -08:00
Yuta Saito	c5bd3d0726	Support Swift calling convention for WebAssembly targets This adds basic support for the Swift calling convention with WebAssembly targets. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D71823	2020-01-24 10:30:46 -08:00
David Green	b535aa405a	[ARM] Use reduction intrinsics for larger than legal reductions The codegen for splitting a llvm.vector.reduction intrinsic into parts will be better than the codegen for the generic reductions. This will only directly have an effect when vectorization factors are specified by the user. Also added tests to make sure the codegen for larger reductions is OK. Differential Revision: https://reviews.llvm.org/D72257	2020-01-24 17:07:24 +00:00
Kazushi (Jam) Marukawa	0fca35c652	[VE] global variable isel patterns Summary: Asm expr fixups, isel patterns and tests for global variables addresses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D73355	2020-01-24 17:35:14 +01:00
Simon Pilgrim	3fd5d1c6e7	[X86][SSE] combineTargetShuffle - permilps(shufps(load(),x)) --> permilps(shufps(x,load())) Moves lowerShuffleWithSHUFPS commutation code from rG30fcd29fe479 to catch cases during combine	2020-01-24 15:23:20 +00:00
Kazushi (Jam) Marukawa	08ebd8c79e	[VE] aligned load/store isel patterns Summary: Aligned load/store isel patterns and tests for i1/i8/16/32/64 (including extension and truncation) and fp32/64. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D73276	2020-01-24 15:16:54 +01:00
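
The offset-sorted merging described in commit 86c944d790 can be illustrated with a small model. This is a sketch only, not the SILoadStoreOptimizer code: the function name `merge_stores`, the tuple representation of stores, and the greedy grouping rule are assumptions made for illustration, and the model ignores real constraints such as aliasing, which merged widths are legal, and instruction encoding limits.

```python
def merge_stores(stores, sort_by_offset=True, dword=4, max_width=4):
    """Greedily group dword stores whose byte offsets are consecutive.

    stores: list of (value, byte_offset) pairs in program order.
    Returns a list of (base_offset, values) groups.
    Hypothetical model: real merging has many more legality checks.
    """
    # With sort_by_offset=False the candidates are visited in program order,
    # which models the behaviour described as "prior to this patch".
    candidates = sorted(stores, key=lambda s: s[1]) if sort_by_offset else list(stores)

    groups = []
    for value, offset in candidates:
        if groups:
            base, values = groups[-1]
            # Extend the current group only if this store lands directly
            # after its last element and the group is not yet full.
            if len(values) < max_width and offset == base + dword * len(values):
                values.append(value)
                continue
        groups.append((offset, [value]))
    return [(base, tuple(values)) for base, values in groups]


# The sequence from the commit message: a..e at ptr+4..ptr+20, then f at ptr.
sequence = [("a", 4), ("b", 8), ("c", 12), ("d", 16), ("e", 20), ("f", 0)]
in_order = merge_stores(sequence, sort_by_offset=False)
by_offset = merge_stores(sequence, sort_by_offset=True)
```

On the sequence from the commit message, the program-order scan yields one 4-wide store plus two single stores, while the offset-sorted scan yields a 4-wide store at `ptr` and a 2-wide store at `ptr + 16`, matching the transformation the message describes.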


55651 Commits
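
Commit 2b66d32f3f mentions narrowing G_CTLZ with an algorithm that uses G_CTLZ_ZERO_UNDEF, on a target whose clz instruction returns 32 for a zero input. A minimal model of how a 64-bit count-leading-zeros can be built from 32-bit halves is sketched below. The helper names are hypothetical and the code is plain integer arithmetic, not GlobalISel; it assumes the usual split where the low half is consulted only when the high half is zero.

```python
MASK32 = 0xFFFFFFFF


def ctlz32(x):
    """32-bit count leading zeros with a defined result for zero (returns 32),
    mirroring the MIPS clz behaviour quoted in the commit message."""
    assert 0 <= x <= MASK32
    return 32 - x.bit_length()


def ctlz32_zero_undef(x):
    """Variant whose result is only meaningful for non-zero input.
    The assert stands in for 'undefined' in this model."""
    assert 0 < x <= MASK32
    return 32 - x.bit_length()


def ctlz64_narrowed(x):
    """64-bit count leading zeros composed from two 32-bit halves."""
    assert 0 <= x < 1 << 64
    hi, lo = x >> 32, x & MASK32
    if hi == 0:
        # All 32 high bits are zero; the low half decides the rest.
        # The low half may itself be zero, so the defined variant is needed.
        return 32 + ctlz32(lo)
    # The high half is known to be non-zero here, so the zero-undef
    # variant is sufficient.
    return ctlz32_zero_undef(hi)
```

The zero-undef variant is only ever applied to a value already known to be non-zero, which is why it is safe to use inside the narrowed operation even though the overall result must be defined for a zero input.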