llvm-project

Commit Graph

Author	SHA1	Message	Date
hsmahesha	33fd4a18e7	[AMDGPU/MemOpsCluster] Clean-up fixme's around mem ops clustering logic Get rid of all fixmes and base heuristic on `num-clustered-dwords`. The main intuition behind this is as follows. The existing heuristic roughly summarizes as below: * Assume, all the mem ops instructions participating in the clustering process, loads/stores same num bytes * If num bytes loaded by each mem op is 4 bytes, then cluster at max 5 mem ops, that is at max 20 bytes * If num bytes loaded by each mem op is 8 bytes, then cluster at max 3 mem ops, that is at max 24 bytes * If num bytes loaded by each mem op is 16 bytes, then cluster at max 2 mem ops, that is at max 32 bytes So, we need to make sure that the new heuristic do not completey deviate away from the above one, and it properly handles both the sub-word loads and the wide loads. Reviewed By: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D84354	2020-07-30 21:41:13 +05:30
Jay Foad	62fd7f767c	[MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics tryLatency compares two sched candidates. For the top zone it prefers the one with lesser depth, but only if that depth is greater than the total latency of the instructions we've already scheduled -- otherwise its latency would be hidden and there would be no stall. Unfortunately it only tests the depth of one of the candidates. This can lead to situations where the TopDepthReduce heuristic does not kick in, but a lower priority heuristic chooses the other candidate, whose depth is greater than the already scheduled latency, which causes a stall. The fix is to apply the heuristic if the depth of either candidate is greater than the already scheduled latency. All this also applies to the BotHeightReduce heuristic in the bottom zone. Differential Revision: https://reviews.llvm.org/D72392	2020-07-17 11:02:13 +01:00
Jay Foad	5ab2e14d31	[AMDGPU] Fix typos in performCtlz_CttzCombine() Fix two obvious errors in the code and also update the test check. Also add one test to catch the failure. Patch by Ruiling Song! Differential Revision: https://reviews.llvm.org/D83280	2020-07-14 10:18:18 +01:00
Piotr Sobczak	0045786f14	[AMDGPU] Select s_cselect Summary: Add patterns to select s_cselect in the isel. Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Re-commit D81925 with a bugfix D82370. Differential Revision: https://reviews.llvm.org/D81925 Differential Revision: https://reviews.llvm.org/D82370	2020-06-25 10:38:23 +02:00
Piotr Sobczak	6d9565d6d5	Revert "[AMDGPU] Select s_cselect" This caused some failures detected by the buildbot with expensive checks enabled. This reverts commit `4067de569f`.	2020-06-19 16:41:04 +02:00
Piotr Sobczak	4067de569f	[AMDGPU] Select s_cselect Summary: Add patterns to select s_cselect in the isel. Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81925	2020-06-19 16:17:46 +02:00
Matt Arsenault	7af7b96a9b	AMDGPU: Move R600 test compatability hack Instead of handling the r600 intrinsics on amdgcn, handle the amdgcn intrinsics on r600.	2020-02-10 10:02:06 -08:00
Craig Topper	2af1640f9a	[LegalizeDAG][X86][AMDGPU] Use ANY_EXTEND instead of ZERO_EXTEND when promoting ISD::CTTZ/CTTZ_ZERO_UNDEF. Summary: For CTTZ we place a set bit just past where the non-promoted type stopped so the extended bits won't be used for the count. For CTTZ_ZERO_UNDEF we don't care what happens if no bits are set in the original type and we end up counting into the extended bits. So we can just use ANY_EXTEND for both cases. This matches what is done in type legalization for these operations. We make no effort to force the upper bits to zero. Differential Revision: https://reviews.llvm.org/D74111	2020-02-07 22:25:56 -08:00
Jay Foad	2252cac694	[ANDGPU] getMemOperandsWithOffset: support BUF non-stack-access instructions with resource but no vaddr Summary: This enables clustering for many more BUF instructions. Reviewers: rampitec, arsenm, nhaehnle Subscribers: jvesely, wdng, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73868	2020-02-03 22:49:30 +00:00
Craig Topper	58ecffd857	[DAGCombiner][AMDGPU][X86] Turn cttz/ctlz into cttz_zero_undef/ctlz_zero_undef if we can prove the input is never zero X86 currently has a late DAG combine after cttz/ctlz are turned into BSR+BSF+CMOV to detect this and remove the CMOV. But we should be able to do this much earlier and avoid creating the cmov all together. For the changed AMDGPU test case it appears that previously the i8 cttz was type legalized to i16 which introduced an OR with 256 in order to limit the result to 8 on the widened type. At this point the result is known to never be zero, but nothing checked that. Then operation legalization is told to promote all i16 cttz to i32. This introduces an extend and a truncate and another OR with 65536 to limit the result to 16. With the DAG combiner change we are able to prevent the creation of the second OR since the opcode will have been changed to cttz_zero_undef after the first OR. I the lack of the OR caused the instruction to change to v_ffbl_b32_sdwa Differential Revision: https://reviews.llvm.org/D42985 llvm-svn: 324427	2018-02-06 23:54:37 +00:00
Wei Ding	7ab1f7a421	AMDGPU : Fix an error for the llvm.cttz implementation. Differential Revision: http://reviews.llvm.org/D39014 llvm-svn: 316037	2017-10-17 21:49:52 +00:00
Wei Ding	5676acad9e	Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ. Differential Revision: http://reviews.llvm.org/D37348 llvm-svn: 315610	2017-10-12 19:37:14 +00:00
Alexander Timofeev	982aee6a38	[AMDGPU] Switch scalarize global loads ON by default Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307097	2017-07-04 17:32:00 +00:00
NAKAMURA Takumi	e4a741376b	Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default" It broke a testcase. Failing Tests (1): LLVM :: CodeGen/AMDGPU/alignbit-pat.ll llvm-svn: 307054	2017-07-04 02:14:18 +00:00
Alexander Timofeev	ea7f08bee5	[AMDGPU] Switch scalarize global loads ON by default Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307026	2017-07-03 14:54:11 +00:00
Matt Arsenault	3dbeefa978	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444	2017-03-21 21:39:51 +00:00
Matt Arsenault	7aad8fd8f4	Enable FeatureFlatForGlobal on Volcanic Islands This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 292982	2017-01-24 22:02:15 +00:00
Tom Stellard	45bb48ea19	R600 -> AMDGPU rename llvm-svn: 239657	2015-06-13 03:28:10 +00:00

18 Commits