llvm-project

Commit Graph

Author	SHA1	Message	Date
Carl Ritson	5bf2a9dd40	[AMDGPU] Update VMEM scalar write hazard mitigation sequence Using s_waitcnt_depctr 0xffe3 is potentially faster than v_nop. Reviewed By: rampitec, foad Differential Revision: https://reviews.llvm.org/D83872	2020-07-16 11:37:45 +09:00
Matt Arsenault	b27a538dda	AMDGPU: Fix illegally constant folding from V_MOV_B32_sdwa This was assumed to be a simple move, and interpreting the immediate modifier operand as a materialized immediate. Apparently the SDWA pass never produces these, but GlobalISel does emit these for some vector shuffles.	2020-05-18 15:34:33 -04:00
Sebastian Neubauer	5d3a69feca	[AMDGPU] New llvm.amdgcn.ballot intrinsic Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function in GLSL and other shader languages. It returns a bitfield containing the result of its boolean argument in all active lanes, and zero in all inactive lanes. This is intended to replace the existing llvm.amdgcn.icmp and llvm.amdgcn.fcmp intrinsics after a suitable transition period. Use the new intrinsic in the atomic optimizer pass. Differential Revision: https://reviews.llvm.org/D65088	2020-03-31 10:35:39 +02:00
Sebastian Neubauer	2f857eadf5	[AMDGPU] Use script to generate atomic optimizations test This is a preparation for introducing a llvm.amdgcn.ballot intrinsic in D65088.	2020-03-11 09:59:36 +01:00
cdevadas	e53a9d96e6	Resubmit: [AMDGPU] Invert the handling of skip insertion. The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092	2020-01-22 13:18:32 +09:00
Nicolai Hähnle	a80291ce10	Revert "[AMDGPU] Invert the handling of skip insertion." This reverts commit `0dc6c249bf`. The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa.	2020-01-21 09:17:25 +01:00
Matt Arsenault	8945b23af5	AMDGPU: Update more tests to use modern buffer intrinsics	2020-01-16 14:29:38 -05:00
cdevadas	0dc6c249bf	[AMDGPU] Invert the handling of skip insertion. The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092	2020-01-15 15:18:16 +05:30
Jay Foad	eac23862a8	[AMDGPU] gfx10 atomic optimizer changes. Summary: Add support for gfx10, where all DPP operations are confined to work within a single row of 16 lanes, and wave32. Reviewers: arsenm, sheredom, critson, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65644 llvm-svn: 369745	2019-08-23 10:07:43 +00:00
Jay Foad	27ec195f39	[AMDGPU] Fix DPP combiner check for exec modification Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks all instructions in the basic block, even beyond the last use. That meant that the DPP combiner no longer worked in any basic block that ended with a control flow instruction, and in particular it didn't work on code sequences generated by the atomic optimizer. Fix it by reinstating the old behaviour but in a new helper function execMayBeModifiedBeforeAnyUse, and limiting the number of instructions scanned. Reviewers: arsenm, vpykhtin Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64393 llvm-svn: 365910	2019-07-12 15:59:40 +00:00
Neil Henning	8c10fa1a90	[AMDGPU] Fix DPP sequence in atomic optimizer. This commit fixes the DPP sequence in the atomic optimizer (which was previously missing the row_shr:3 step), and works around a read_register exec bug by using a ballot instead. Differential Revision: https://reviews.llvm.org/D57737 llvm-svn: 353703	2019-02-11 14:44:14 +00:00
Neil Henning	233a02d0ed	[AMDGPU] Fix the new atomic optimizer in pixel shaders. The new atomic optimizer I previously added in D51969 did not work correctly when a pixel shader was using derivatives, and had helper lanes active. To fix this we add an llvm.amdgcn.ps.live call that guards a branch around the entire atomic operation - ensuring that all helper lanes are inactive within the wavefront when we compute our atomic results. I've added a test case that can cause derivatives, and exposes the problem. Differential Revision: https://reviews.llvm.org/D53930 llvm-svn: 346128	2018-11-05 12:04:48 +00:00

12 Commits