This switches to the workaround that HSA defaults to
for the mesa path.
This should be applied to the 4.0 branch.
Patch by Vedran Miletić <vedran@miletic.net>
llvm-svn: 292982
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1
This makes one particular compute shader 8x faster.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D21136
llvm-svn: 272556
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders though,
The register should be the next SGPR after all user and system SGPR's.
We use that Mesa adds arguments for all input and system SGPR's and
take the next available SGPR for the scratch wave offset register.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewers: mareko, arsenm, nhaehnle, tstellarAMD
Subscribers: qcolombet, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18941
Patch By: Bas Nieuwenhuizen
llvm-svn: 266336
This makes it possible to distinguish between mesa shaders
and other kernels even in the presence of compute shaders.
Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Differential Revision: http://reviews.llvm.org/D18559
llvm-svn: 265589
Introduce a subtarget feature for this, and leave the default with
the current behavior which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.
llvm-svn: 260651
Due to the SGPR init bug, every program claims to use the same number
of SGPRs anyway, so there's no point in trying to shift those registers
down from their initial spot of reservation.
Add a test that uses VGPR spilling and blocks most SGPRs from being used for
the scratch resource register. Previously, this would run into an assertion.
Differential Revision: http://reviews.llvm.org/D15724
llvm-svn: 256870
If we know we have stack objects, we reserve the registers
that the private buffer resource and wave offset are passed
and use them directly.
If not, reserve the last 5 SGPRs just in case we need to spill.
After register allocation, try to pick the next available registers
instead of the last SGPRs, and then insert copies from the inputs
to the reserved registers in the progloue.
This also only selectively enables all of the input registers
which are really required instead of always enabling them.
llvm-svn: 254331
It does not work because of emergency stack slots.
This pass was supposed to eliminate dummy registers for the
spill instructions, but the register scavenger can introduce
more during PrologEpilogInserter, so some would end up
left behind if they were needed.
The potential for spilling the scratch resource descriptor
and offset register makes doing something like this
overly complicated. Reserve registers to use for the resource
descriptor and use them directly in eliminateFrameIndex.
Also removes creating another scratch resource descriptor
when directly selecting scratch MUBUF instructions.
The choice of which registers are reserved is temporary.
For now it attempts to pick the next available registers
after the user and system SGPRs.
llvm-svn: 254329