These two instructions are normally selected, but when the
two address pass converts mac into mad we end up with the
mad where we could have one of these.
Differential Revision: https://reviews.llvm.org/D37389
llvm-svn: 312928
An encoding does not allow to use SDWA in an instruction with
scalar operands, either literals or SGPRs. That is however possible
to copy these operands into a VGPR first.
Several copies of the value are produced if multiple SDWA conversions
were done. To cleanup MachineLICM (to hoist copies out of loops),
MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace
SGPR to VGPR copy with immediate copy right to the VGPR) runs are added
after the SDWA pass.
Differential Revision: https://reviews.llvm.org/D33583
llvm-svn: 304219
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
llvm-svn: 298444
This switches to the workaround that HSA defaults to
for the mesa path.
This should be applied to the 4.0 branch.
Patch by Vedran Miletić <vedran@miletic.net>
llvm-svn: 292982
Because of the special immediate operand, the constant
bus is already used so SGPRs are never useful.
r263212 changed the name of the immediate operand, which
broke the verifier check for the restriction.
llvm-svn: 274564
Summary:
This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16
bytes was chosen, because it seems like that was the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.
This fixes yields small decreases in register usage with shader-db, but
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.
shader-db stats:
2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)
Reviewers: nhaehnle, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18451
llvm-svn: 264589
The two-address instruction pass will convert these back to v_mad_f32
if necessary.
Differential Revision: http://reviews.llvm.org/D11060
llvm-svn: 242038