Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.
Differential Revision: https://reviews.llvm.org/D88540
The insertion of waterfall loops splits the current basic block into
three blocks. So the basic block that we iterate over must be updated.
This failed assert(!NodePtr->isKnownSentinel()) in ilist_iterator for
divergent calls in branches before.
Differential Revision: https://reviews.llvm.org/D90596
Extend loadSRsrcFromVGPR to allow moving a range of instructions into
the loop. The call instruction is surrounded by copies into physical
registers which should be part of the waterfall loop.
Differential Revision: https://reviews.llvm.org/D88291
The addend in a REL32 reloc needs to be adjusted to account for the
offset from the PC value returned by the s_getpc instruction to the
point where the reloc is applied. This was being done correctly for
(GOTPC)REL32_LO but not for (GOTPC)REL32_HI. This will only make a
difference if the target symbol happens to get loaded almost exactly
a multiple of 4G away from the relocated instructions.
Differential Revision: https://reviews.llvm.org/D86938
The default constructor wasn't setting isSet o the ArgDescriptor, so
while these had the value set, they were treated as missing. This only
ended up mattering in the indirect call case (and for regular calls in
GlobalISel, which current doesn't have a way to support the variable
ABI).
This will likely introduce catastrophic performance regressions on
older subtargets, but should be correct. A follow up change will
remove the old fp32-denormals subtarget features, and switch to using
the new denormal-fp-math/denormal-fp-math-f32 attributes. Frontends
should be making sure to add the denormal-fp-math-f32 attribute when
appropriate to avoid performance regressions.
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us to removes the scratch wave
offset register from the calling convention ABI.
As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.
Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.
Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75138
This isn't really usable, and requires using the
-amdgpu-fixed-function-abi flag to work.
Assumes a uniform call target, and will hit a verifier error if the
call target ends up in a VGPR. Also doesn't attempt to do anything
sensible for the reported register/stack usage.