Commit Graph

3 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin 038d884a50 [AMDGPU] Use flat scratch instructions where available
The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.

At the very least missing:

- GlobalISel;
- Some optimizations in frame elimination in between vector
  and scalar ALU;
- It shall finally allow to always materialize frame index
  as an SGPR, but that is not implemented and frame elimination
  cannot handle it yet;
- Unaligned and/or multidword flat scratch shall work, but it
  is legalized now for MUBUF;
- Operand folding cannot optimize FI like with MUBUF yet;
- It will need scaling the value of the SP/FP in the DWARF
  expression to recover the unswizzled scratch address;

Differential Revision: https://reviews.llvm.org/D89170
2020-10-26 14:40:42 -07:00
Carl Ritson 4a7de36afc [AMDGPU] Avoid use of V_READLANE into EXEC in SGPR spills
Always prefer to clobber input SGPRs and restore them after the
spill.  This applies to both spills to VGPRs and scratch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D81914
2020-06-20 12:10:47 +09:00
Carl Ritson da33c96d47 [AMDGPU] Make SGPR spills exec mask agnostic
Explicitly set the exec mask for SGPR spills and reloads.
This fixes a bug where SGPR spills to memory could be incorrect
if the exec mask was 0 (or differed between spill and reload).

Additionally pack scalar subregisters (upto 16/32 per VGPR),
so that the majority of scalar types can be spilt or reloaded
with a simple memory access.  This should amortize some of the
additional overhead of manipulating the exec mask.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D80282
2020-06-03 12:34:26 +09:00