The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.
At the very least missing:
- GlobalISel;
- Some optimizations in frame elimination in between vector
and scalar ALU;
- It shall finally allow to always materialize frame index
as an SGPR, but that is not implemented and frame elimination
cannot handle it yet;
- Unaligned and/or multidword flat scratch shall work, but it
is legalized now for MUBUF;
- Operand folding cannot optimize FI like with MUBUF yet;
- It will need scaling the value of the SP/FP in the DWARF
expression to recover the unswizzled scratch address;
Differential Revision: https://reviews.llvm.org/D89170
In two instances of CreateStackTemporary we are sometimes promoting
alignments beyond the stack alignment. I have introduced a new function
called getReducedAlign that will return the alignment for the broken
down parts of illegal vector types. For example, on NEON a <32 x i8>
type is made up of two <16 x i8> types - in this case the sensible
alignment is 16 bytes, not 32.
In the legalization code wherever we create stack temporaries I have
started using the reduced alignments instead for illegal vector types.
I added a test to
CodeGen/AArch64/build-one-lane.ll
that tries to insert an element into an illegal fixed vector type
that involves creating a temporary stack object.
Differential Revision: https://reviews.llvm.org/D80370
Even though series of cmd/cndmask can produce quite a lot of
code that is still better than a loop. In case of doubles we
would even produce two loops.
Differential Revision: https://reviews.llvm.org/D80032
Since SRSRC has alignment requirements, first find non GIT pointer clobbered
registers for SRSRC and then if those registers clobber preloaded Scratch Wave
Offset register, copy the Scratch Wave Offset register to a free SGPR.
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us to removes the scratch wave
offset register from the calling convention ABI.
As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.
Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.
Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75138
We are relying on atrificial DAG edges inserted by the
MemOpClusterMutation to keep loads and stores together in the
post-RA scheduler. This does not work all the time since it
allows to schedule a completely independent instruction in the
middle of the cluster.
Removed the DAG mutation and added pass to bundle already
clustered instructions. These bundles are unpacked before the
memory legalizer because it does not work with bundles but also
because it allows to insert waitcounts in the middle of a store
cluster.
Removing artificial edges also allows a more relaxed scheduling.
Differential Revision: https://reviews.llvm.org/D72737
Summary:
The stride should depend on the wave size, not the hardware generation.
Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be
relevant.
Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6
Reviewers: arsenm, rampitec, mareko
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63808
llvm-svn: 364788
We use both -long-option and --long-option in tests. Switch to --long-option for consistency.
In the "llvm-readelf" mode, -long-option is discouraged as it conflicts with grouped short options and it is not accepted by GNU readelf.
While updating the tests, change llvm-readobj -s to llvm-readobj -S to reduce confusion ("s" is --section-headers in llvm-readobj but --symbols in llvm-readelf).
llvm-svn: 359649
Summary:
Using HI here makes no logical sense, since the dword is only
32 bits to begin with.
Current Mesa master does not look at the relocation type at all,
so this change is fine. Future Mesa will rely on this, however.
Change-Id: I91085707834c4ac0370926602b93c94b90e44cb1
Reviewers: arsenm, rampitec, mareko
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D55369
llvm-svn: 349620
This feature is only relevant to shaders, and is no longer used. When disabled,
lowering of reserved registers for shaders causes a compiler crash.
Remove the feature and add a test for compilation of shaders at OptNone.
Differential Revision: https://reviews.llvm.org/D53829
llvm-svn: 345763
Two issues found when doing codegen for splitting vector with non-zero alloca addr space:
DAGTypeLegalizer::SplitVecRes_INSERT_VECTOR_ELT/SplitVecOp_EXTRACT_VECTOR_ELT uses dummy pointer info for creating
SDStore. Since one pointer operand contains multiply and add, InferPointerInfo is unable to
infer the correct pointer info, which ends up with a dummy pointer info for the target to lower
store and results in isel failure. The fix is to introduce MachinePointerInfo::getUnknownStack to
represent MachinePointerInfo which is known in alloca address space but without other information.
TargetLowering::getVectorElementPointer uses value type of pointer in addr space 0 for
multiplication of index and then add it to the pointer. However the pointer may be in an addr
space which has different size than addr space 0. The fix is to use the pointer value type for
index multiplication.
Differential Revision: https://reviews.llvm.org/D39758
llvm-svn: 319622
Introduce pseudo-registers for registers needed for stack
access, which are replaced during finalizeLowering.
Note these pseudo-registers are currently only used for the
used register location, and not for determining their
input argument register.
This is better because it avoids the need to try to predict
whether a call will be emitted from the IR, and also
detects stack objects introduced by legalization.
Test changes are from the HasStackObjects check being more
accurate since stack objects introduced during legalization
are now known.
llvm-svn: 308325
Remove dependency of SDWA pass on SIShrinkInstructions.
The goal is to move SDWA even higher in the stack to avoid second run
of MachineLICM, MachineCSE and SIFoldOperands.
Also added handling to preserve original src modifiers.
Differential Revision: https://reviews.llvm.org/D33860
llvm-svn: 304665
An encoding does not allow to use SDWA in an instruction with
scalar operands, either literals or SGPRs. That is however possible
to copy these operands into a VGPR first.
Several copies of the value are produced if multiple SDWA conversions
were done. To cleanup MachineLICM (to hoist copies out of loops),
MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace
SGPR to VGPR copy with immediate copy right to the VGPR) runs are added
after the SDWA pass.
Differential Revision: https://reviews.llvm.org/D33583
llvm-svn: 304219