On AMDGPU SGPR spills are really spilled to another register.
The spiller creates the spills to new frame index objects,
which is used as a placeholder.
This will eventually be replaced with a reference to a position
in a VGPR to write to and the frame index deleted. It is
most likely not a real stack location that can be shared
with another stack object.
This is a problem when StackSlotColoring decides it should
combine a frame index used for a normal VGPR spill with
a real stack location and a frame index used for an SGPR.
Add an ID field so that StackSlotColoring has a way
of knowing the different frame index types are
incompatible.
llvm-svn: 308673
In moveToVALU(), move to vector ALU is performed, all instrs in
the use chain will be visited. We do not want the same node to be
pushed to the visit worklist more than once.
Differential Revision: https://reviews.llvm.org/D34726
llvm-svn: 308039
Summary:
1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it.
2. There were several problems with support of VOPC instructions in SDWA peephole pass.
Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl
Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye
Differential Revision: https://reviews.llvm.org/D34626
llvm-svn: 306413
Summary:
With scalar stores, M0 is clobbered and therefore marked as implicitly
defined. However, it is also dead.
This fixes an assertion when the Greedy Register Allocator decides to
optimize a spill/restore pair away again (via tryHintsRecoloring).
Reviewers: arsenm
Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D33319
llvm-svn: 306375
Summary:
Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers.
Added several subtarget features for GFX9 SDWA.
This diff also contains changes from D34026.
Depends D34026
Reviewers: vpykhtin, rampitec, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D34241
llvm-svn: 305986
Summary: Previously there were two separate pseudo instruction for SDWA on VI and on GFX9. Created one pseudo instruction that is union of both of them. Added verifier to check that operands conform either VI or GFX9.
Reviewers: dp, arsenm, vpykhtin
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, artem.tamazov
Differential Revision: https://reviews.llvm.org/D34026
llvm-svn: 305886
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
1. RegisterClass::getSize() is split into two functions:
- TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const;
- TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const;
2. RegisterClass::getAlignment() is replaced by:
- TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const;
This will allow making those values depend on subtarget features in the
future.
Differential Revision: https://reviews.llvm.org/D31783
llvm-svn: 301221
Summary:
Fix a compiler bug when the lane select happens to end up in a VGPR.
Clarify the semantic of the corresponding intrinsic to be that of
the corresponding GLSL: the lane select must be uniform across a
wave front, otherwise results are undefined.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D32343
llvm-svn: 301197
This is possible in ways that are not compiler bugs,
so stop asserting on them.
This emits an extra error when emitting objects when it
can't encode the new pseudo, but I'm not sure that matters.
llvm-svn: 299712
Previously compiler often extracted common immediates into specific register, e.g.:
```
%vreg0 = S_MOV_B32 0xff;
%vreg2 = V_AND_B32_e32 %vreg0, %vreg1
%vreg4 = V_AND_B32_e32 %vreg0, %vreg3
```
Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands:
```
SDWA src: %vreg2 src_sel:BYTE_0
SDWA src: %vreg4 src_sel:BYTE_0
```
With this change peephole check if operand is either immediate or register that is copy of immediate.
llvm-svn: 299202
As we introduced target triple environment amdgiz and amdgizcl, the address
space values are no longer enums. We have to decide the value by target triple.
The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which are not depend on target triple, use static
const members, so that they don't occupy extra memory space and is equivalent
to a compile time constant.
Since the struct is lightweight and cheap, it can be created on the fly at
the point of usage. Or it can be added as member to a pass and created at
the beginning of the run* function.
Differential Revision: https://reviews.llvm.org/D31284
llvm-svn: 298846
StructurizeCFG can't handle cases with multiple
returns creating regions with multiple exits.
Create a copy of UnifyFunctionExitNodes that only
unifies exit nodes that skips exit nodes
with uniform branch sources.
llvm-svn: 298729
Since v_max_f32_e64/v_max_f16_e64 can be folded if the target
instruction supports the clamp bit, we also need to maintain
modifiers when converting v_mac to v_mad.
This fixes a rendering issue with Dirt Rally because a v_mac
instruction with the clamp bit set was converted to a v_mad
but that bit was lost during the conversion.
Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit")
Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
llvm-svn: 297556
Before frame offsets are calculated, try to eliminate the
frame indexes used by SGPR spills. Then we can delete them
after.
I think for now we can be sure that no other instruction
will be re-using the same frame indexes. It should be easy
to notice if this assumption ever breaks since everything
asserts if it tries to use a dead frame index later.
The unused emergency stack slot seems to still be left behind,
so an additional 4 bytes is still wasted.
llvm-svn: 295753
The operand types were defined to fit the fp16_to_fp node, which
has the half as an integer type. v_cvt_f32_f16 does support
source modifiers, so change this to have an FP type and modifiers.
For targets without legal f16, this requires recognizing the
bit operations and trying to produce them.
llvm-svn: 293857
Leave early ifcvt disabled for now since there are some
shader-db regressions.
This causes some immediate improvements, but could be better.
The cost checking that the pass does is based on critical path
length for out of order CPUs which we do not want so it skips out
on many cases we want.
llvm-svn: 293016
Inline spiller can decide to move a spill as early as possible in the basic block.
It will skip phis and label, but we also need to make sure it skips instructions
in the basic block prologue which restore exec mask.
Added isPositionLike callback in TargetInstrInfo to detect instructions which
shall be skipped in addition to common phis, labels etc.
Differential Revision: https://reviews.llvm.org/D27997
llvm-svn: 292554
Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just 1 operand.
See https://reviews.llvm.org/D28057 for the whole discussion.
Differential Revision: https://reviews.llvm.org/D28556
llvm-svn: 291891
When the instruction is processed the first time, it may be
deleted resulting in crashes. While the new test adds the same
user to the worklist twice, this particular case doesn't crash
but I'm not sure why.
llvm-svn: 290191
Since 32-bit instructions with 32-bit input immediate behavior
are used to materialize 16-bit constants in 32-bit registers
for 16-bit instructions, determining the legality based
on the size is incorrect. Change operands to have the size
specified in the type.
Also adds a workaround for a disassembler bug that
produces an immediate MCOperand for an operand that
is supposed to be OPERAND_REGISTER.
The assembler appears to accept out of bounds immediates and
truncates them, but this seems to be an issue for 32-bit
already.
llvm-svn: 289306
Use vaddr/vdst for the same purposes.
This also fixes a beg in SIInsertWaits for the
operand check. The stored value operand is currently called
data0 in the single offset case, not data.
llvm-svn: 288188
m0 may need to be written for spill code, so
we don't want general code uses relying on the
value stored in it.
This introduces a few code quality regressions where copies
from m0 are not coalesced into copies of a copy of m0.
llvm-svn: 287841