Summary:
v2: Make ReturnsVoid private, so that I can another 8 lines of code and
look more productive.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16034
llvm-svn: 257622
Summary:
Return values can be stored in SGPRs (i32) and VGPRs (f32).
This will be used by functions which expect some bytecode or other binary to
be appended at the end. It allows defining in which registers the return
values will be stored.
v2: don't do this for compute shaders
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16033
llvm-svn: 257621
Summary:
It is off by default, but can be used
with --misched=si
Patch by: Axel Davy
Reviewers: arsenm, tstellarAMD, nhaehnle
Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D11885
llvm-svn: 257609
Summary:
With the ability to concatenate shader binaries, the limit of 15 no longer
applies.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16031
llvm-svn: 257592
Summary:
This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM.
The register assigns VGPR locations to PS inputs, while the ENA register
determines whether or not they are loaded.
Mesa needs to set some inputs as not-movable, so that a pixel shader prolog
binary appended at the beginning can assume where some inputs are.
v2: Make PSInputAddr private, because there is never enough silly getters
and setters for people to read.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16030
llvm-svn: 257591
Summary: ret.ll will contain a test for this
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16029
llvm-svn: 257590
The hardware instruction's output on 0 is -1 rather than 32.
Eliminate a test and select to -1. This removes an extra instruction
from the compatability function with HSAIL's firstbit instruction.
llvm-svn: 257352
Summary:
Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs,
increasing the code size and VGPR pressure. These moves are now folded away.
Note that this lack of operand folding was not a problem for VMEM loads,
because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register
coalescer.
Some tests are updated, note that the fsub.ll test explicitly checks that
the move is elided.
With the IR generated by current Mesa, the changes are obviously relatively
minor:
7063 shaders in 3531 tests
Totals:
SGPRS: 351872 -> 352560 (0.20 %)
VGPRS: 199984 -> 200732 (0.37 %)
Code Size: 9876968 -> 9881112 (0.04 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave
Wait states: 295164 -> 295337 (0.06 %)
Totals from affected shaders:
SGPRS: 65784 -> 66472 (1.05 %)
VGPRS: 38064 -> 38812 (1.97 %)
Code Size: 1993828 -> 1997972 (0.21 %) bytes
LDS: 42 -> 42 (0.00 %) blocks
Scratch: 795648 -> 783360 (-1.54 %) bytes per wave
Wait states: 54026 -> 54199 (0.32 %)
Reviewers: tstellarAMD, arsenm, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15875
llvm-svn: 257074
Summary:
Somehow, I first interpreted the docs as saying space for xnack_mask is only
reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and
went back to actually test what is happening, and it turns out that xnack_mask
is always reserved at least on Tonga and Carrizo, in the sense that flat_scr
is always fixed below the SGPRs that are used to implement xnack_mask, whether
or not they are actually used.
I confirmed this by writing a shader using inline assembly to tease out the
aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where
we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so
xnack_mask is s[76:77] and vcc is s[78:79]).
This patch changes both the calculation of the total number of SGPRs and the
various register reservations to account for this.
It ought to be possible to use the gap left by xnack_mask when the feature
isn't used, but this patch doesn't try to do that. (Note that the same applies
to vcc.)
Note that previously, even before my earlier change in r256794, the SGPRs that
alias to xnack_mask could end up being used as well when flat_scr was unused
and the total number of SGPRs happened to fall on the right alignment
(e.g. highest regular SGPR being used s29 and VCC used would lead to number
of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there
were some conflict due to such aliasing, we should have noticed that already.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15898
llvm-svn: 257073
Summary:
This is admittedly something that you could only run into by manually
playing around with shader assembly because the SITypeWriter pass is
skipped for compute.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15902
llvm-svn: 256980
Due to the SGPR init bug, every program claims to use the same number
of SGPRs anyway, so there's no point in trying to shift those registers
down from their initial spot of reservation.
Add a test that uses VGPR spilling and blocks most SGPRs from being used for
the scratch resource register. Previously, this would run into an assertion.
Differential Revision: http://reviews.llvm.org/D15724
llvm-svn: 256870
Summary:
We had to sets of identical FLAT patterns one inside the
HasFlatAddressSpace predicate and one inside the useFlatForGloabl
predicate. This patch merges these sets into a single pattern
under the isCIVI predicate.
The reason we can remove the predicates is that when MUBUF instructions
are legal, the instruction selector will prefer selecting those over
FLAT instructions because MUBUF patterns have a higher complexity score.
So, in this case having patterns for FLAT instructions will have no effect.
This change also simplifies the process for forcing global address space
loads to use FLAT instructions, since we no only have to disable the
MUBUF patterns instead of having to disable the MUBUF patterns and
enable the FLAT patterns.
Reviewers: arsenm, cfang
Subscribers: llvm-commits
llvm-svn: 256807
Summary:
Enabling this feature will account for the two SGPRs used by the hardware
to store the XNACK_MASK physically.
The hardware only requires this reservation when the XNACK feature is
explicitly enabled. At some point, HSA will probably want to do that, but
it does increase SGPR register pressure, so leave it disabled by default
for now (but do add a small test).
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15869
llvm-svn: 256794
Summary: This was accidently moved to CIInstructions.td in r256282
Reviewers: cfang, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15763
llvm-svn: 256775
Summary:
The comment explains it: emitError does not necessarily exit the compilation
process, and then using NoRegister leads to assertions later on.
This generates incorrect code, of course, but the user should know to not use
the result when an error has been emitted.
It would be nice to have a test-case for this inside the LLVM repository,
but llc exits on error. shader-db tests trigger the underlying issue at least
on Tonga.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15826
llvm-svn: 256757
Summary:
For some reason doing executing an MUBUF instruction with the addr64
bit set and a zero base pointer in the resource descriptor causes
the memory operation to be dropped when the shader is executed using
the HSA runtime.
This kind of MUBUF instruction is commonly used when the pointer is
stored in VGPRs. The base pointer field in the resource descriptor
is set to zero and and the pointer is stored in the vaddr field.
This patch resolves the issue by only using flat instructions for
global memory operations when targeting HSA. This is an overly
conservative fix as all other configurations of MUBUF instructions
appear to work.
NOTE: re-commit by fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll
Reviewers: tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15543
llvm-svn: 256282
Summary:
For some reason doing executing an MUBUF instruction with the addr64
bit set and a zero base pointer in the resource descriptor causes
the memory operation to be dropped when the shader is executed using
the HSA runtime.
This kind of MUBUF instruction is commonly used when the pointer is
stored in VGPRs. The base pointer field in the resource descriptor
is set to zero and and the pointer is stored in the vaddr field.
This patch resolves the issue by only using flat instructions for
global memory operations when targeting HSA. This is an overly
conservative fix as all other configurations of MUBUF instructions
appear to work.
Reviewers: tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15543
llvm-svn: 256273
Summary:
These register has different encodings on CI and VI, so we add pseudo
FLAT_SCRACTH registers to be used before MC, and subtarget specific
registers to be used by the MC layer.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15661
llvm-svn: 256178
Summary:
The analysis of shader inputs was completely wrong. We were passing the
wrong index to AttributeSet::hasAttribute() and the logic for which
inputs where in SGPRs was wrong too.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15608
llvm-svn: 256082
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.
llvm-svn: 256075
Summary:
When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.
Do the simplest possible thing that guarantees the correct order of moves
when there are overlaps, and does whatever when there is no overlap. (The
last part forces some trivial adjustments to test cases.)
Together with r255906, this fixes a VM fault in Unreal Elemental Demo.
While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15622
llvm-svn: 256072
Summary:
The method insertNOPs expected the number of wait states to be passed as
parameter, while eliminateFrameIndex passed the immediate argument for the
S_NOP, leading to an off-by-one error. Rename the method to make the
meaning of its parameter clearer. The number of 4 / 5 wait states (which
is what the method has always _tried_ to do according to the comment) is
correct according to the hardware docs.
I stumbled upon this while trying to track down the cause of
https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed,
this patch unfortunately does not fix that bug...
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15542
llvm-svn: 255906
Summary: I'm not sure how things worked before without this.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15492
llvm-svn: 255692