Summary:
Instead of trying to replace SMRD instructions with a VGPR base pointer
with an equivalent MUBUF instruction, we now copy the base pointer to
SGPRs using v_readfirstlane.
This is safe to do, because any load selected as an SMRD instruction
has been proven to have a uniform base pointer, so each thread in the
wave will have the same pointer value in VGPRs.
This will fix some errors on VI from trying to replace SMRD instructions
with addr64-enabled MUBUF instructions that don't exist.
Reviewers: arsenm, cfang, nhaehnle
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17305
llvm-svn: 261385
Summary:
This was broken in r260694 which swapped the address and data operands
for flat store instructions. The code in SIInsertWaits assumes
that the data operand always comes before the address operand, so
we need to add a special case for flat.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17366
llvm-svn: 261330
Summary:
These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa
for the GL_ARB_shader_image_load_store extension.
IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has
a legacy name and pretends not to read memory.
Differential Revision: http://reviews.llvm.org/D17276
llvm-svn: 261224
Tests for the new scalarize all private access options will be
included with a future commit.
The only functional change is to make the split/scalarize behavior
for private access of > 4 element vectors to be consistent
with the flat/global handling. This makes the spilling worse
in the two changed tests.
llvm-svn: 260804
Historically, AMD internal sp3 assembler has flat_store* addr, data
format. To match existing code and to enable reuse, change LLVM
definitions to match. Also update MC and CodeGen tests.
Differential Revision: http://reviews.llvm.org/D16927
Patch by: Nikolay Haustov
llvm-svn: 260694
Summary:
It is possible that the loop condition can be a boolean constant (infinite loop,
for example). So we sould handle constant condition in annotating a loop. This
patch adds this functionality to support annotating constant condition.
Reviewers: tstellarAMD, arsenm
Subscribers: llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D15093
llvm-svn: 260692
This was hardcoded to the static private size, but this
would be missing the offset and additional size for someday
when we have dynamic sizing.
Also stops always initializing flat_scratch even when unused.
In the future we should stop emitting this unless flat instructions
are used to access private memory. For example this will initialize
it almost always on VI because flat is used for global access.
llvm-svn: 260658
Introduce a subtarget feature for this, and leave the default with
the current behavior which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.
llvm-svn: 260651
Summary:
It's possible to have resource descriptors and samplers stored in
VGPRs, either by a VMEM instruction or in the case of samplers,
floating-point calculations. When this happens, we need to use
v_readfirstlane to copy these values back to sgprs.
Reviewers: mareko, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17102
llvm-svn: 260599
If the two operands to an instruction were both
subregisters of the same super register, it would incorrectly
think this counted as the same constant bus use.
This fixes the verifier error in fmin_legacy.ll which
was missing -verify-machineinstrs.
llvm-svn: 260495
Summary:
This fixes a crash where subsequent spills would be unable to scavenge
a register. In particular, it fixes a crash in piglit's
spec@glsl-1.50@execution@geometry@max-input-components (the test still
has a shader that fails to compile because of too many SGPR spills, but
at least it doesn't crash any more).
This is a candidate for the release branch.
Reviewers: arsenm, tstellarAMD
Subscribers: qcolombet, arsenm
Differential Revision: http://reviews.llvm.org/D16558
llvm-svn: 260427
If a range has a lower bound of 0, add an AssertZext from the
nearest floor power of two.
This allows operations with some workitem intrinsics with known
maximum ranges to use fast 24-bit multiplies.
llvm-svn: 260109
The current situation isn't great, because the amount of padding
requires is determined by the inverse order of the first encountered
use. We should eventually somehow sort these to minimize wasted space.
Another problem is the alignment of kernel arguments isn't
respected. The group_segment_alignment is always emitted as
the default 16, and typed arguments with higher alignments
or an explicitly set alignment are also ignored.
llvm-svn: 259912
Recommited, after some fixing with test cases.
Updated test cases:
test/CodeGen/AArch64/arm64-misched-memdep-bug.ll
test/CodeGen/AArch64/tailcall_misched_graph.ll
Temporarily disabled test cases:
test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll
test/CodeGen/PowerPC/ppc64-fastcc.ll (partially updated)
test/CodeGen/PowerPC/vsx-fma-m.ll
test/CodeGen/PowerPC/vsx-fma-sp.ll
http://reviews.llvm.org/D8705
Reviewers: Hal Finkel, Andy Trick.
llvm-svn: 259673
If we can't assume the pointer value isn't within the bounds
of the object, it seems risky to try to replace the pointer
calculations.
llvm-svn: 259573
When promoting allocas to LDS, we know we are indexing
into a specific area just created, and the calculation
will also never overflow.
Also emit some of the muls as nsw nuw, because instcombine
infers this already from the range metadata. I think
putting this on the other adds and muls might be OK too,
but I'm not 100% sure.
llvm-svn: 259545
Re-commit of r258951 after fixing layering violation.
The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.
There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.
llvm-svn: 259498
The AMDGPUPromoteAlloca pass was emitting the read.local.size
calls, which with HSA was incorrectly selected to reading from
the offset mesa uses off of the kernarg pointer.
Error on intrinsics which aren't supported by HSA, and start
emitting the correct IR to read the workgroup size
out of the dispatch pointer.
Also initialize the pass so it can be tested with opt, and
start moving towards not depending on the subtarget as an
argument.
Start emitting errors for the intrinsics not handled with HSA.
llvm-svn: 259297
Only the dispatch.ptr intrinsic is supposed to be used now to get
the workgroup size, and the read.local.size intrinsics do not
work correctly.
llvm-svn: 259296
Re-commit of r258951 after fixing layering violation.
The related LLVM patch adds a backend diagnostic type for reporting
unsupported features, this adds a printer for them to clang.
In the case where debug location information is not available, I've
changed the printer to report the location as the first line of the
function, rather than the closing brace, as the latter does not give the
user any information. This also affects optimisation remarks.
Differential Revision: http://reviews.llvm.org/D16590
llvm-svn: 259035
The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.
There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.
The implementation of DiagnosticInfoUnsupported::print must be in
lib/Codegen rather than in the existing file in lib/IR/ to avoid
introducing a dependency from IR to CodeGen.
Differential Revision: http://reviews.llvm.org/D16590
llvm-svn: 258951
Summary:
This is a candidate for stable, along with all patches that add the "stoney"
processor.
Reviewers: tstellarAMD
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16485
llvm-svn: 258922
When no device name is specified, default to kaveri
for HSA since SI is not supported and it woud fail.
Default to "tahiti" instead of "SI" since these are
effectively the same, and tahiti is an actual device.
Move default device handling to the TargetMachine
rather than the AMDGPUSubtarget. The module ISA version
is computed from the device name provided with the target
machine, so the attributes printed by the AsmPrinter were
inconsistent with those computed in the subtarget.
Also remove DevName field from subtarget since it's redundant
with getCPU() in the superclass.
llvm-svn: 258901
I did my best to try to update all the uses in tests that
just happened to use the old ones to the newer intrinsics.
I'm not sure I got all of the immediate operand conversions
correct, since the value seems to have been ignored by the
old pattern but I don't think it really matters.
llvm-svn: 258787
Some of the special intrinsics now that now correspond to a instruction
also have special setting of some registers, e.g. llvm.SI.sendmsg sets
m0 as well as use s_sendmsg. Using these explicit register intrinsics
may be a better option.
Reading the exec mask and others may be useful for debugging. For this
I'm not sure this is entirely correct because we would want this to
be convergent, although it's possible this is already treated
sufficently conservatively.
llvm-svn: 258785
For historic reasons, the behavior of .align differs between targets.
Fortunately, there are alternatives, .p2align and .balign, which make the
interpretation of the parameter explicit, and which behave consistently across
targets.
This patch teaches MC to use .p2align instead of .align, so that people reading
code for multiple architectures don't have to remember which way each platform
does its .align directive.
Differential Revision: http://reviews.llvm.org/D16549
llvm-svn: 258750