Add maximum NSA size limit as an ISA feature.
Use this to reduce NSA usage on GFX10.1 to avoid stability issues
with 4 and 5 dwords NSA instructions.
Maintain use of longer NSA instructions on GFX10.3.
Note: this also contains some minor fixes for GlobalISel which
did not work correctly with non-NSA form instructions on GFX10.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D103348
gfx9 does not work with negative offsets, gfx10 works only with
aligned negative offsets, but not with unaligned negative offsets.
This is slightly more conservative than needed, gfx9 does support
negative offsets when a VGPR address is used and gfx10 supports
negative, unaligned offsets when an SGPR address is used, but we
do not make use of that with this patch.
Differential Revision: https://reviews.llvm.org/D101292
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.
Fixed version of D99587, which does not rely on the address space.
Differential Revision: https://reviews.llvm.org/D99743
Split out some of the instructions predicated on the dot2-insts target
feature into a new dot7-insts, in preparation for subtargets that have
some but not all of these instructions. NFCI.
Differential Revision: https://reviews.llvm.org/D98717
gfx1030 added a new way to implement readcyclecounter using the
SHADER_CYCLES hardware register, but the s_memtime instruction still
exists, so the MC layer should still accept it and the
llvm.amdgcn.s.memtime intrinsic should still work.
Differential Revision: https://reviews.llvm.org/D97928
Support for XNACK and SRAMECC is not static on some GPUs. We must be able
to differentiate between different scenarios for these dynamic subtarget
features.
The possible settings are:
- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On
GCNSubtarget will track the four options based on the following criteria. If
the subtarget does not support XNACK/SRAMECC we say the setting is
"Unsupported". If no subtarget features for XNACK/SRAMECC are requested we
must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the
feature string when initializing the subtarget, the settings are "On/Off".
The defaults are updated to be conservatively correct, meaning if no setting
for XNACK or SRAMECC is explicitly requested, defaults will be used which
generate code that can be run anywhere. This corresponds to the "Any" setting.
Differential Revision: https://reviews.llvm.org/D85882
It was removed in GFX10 GPUs, but LLVM could
generate it.
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D94020
Change-Id: Id1c716d71313edcfb768b2b175a6789ef9b01f3c
We have this subtarget feature so it makes sense to use it here. This is
NFC because it's always defined by default on GFX8+.
Differential Revision: https://reviews.llvm.org/D93202
Both ds_read_b128 and ds_read2_b64 are valid for 128bit 16-byte aligned
loads but the one that will be selected is determined either by the order in
tablegen or by the AddedComplexity attribute. Currently ds_read_b128 has
priority.
While ds_read2_b64 has lower alignment requirements, we cannot always
restrict ds_read_b128 to 16-byte alignment because of unaligned-access-mode
option. This was causing ds_read_b128 to be selected for 8-byte aligned
loads regardles of chosen access mode.
To resolve this we use two patterns for selecting ds_read_b128. One
requires alignment of 16-byte and the other requires
unaligned-access-mode option.
Same goes for ds_write2_b64 and ds_write_b128.
Differential Revision: https://reviews.llvm.org/D92767
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.
Differential Revision: https://reviews.llvm.org/D90419
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
I do not exactly like the use of a negative predicate to
enable instructions' support. Change HasNoMadMacF32Insts
with HasFmaLegacy32.
Differential Revision: https://reviews.llvm.org/D90250
The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.
At the very least missing:
- GlobalISel;
- Some optimizations in frame elimination in between vector
and scalar ALU;
- It shall finally allow to always materialize frame index
as an SGPR, but that is not implemented and frame elimination
cannot handle it yet;
- Unaligned and/or multidword flat scratch shall work, but it
is legalized now for MUBUF;
- Operand folding cannot optimize FI like with MUBUF yet;
- It will need scaling the value of the SP/FP in the DWARF
expression to recover the unswizzled scratch address;
Differential Revision: https://reviews.llvm.org/D89170
GFX10 enables third addressing mode for flat scratch instructions,
an ST mode. In that mode both register operands are omitted and
only swizzled offset is used in addition to flat_scratch base.
Differential Revision: https://reviews.llvm.org/D89501
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:
* The "Oland" and "Hainan" variants of gfx601 are now split out into
gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
that to avoid using the shaderZExport workaround on gfx602.
* One variant of gfx703 is now split out into gfx705. LLPC and other
front-ends could use that to avoid using the
shaderSpiCsRegAllocFragmentation workaround on gfx705.
* The "TongaPro" variant of gfx802 is now split out into gfx805.
TongaPro has a faster 64-bit shift than its former friends in gfx802,
and a subtarget feature could be set up for that to take advantage of
it. This commit does not make that change; it just adds the target.
V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
so fix the GPUKind order.
Differential Revision: https://reviews.llvm.org/D88916
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
Summary:
This implements a workaround for a hardware bug in gfx8 and gfx9,
where register usage is not estimated correctly for image_store and
image_gather4 instructions when D16 is used.
Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81172
Add subtarget feature check to avoid using ds_read/write_b96/128 with too
low alignment if a bug is present on that specific hardware.
Add this "feature" to GFX 10.1.1 as it is also affected.
Add global-isel test.
Fix local ds_read/write_b96/b128 so they can be selected if the alignment
allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break
them down.
Differential Revision: https://reviews.llvm.org/D81638
Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine
whether hardware supports such access.
UnalignedAccessMode should be used to enable them.
hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be
now used to quickly check both.
Differential Revision: https://reviews.llvm.org/D84522
Adjust alignment requirements for ds_read/write_b96/b128.
GFX9 and onwards allow misaligned access for reads and writes but only if
SH_MEM_CONFIG.alignment_mode allows it.
UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we
can relax alignment requirements.
UnalignedAccessMode acts similary to UnalignedBufferAccess for DS instructions
but only from GFX9 onward and is supposed to match alignment_mode. By default
alignment of 4 is required.
Differential Revision: https://reviews.llvm.org/D82788
The annoying behavior where the output is different due to the
legality check struck again, plus the subtarget predicate wasn't
really correctly set for DS FP atomics.
Some of the FP min/max instructions seem to be in the gfx6/gfx7
manuals, but IIRC this might have been one of the cases where the
manual got ahead of the actual hardware support, but I've left these
as-is for now since the assembler tests seem to expect them.
Global TableGen let override blocks are pretty dangerous and override
any local special cases. In this case, the broader HasFlatGlobalInsts
was overriding the more specific predicate for
FeatureAtomicFaddInsts. Make sure HasFlatGlobalInsts is implied by
FeatureAtomicFaddInsts, and make sure the right predicate is used.
One issue with independently setting the subtarget features on
incompatible targets is all of the encoding families do not define all
opcodes. This will hit an assert on gfx10 for example, since we set
the encoding independently based on the generation and not based on a
feature.
If f32 denormals were enabled pre-gfx9, we would still try to
implement this with v_max_f32. Pre-gfx9, these instructions ignored
the denormal mode and did not flush. Switch to the multiply form for
f32 as a workaround which should always work in any case.
This fixes conformance failures when the library implementation of
fmin/fmax were accidentally not inlined, forcing the assumption of no
flushing on targets where denormals are not enabled by default. This
is a workaround, since really we should not be mixing code with
different FP mode expectations, but prefer the lowering that will work
in any mode.
Now this will always use max to implement canonicalize on gfx9+. This
is only really beneficial for f64. For f32/f16 it's a neutral choice
(and worse in terms of code size in 1 case), but possibly worse for
the compiler since it does add an extra register use operand. Leave
this change for later.
For context, the proposed RISC-V bit manipulation extension has a subset
of instructions which require one of two SubtargetFeatures to be
enabled, 'zbb' or 'zbp', and there is no defined feature which both of
these can imply to use as a constraint either (see comments in D65649).
AssemblerPredicates allow multiple SubtargetFeatures to be declared in
the "AssemblerCondString" field, separated by commas, and this means
that the two features must both be enabled. There is no equivalent to
say that _either_ feature X or feature Y must be enabled, short of
creating a dummy SubtargetFeature for this purpose and having features X
and Y imply the new feature.
To solve the case where X or Y is needed without adding a new feature,
and to better match a typical TableGen style, this replaces the existing
"AssemblerCondString" with a dag "AssemblerCondDag" which represents the
same information. Two operators are defined for use with
AssemblerCondDag, "all_of", which matches the current behaviour, and
"any_of", which adds the new proposed ORing features functionality.
This was originally proposed in the RFC at
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139138.html
Changes to all current backends are mechanical to support the replaced
functionality, and are NFCI.
At this stage, it is illegal to combine features with ands and ors in a
single AssemblerCondDag. I suspect this case is sufficiently rare that
adding more complex changes to support it are unnecessary.
Differential Revision: https://reviews.llvm.org/D74338
Based on D72931
This adds a new feature called A16 which is enabled for gfx10.
gfx9 keeps the R128A16 feature so it can share all the instruction encodings
with gfx7/8.
Differential Revision: https://reviews.llvm.org/D73956