The change adds divergence predicates for fused logical operations.
The problem with selecting a scalar fused op such as S_NOR_B32 is
that it does not have a VALU counterpart and will be split in
moveToVALU. At the same time it prevents selection of a better
opcode on the VALU side (such as V_OR3_B32) which does not have a
counterpart on SALU side.
XNOR opcodes are left as is and selected as scalar to get advantage
of the SIInstrInfo::lowerScalarXnor() code which can commute
operations to keep one of two opcodes on SALU if possible. See
xnor.ll test for this.
Differential Revision: https://reviews.llvm.org/D111907
Set the output register class based on the output type, instead of
hard-coding VGPR_32. I think this is more correct. It doesn't make any
difference at the moment because we use the same class for 16- and
32-bit results, but it might in future if we make more use of true
16-bit register classes.
Differential Revision: https://reviews.llvm.org/D102622
Note, only src0 and src1 will be commuted if the isCommutable flag
is set. This patch does not change that, it just makes it possible
to commute src0 and src1 of some U/I/B vop3 instructions.
This patch revises d35d8da7d6.
It contains the commute opportunities excluding float insts
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D101474
Change-Id: I62938173d750453839f2457a3851661a29135faf
By convention, VOP1/2/C instructions which can be promoted to VOP3 have _e32 suffix while promoted instructions have _e64 suffix. Instructions which have a single variant should have no _e32/_e64 suffix. Unfortunately there was no simple way to identify single variant instructions - it was implemented by a hack. See bug https://bugs.llvm.org/show_bug.cgi?id=39086.
This fix simplifies handling of single VOP instructions by adding a dedicated flag.
Differential Revision: https://reviews.llvm.org/D99408
Note, only src0 and src1 will be commuted if the isCommutable flag
is set. This patch does not change that, it just makes it possible
to commute src0 and src1 of more instructions.
Reviewed By: foad, rampitec
Differential Revision: https://reviews.llvm.org/D99376
Change-Id: I61e20490962d95ea429beb355c55f55c024dafdc
HasModifiers should be true if at least one modifier is used.
This should make the use of this field bit more consistent.
Differential Revision: https://reviews.llvm.org/D94795
Previously, instructions which could be
expressed as VOP3 in addition to another
encoding had a _e64 suffix on the tablegen
record name, while those
only available as VOP3 did not. With this
patch, all VOP3s will have the _e64 suffix.
The assembly does not change, only the mir.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D94341
Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
VOP3 and VOP DPP subroutines to generate input
operands and asm strings were essentially copy
pasted several times. They are deduplicated to
reduce the maintenance burden and allow faster
development.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D94102
Change-Id: I76225eed3c33239d9573351e0c8a0abfad0146ea
It was removed in GFX10 GPUs, but LLVM could
generate it.
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D94020
Change-Id: Id1c716d71313edcfb768b2b175a6789ef9b01f3c
These parameters set a default value of 0, so I believe they
should include a 0 suffix. This allows for versions which do not
set a default value in future.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D93187
V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src
modifier, but they can still use NEG and the usual output modifiers.
This partially reverts 3b99f12a4e "AMDGPU: Remove modifiers from v_div_scale_*".
Differential Revision: https://reviews.llvm.org/D90296
I do not exactly like the use of a negative predicate to
enable instructions' support. Change HasNoMadMacF32Insts
with HasFmaLegacy32.
Differential Revision: https://reviews.llvm.org/D90250
Predicates with 'let PredicateCodeUsesOperands = 1' want to examine
matched operands. When we encounter predicate code that uses operands,
analyze its named operand arguments and create a map between argument
index and name. Later, when leaf node with name is encountered, emit
GIM_RecordNamedOperand that will store that operand at its argument
index in operand list. This operand list will be an argument to c++
code of the predicate.
Differential Revision: https://reviews.llvm.org/D87285
The carry-out opcode is renamed, so eliminate the deceptive _gfx9,
which looked like the encoded instruction. The real encoded version
was named _gfx9_gfx9.
Move it into the VI encoding namespace. The gfx9 namespace is just to
deal with the renamed instructions that reinterpret the opcode. When
codegened, it would fail to find the real instruction since it wasn't
in the right namespace.
This may be missing a few overrides to set it off still in some
special cases. Since the flags set during selection should now be
reliably preserved, this should not change codegen for non-strictfp
functions.
This is the groundwork required to implement strictfp. For now, this
should be NFC for regular instructoins (many instructions just gain an
extra use of a reserved register). Regalloc won't rematerialize
instructions with reads of physical registers, but we were suffering
from that anyway with the exec reads.
Should add it for all the related FP uses (possibly with some
extras). I did not add it to either the gpr index mode instructions
(or every single VALU instruction) since it's a ridiculous feature
already modeled as an arbitrary side effect.
Also work towards marking instructions with FP exceptions. This
doesn't actually set the bit yet since this would start to change
codegen. It seems nofpexcept is currently not implied from the regular
IR FP operations. Add it to some MIR tests where I think it might
matter.
I tried to use some of the new tablegen features to avoid creating
different operand list permutations, but I still don't see a way to
programmatically build a source pattern dag.
Also add GlobalISel tests, which now all import successfully.
Some of the fneg fold tests are incorrect, which need to be fixed in a
future commit
We don't use this, and matching from the def doesn't make much sense.
There are multiple tablegen bugs with default operand
handling. undef_tied_input should work to handle the vdst_in
correctly, but this breaks the operand register class constraint which
it should be able to infer.
Manually select this is as a tablegen workraound. Both SelectionDAG
and GlobalISel end up misplacing the copy to m0 when both instructions
in the output need it. Neither considers that both output instructions
depend on m0. I don't know of any other pattern we need to handle this
case, so it's less effort to just workaround this for now.
The other 3-op patterns should also be theoretically handled, but
currently there's a bug in the inferred pattern complexity.
I'm not sure what the error handling strategy should be for potential
constant bus violations. I think the correct strategy is to never
produce mixed SGPR and VGPR operands in a typical VOP instruction,
which will trivially avoid them. However, it's possible to still have
hand written MIR (or erroneously transformed code) with these
operands. When these fold, the restriction will be violated. We
currently don't have any verifiers for reg bank legality. For now,
just ignore the restriction.
It might be worth triggering a DAG fallback on verifier error.
This case can be handled as a regular selection pattern, so move it
out of the weird post-isel folding code which doesn't have an exactly
equivalent place in GlobalISel.
I think it doesn't make much sense to do this optimization here
though, and it would be more useful in instcombine. There's not really
any new information that will be gained during lowering since these
inputs were known from the beginning.
The 16 bank LDS case is complicated due to using multiple
instructions. If I attempt to write a pattern for it, the generated
selector incorrectly places the copy to m0 after the first
instruction, so that needs to be separately addressed.
Also fix not gluing the copy to m0 to the second operation in the
second half of the 16 bank lowering.
We are duplicating predicates if several parts of the combined
predicate list contain the same condition. Added code to deduplicate
the list.
We have AssemblerPredicates and AssemblerPredicate in the
PredicateControl, but we never use AssemblerPredicates with an
actual list, so this one is dropped.
This addresses the first part of the llvm bug 43886:
https://bugs.llvm.org/show_bug.cgi?id=43886
Differential Revision: https://reviews.llvm.org/D69815