Commit Graph

84 Commits

Author SHA1 Message Date
Jay Foad d77b43c385 [AMDGPU][GlobalISel] Add G_AMDGPU_FFBL_B32
This is the counterpart to G_AMDGPU_FFBH_U32 which already exists. These
instructions have a defined result of -1 when the input is zero.

Differential Revision: https://reviews.llvm.org/D107441
2021-08-06 09:40:48 +01:00
Matt Arsenault d7d2e4545e AMDGPU/GlobalISel: Fix selecting G_SEXTLOAD/G_ZEXTLOAD pre-gfx9
The patterns for the m0 glue patterns were failing to import.
2021-07-27 15:56:42 -04:00
Petar Avramovic 4a9bc59867 AMDGPU/GlobalISel: Add integer med3 combines
Add signed and unsigned integer version of med3 combine.
Source pattern is min(max(Val, K0), K1) or max(min(Val, K1), K0)
where K0 and K1 are constants and K0 <= K1. Destination is med3
that corresponds to signedness of min/max in source.

Differential Revision: https://reviews.llvm.org/D90050
2021-04-27 11:52:23 +02:00
Sebastian Neubauer cc7add5298 [AMDGPU] Use SIInstrFlags for flat variants. NFC
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.

Fixed version of D99587, which does not rely on the address space.

Differential Revision: https://reviews.llvm.org/D99743
2021-04-09 12:28:36 +02:00
Stanislav Mekhanoshin edd6da10d2 [AMDGPU] Remove cpol, tfe, and swz from MUBUF patterns
These are always selected as 0 anyway.

Differential Revision: https://reviews.llvm.org/D98663
2021-03-18 14:36:04 -07:00
Stanislav Mekhanoshin 3bffb1cd0e [AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Differential Revision: https://reviews.llvm.org/D96469
2021-03-15 13:00:59 -07:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Jay Foad 7e9ceed9a2 [TableGen][GlobalISel] Allow duplicate RendererFns
Allow different GICustomOperandRenderers to use the same RendererFn.
This avoids the need for targets to define a bunch of identical C++
renderer functions with different names.

Without this fix TableGen would have emitted code that tried to define
the GICR enumeration with duplicate enumerators.

Differential Revision: https://reviews.llvm.org/D96587
2021-02-12 15:05:32 +00:00
Thomas Symalla f89f6d1e5d [AMDGPU]: Fixes an invalid clamp selection pattern.
When running the tests on PowerPC and x86, the lit test GlobalISel/trunc.ll fails at the memory sanitize step. This seems to be due to wrong invalid logic (which matches even if it shouldn't) and likely missing variable initialisation."

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D95878
2021-02-08 13:06:30 +01:00
Sebastian Neubauer d49efdc969 Revert "[AMDGPU] Add a new Clamp Pattern to the GlobalISel Path."
This reverts commits 62af0305b7cc..677a3529d3e6 from D93708.
They cause failures in the sanitizer builds because of uninitialized
values.

A fix is in D95878, but it might take some time until this is pushed,
so reverting the changes for now.
2021-02-03 11:03:34 +01:00
Thomas Symalla 602896b9d2 Renamed med3 opcode, removed superfluous copy. 2021-02-02 09:14:54 +01:00
Thomas Symalla c781c25412 Implemented a MED3_S32 GIR opcode. 2021-02-02 09:14:53 +01:00
Thomas Symalla 6604d81e1b Added and used new target pseudo for v_cvt_pk_i16_i32, changes due to code review. 2021-02-02 09:14:53 +01:00
Stanislav Mekhanoshin d15119a02d [AMDGPU][GlobalISel] GlobalISel for flat scratch
It does not seem to fold offsets but this is not specific
to the flat scratch as getPtrBaseWithConstantOffset() does
not return the split for these tests unlike its SDag
counterpart.

Differential Revision: https://reviews.llvm.org/D93670
2020-12-22 16:33:06 -08:00
Mirko Brkusanin d17ea67b92 [AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores
Fix local ds_read/write_b96/b128 so they can be selected if the alignment
allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break
them down.

Differential Revision: https://reviews.llvm.org/D81638
2020-08-21 12:26:31 +02:00
Matt Arsenault a9ee0589a8 AMDGPU/GlobalISel: Match global saddr addressing mode 2020-08-17 15:48:06 -04:00
Matt Arsenault 8cb022982a AMDGPU: Remove redundant FLAT complex patterns
These were identical to the non-atomic cases. I'm not sure why these
were ever separated.
2020-08-15 12:12:01 -04:00
Matt Arsenault 53f21e0fb7 TableGen/GlobalISel: Hack the operand order for atomic_store
ISD::ATOMIC_STORE arbitrarily has the operands in the opposite order
from regular ISD::STORE, which always introduced an annoying
duplication of patterns to handle both cases. Since in GlobalISel
there's just the one G_STORE, we need to swap the operands to
correctly emit the type check for the pointer operand.

Some work started in 20aafa3156 to
migrate SelectionDAG to use ISD::STORE for atomics, but that work
seems to have stalled. Since this is the pretty much the last
operation which matters which isn't supported for AMDGPU, use this
compatibility hack to unblock declaring it functionally complete.

Not sure what's going on with the pending_phis AArch64 test. It seems
it didn't always use atomics, and I'm not sure what it was originally
testing matters anymore.
2020-08-11 10:22:44 -04:00
Matt Arsenault 1a0c0944c6 AMDGPU: Define raw/struct variants of buffer atomic fadd
Somehow the new FP atomic buffer intrinsics ended up using the legacy
style for buffer intrinsics.
2020-08-06 13:36:19 -04:00
Matt Arsenault 63cdc9a49f AMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd|fmin|fmax}
These intrinsics are missing mangling for both the pointer and data
type.
2020-08-06 11:09:08 -04:00
Matt Arsenault dcf3ffb0a8 AMDGPU/GlobalISel: Move frame index selection to patterns
Doesn't really save any code until global value is handled too.
2020-08-06 10:42:15 -04:00
Matt Arsenault a4edc04693 AMDGPU/GlobalISel: Use clamp modifier for [us]addsat/[us]subsat
We also have never handled this for SelectionDAG, which needs
additional work.
2020-07-28 11:18:05 -04:00
Kazuaki Ishizaki 0312b9f550 [llvm] NFC: Fix trivial typo in rst and td files
Differential Revision: https://reviews.llvm.org/D77469
2020-04-23 14:26:32 +09:00
Matt Arsenault b27d255e1e AMDGPU/GlobalISel: Form CVT_F32_UBYTE0 2020-03-30 17:45:55 -04:00
Matt Arsenault 15bf916b54 AMDGPU: Remove VOP3OpSelMods0 complex pattern
Use default operand of 0 instead.
2020-03-04 17:18:22 -05:00
Matt Arsenault dfce5fd50a AMDGPU/GlobalISel: Select VOP3P instructions
This only handles the basic cases. More work is needed to make better
use of op_sel.
2020-02-21 13:35:40 -05:00
Matt Arsenault 86813e2768 AMDGPU/GlobalISel: Select llvm.amdgcn.s.buffer.load
Doesn't try to fail on the dlc bit pre-gfx10 like the DAG lowering
does.
2020-02-17 08:02:40 -08:00
Matt Arsenault 9ec668606b AMDGPU: Add option to disable CGP division expansion
The division expansions in AMDGPUCodeGenPrepare can't be relied on for
correctness, since they punt to later optimization and possibly
legalization in some cases. We still need a way to be able to write
tests for the legalizer versions of the expansion. This is mostly for
GlobalISel, since the expected optimzations is expecting aren't
implemented.

The interaction with the flag to expand 64-bit division in the IR is
pretty confusing, but these flags have different purposes.
2020-02-14 11:37:07 -08:00
Matt Arsenault 045a8921d7 AMDGPU/GlobalISel: Select G_CTLZ_ZERO_UNDEF
Directly select this rather than going through the intermediate
instruction, which may provide some combine value in the future.
2020-02-12 16:19:45 -08:00
Matt Arsenault 6fb544d1d2 AMDGPU/GlobalISel: Combine FMIN_LEGACY/FMAX_LEGACY
Try out using combine definition rules.

This really should be a post-legalizer combine, but the combiner pass
is currently pre-legalize. Most of the target combines are really
post-legalize, so we should probably move the pass.
2020-01-31 06:58:04 -08:00
Matt Arsenault 49e424e08e AMDGPU/GlobalISel: Select global MUBUF atomicrmw 2020-01-31 06:05:41 -08:00
Austin Kerbow 2605adb69c [AMDGPU][GlobalISel] Select 8-byte LDS Ops with 4-byte alignment
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73585
2020-01-29 10:42:12 -08:00
Matt Arsenault c3075e6171 AMDGPU/GlobalISel: Select buffer atomics
The cmpswap handling is incomplete and fails to select.
2020-01-27 15:16:44 -05:00
Matt Arsenault 0eb62d5b3f AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.store 2020-01-27 15:16:21 -05:00
Matt Arsenault 533d650e94 AMDGPU/GlobalISel: Move llvm.amdgcn.raw.buffer.store handling
Treat this the same way as loads. There's less value to the
intermediate nodes, but it's good to be consistent.
2020-01-27 14:59:30 -05:00
Matt Arsenault 09ed0e44d9 AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.load 2020-01-27 13:40:37 -05:00
Matt Arsenault 198624c39d AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load.format 2020-01-27 13:02:19 -05:00
Matt Arsenault fc90222a91 AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load
Use intermediate instructions, unlike with buffer stores. This is
necessary because of the need to have an internal way to distinguish
between signed and unsigned extloads. This introduces some duplication
and near duplication with the buffer store selection path. The store
handling should maybe be moved into legalization to match and
eliminate the duplication.
2020-01-27 12:49:23 -05:00
Matt Arsenault e60d658260 AMDGPU/GlobalISel: Handle VOP3NoMods 2020-01-27 09:03:44 -08:00
Matt Arsenault ac0b9b4ccf AMDPGPU/GlobalISel: Select more MUBUF global addressing modes
The handling of the high bits of the resource descriptor seem weird to
me, where the 3rd dword changes based on the instruction.
2020-01-27 07:28:36 -08:00
Matt Arsenault fdaad485e6 AMDGPU/GlobalISel: Initial selection of MUBUF addr64 load/store
Fixes the main reason for compile failures on SI, but doesn't really
try to use the addressing modes yet.
2020-01-27 07:13:56 -08:00
Matt Arsenault 4d14772f5c AMDGPU/GlobalISel: Remove redundant or patterns
These ended up with higher priority than or3 patterns in a future
patch. This also fixes the using VOP2 forms.
2020-01-22 21:45:51 -05:00
Matt Arsenault a174f0da62 AMDGPU/GlobalISel: Add pre-legalize combiner pass
Just copy the AArch64 pass as-is for now, except for removing the
memcpy handling.
2020-01-22 10:16:39 -05:00
Matt Arsenault a722cbf77c AMDGPU/GlobalISel: Handle atomic_inc/atomic_dec
The intermediate instruction drops the extra volatile argument. We are
missing an atomic ordering on these.
2020-01-22 09:26:17 -05:00
Matt Arsenault 592de0009f AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp
The existing test is overly reliant on -mattr=-flat-for-global, and
some missing optimizations to re-use.
2020-01-17 20:09:53 -05:00
Matt Arsenault b4a647449f TableGen/GlobalISel: Add way for SDNodeXForm to work on timm
The current implementation assumes there is an instruction associated
with the transform, but this is not the case for
timm/TargetConstant/immarg values. These transforms should directly
operate on a specific MachineOperand in the source
instruction. TableGen would assert if you attempted to define an
equivalent GISDNodeXFormEquiv using timm when it failed to find the
instruction matcher.

Specially recognize SDNodeXForms on timm, and pass the operand index
to the render function.

Ideally this would be a separate render function type that looks like
void renderFoo(MachineInstrBuilder, const MachineOperand&), but this
proved to be somewhat mechanically painful. Add an optional operand
index which will only be passed if the transform should only look at
the one source operand.

Theoretically it would also be possible to only ever pass the
MachineOperand, and the existing renderers would check the parent. I
think that would be somewhat ugly for the standard usage which may
want to inspect other operands, and I also think MachineOperand should
eventually not carry a pointer to the parent instruction.

Use it in one sample pattern. This isn't a great example, since the
transform exists to satisfy DAG type constraints. This could also be
avoided by just changing the MachineInstr's arbitrary choice of
operand type from i16 to i32. Other patterns have nontrivial uses, but
this serves as the simplest example.

One flaw this still has is if you try to use an SDNodeXForm defined
for imm, but the source pattern uses timm, you still see the "Failed
to lookup instruction" assert. However, there is now a way to avoid
it.
2020-01-09 17:37:52 -05:00
Matt Arsenault e71af77568 AMDGPU/GlobalISel: Add IMMPopCount xform
Partially fixes BFE pattern import.
2020-01-09 10:29:32 -05:00
Matt Arsenault 79450a4ea2 AMDGPU/GlobalISel: Add selectVOP3Mods_nnan
This doesn't enable any new imports yet, but moves the fmed patterns
from failing on this to hitting the "complex suboperand referenced
more than once" limitation in tablegen.
2020-01-09 10:29:32 -05:00
Matt Arsenault d964086c62 AMDGPU/GlobalISel: Add equiv xform for bitcast_fpimm_to_i32
Only partially fixes one pattern import.
2020-01-09 10:29:31 -05:00
Matt Arsenault 3952748ffd AMDGPU/GlobalISel: Fix add of neg inline constant pattern 2020-01-09 10:29:31 -05:00