Commit Graph

81 Commits

Author SHA1 Message Date
Kazu Hirata 41bfac6aed [Target] Remove unused forward declarations (NFC) 2022-01-02 10:20:15 -08:00
Neubauer, Sebastian d1f45ed58f [AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
2021-11-12 11:37:21 +01:00
Jay Foad 83610d4eb0 [AMDGPU][GlobalISel] Better legalization of 32-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107474
2021-08-06 09:40:48 +01:00
Jay Foad 59f6865231 [AMDGPU][GISel] Fix MMO for raw/struct buffer access with non-constant offset
Codegen for the raw/struct buffer access intrinsics would update the
offset in the MMO to reflect the combined offset, if it was known to be
constant. If the combined offset was not known to be constant, or if
there was an index, it would set the offset in the MMO to 0. This is
unsafe because it makes it look like the access does not alias with
another access with a fixed non-zero offset.

Fix these cases by setting the pointer in the MMO to null, to reflect
the fact that we do not have any known IR value pointer + constant
offset for the access.

D106284 did this for SelectionDAG. This is the corresponding fix for
GlobalISel.

Differential Revision: https://reviews.llvm.org/D106451
2021-07-26 14:27:30 +01:00
Christudasan Devadasan e3b8e6d482 [AMDGPU] Remove dead declaration (NFC). 2021-05-25 16:04:04 +05:30
Christudasan Devadasan 90d784053f AMDGPU/GlobalISel: Legalize G_[SU]DIVREM instructions
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D100726
2021-05-25 10:51:07 +05:30
Christudasan Devadasan 97618522dc [AMDGPU] Remove dead dcode (NFC). 2021-04-16 23:03:31 +05:30
Konstantin Zhuravlyov f4ace63737 AMDGPU: Add target id and code object v4 support
- Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id)
  - Add code object v4 support (https://llvm.org/docs/AMDGPUUsage.html#elf-code-object)
    - Add kernarg_size to kernel descriptor
    - Change trap handler ABI to no longer move queue pointer into s[0:1]
  - Cleanup ELF definitions
    - Add V2, V3, V4 suffixes to make a clear distinction for code object version
    - Consolidate note names

Differential Revision: https://reviews.llvm.org/D95638
2021-03-24 11:54:05 -04:00
Kazu Hirata c85b6bf33c [AMDGPU] Forward-declare MachineIRBuilder (NFC)
AMDGPULegalizerInfo.h needs MachineIRBuilder but relies on a forward
declaration of MachineIRBuilder in LegalizerInfo.h.  This patch adds a
forward declaration right in AMDGPULegalizerInfo.h.

While we are at it, this patch removes the one in LegalizerInfo.h,
where it is unnecessary.
2021-01-25 19:24:01 -08:00
Matt Arsenault 2a0db8d70e AMDGPU: Use more accurate fast f64 fdiv
A raw v_rcp_f64 isn't accurate enough, so start applying correction.
2021-01-21 10:51:36 -05:00
dfukalov 6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Rodrigo Dominguez f71f5f39f6 [AMDGPU] Implement hardware bug workaround for image instructions
Summary:
This implements a workaround for a hardware bug in gfx8 and gfx9,
where register usage is not estimated correctly for image_store and
image_gather4 instructions when D16 is used.

Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81172
2020-10-07 07:39:52 -04:00
Jay Foad f11f382523 [AMDGPU] Fix declaration parameter names to match definition
This fixes the declaration of AMDGPULegalizerInfo::legalizeBufferLoad to
match the definition. It is still confusing that that parameter order is
different from legalizeBufferStore.

https://bugs.llvm.org/show_bug.cgi?id=47535
2020-09-25 11:38:17 +01:00
Stanislav Mekhanoshin 27a62f6317 [AMDGPU] global-isel support for RT
Differential Revision: https://reviews.llvm.org/D87847
2020-09-24 10:29:45 -07:00
Matt Arsenault 076305568c AMDGPU/GlobalISel: Prepare for more custom load lowerings
Slight restructuring of the code to avoid formatting changes when more
cases are handled here.
2020-08-11 11:09:05 -04:00
Petar Avramovic 0d58d9e8fb AMDGPU/GlobalISel: Lower G_FREM
Add custom lower for G_FREM.

Differential Revision: https://reviews.llvm.org/D84324
2020-08-10 10:10:46 +02:00
Matt Arsenault 63cdc9a49f AMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd|fmin|fmax}
These intrinsics are missing mangling for both the pointer and data
type.
2020-08-06 11:09:08 -04:00
Matt Arsenault 5a503521e7 AMDGPU/GlobalISel: Implement expansion for rsq.clamp
Not sure why we handle this removed instruction on newer subtargets
for this one and no others, but maintain compatibility with the DAG.
2020-08-06 10:23:25 -04:00
Matt Arsenault f8fb7835d6 GlobalISel: Add utilty for getting function argument live ins
Get the argument register and ensure there's a copy to the virtual
register. AMDGPU and AArch64 have similarish code to get the livein
value, and I also want to use this in multiple places.

This is a bit more aggressive about setting the register class than
the original function, but that's probably OK.

I think we're missing a few verifier checks for function live ins. I
noticed AArch64's calling convention code is not actually adding
liveins to functions, only the entry block (which apparently might not
matter that much?). There should probably be a verifier check that
entry block live ins are also live into the function. We also might
need a verifier check that the copy to the livein virtual register is
in the entry block.
2020-08-04 16:55:55 -04:00
Matt Arsenault fd63e46941 AMDGPU/GlobalISel: Apply load bitcast to s.buffer.load intrinsic
Should also apply this to the non-scalar buffer loads.
2020-08-03 08:54:29 -04:00
Matt Arsenault 200bb5191a AMDGPU/GlobalISel: Refactor special argument management 2020-07-29 08:27:31 -04:00
Matt Arsenault 16bcd54570 AMDGPU/GlobalISel: Mark GlobalISel classes as final 2020-07-28 11:42:17 -04:00
Matt Arsenault 61f1f2a204 AMDGPU/GlobalISel: Initial Implementation of calls
Return values, and tail calls are not yet handled.
2020-07-20 11:13:22 -04:00
Matt Arsenault a448670752 AMDGPU/GlobalISel: Legalize 64-bit G_SDIV/G_SREM
Now all the divisions should be complete, although we should fix
emitting the entire common part for div/rem when you use both.
2020-06-24 11:39:45 -04:00
Michael Liao 20a1700293 [amdgpu] Fix REL32 relocations with negative offsets.
Summary: - The offset should be treated as a signed one.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82234
2020-06-21 23:09:03 -04:00
Matt Arsenault 7f8b2e1b91 GlobalISel: Pass LegalizerHelper to custom legalize callbacks
This was passing in all the parameters needed to construct a
LegalizerHelper in the custom legalization, when it's simpler to just
pass in the existing helper.

This is slightly more annoying to use in the common case where you
don't need the legalizer helper, but we could add back the common
parameters back in addition to the helper.

I didn't propagate this to all the internal target changes that this
logically implies, but did update a sample one for
legalizeMinNumMaxNum.

This is in preparation for moving AMDGPU load/store legalization
entirely into custom lowering. The current set of legalization actions
is really constraining and not really capable of expressing all the
actions needed to legalize loads/stores. In particular there's no way
to express when the memory access itself needs to change size vs. the
result type. There's also a lot of redundancy since the same
split/widen actions need to be applied in both vector and scalar
cases. All of the sub-cases logically belong as steps in the legalizer
helper, but it will be easier to consider everything at once in custom
lowering.
2020-06-18 17:17:38 -04:00
Matt Arsenault 90a36bbd7c AMDGPU/GlobalISel: Legalize 64-bit G_UDIV/G_UREM
Mostly ported from the DAG version. This results in much worse code
than the DAG version, largely due to a much worse expansion for
G_UMULH.
2020-03-30 10:57:37 -04:00
hsmahesha 3fda1fde8f AMDGPU/GlobalISel: Support llvm.trap and llvm.debugtrap intrinsics
Summary: Lower trap and debugtrap intrinsics to AMDGPU machine instruction(s).

Reviewers: arsenm, nhaehnle, kerbowa, cdevadas, t-tye, kzhuravl

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, dstuttard, tpr, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74688
2020-03-05 08:16:57 +05:30
Matt Arsenault 79ff188add AMDGPU/GlobalISel: Legalize G_FPOW
There are few differences from the DAG handling. First, the DAG
handling uses a primitive selection pattern instead of custom
legalizing it. Because of this, this makes use of source modifiers
while the DAG does not.

Also instead of promoting f16, try to use the f16 log/exp. There's no
f16 fmul_legacy, so widen just for the multiply, although I'm not sure
that's the best solution.
2020-02-21 10:31:13 -05:00
Matt Arsenault f742a28ae3 AMDGPU/GlobalISel: Custom lower 32-bit G_SDIV/G_SREM 2020-02-17 15:09:51 -05:00
Matt Arsenault 96db12d507 AMDGPU/GlobalISel: Custom lower 32-bit G_UDIV/G_UREM
AMDGPUCodeGenPrepare expands this most of the time, but not always. We
will always at least need a fallback option here. This is the 3rd
implementation of the same expansion in the backend. Eventually I
would like to eliminate the IR expansion (and the DAG version
obviously).

Currently the new legalizer path produces a better result, since the
IR expansion results in extra operations which need to be combined
out. Notably, the IR expansion results in multiplies by 0.
2020-02-17 11:05:50 -08:00
Matt Arsenault 5aa6e246a1 AMDGPU/GlobalISel: Legalize f64 G_FFLOOR for SI
Use cmp ord instead of cmp_class compared to the DAG version for the
nan check, but mostly try to match the existsing pattern.

I think the sign doesn't matter for fract, so we could do a little
better with the source modifier matching.

I think this is also still broken as in D22898, but I'm leaving it
as-is for now while I don't have an SI system to test on.
2020-02-05 14:32:01 -05:00
Matt Arsenault 69cc9f3046 AMDGPU/GlobalISel: Legalize llvm.amdgcn.s.buffer.load
The 96-bit results need to be widened.

I find the interaction between LegalizerHelper and MIRBuilder somewhat
awkward. The custom legalization is called by the LegalizerHelper, but
then does not have access to the helper. You have to construct a new
helper, which then does not own the MachineIRBuilder, but does modify
it. Maybe custom legalization should be passed the helper?
2020-02-05 12:01:34 -05:00
Matt Arsenault dfa9420f09 AMDGPU/GlobalISel: Don't use legal v2s16 G_BUILD_VECTOR
If we have s_pack_* instructions, legalize this to
G_BUILD_VECTOR_TRUNC from s32 elements. This is closer to how how the
s_pack_* instructions really behave.

If we don't have s_pack_ instructions, expand this by creating a merge
to s32 and bitcasting. This expands to the expected bit operations. I
think this eventually should go in a new bitcast legalize action type
in LegalizerHelper.

We already directly emit the shift operations in RegBankSelect for the
vector case. This could possibly be cleaned up, but I also may want to
defer doing this expansion to selection anyway. I'll see about that
when I try to actually match VOP3P instructions.

This breaks the selection of the build_vector since tablegen doesn't
know how to match G_BUILD_VECTOR_TRUNC yet, so just xfail it for now.
2020-02-05 11:52:18 -05:00
Matt Arsenault 5d2749938c AMDGPU/GlobalISel: Custom lower G_FEXP 2020-02-04 11:50:55 -08:00
Matt Arsenault b21571f4d5 AMDGPU/GlobalISel: Handle s64->s64 G_FPTOSI/G_FPTOUI 2020-01-30 08:46:37 -05:00
Matt Arsenault 8184176efd AMDGPU/GlobalISel: Custom lower G_LOG/G_LOG10
I'm pretty sure this is wrong and we should expand these in a correct
way, but this matches the existing behavior.
2020-01-30 08:38:50 -05:00
Matt Arsenault 872e899b75 AMDGPU/GlobalISel: Legalize unpacked d16 image operations
On targets that don't have the normal packed f16 layout, handle these
during legalization. Directly modify the register types. We can infer
this was a d16 load based on the mem operand size during selection.

A16 operands should possibly be handled here as well, but don't worry
about that yet.
2020-01-30 08:36:11 -05:00
Matt Arsenault c5fffa4da3 GlobalISel: Add observer argument to legalizeIntrinsic
This is passed to legalizeCustom, but not intrinsic. Also remove the
MRI argument, since you can get that from the MachineIRBuilder.

I'm not sure why MachineIRBuilder has a private observer member, and
this is passed separately.
2020-01-29 18:33:45 -05:00
Matt Arsenault c3075e6171 AMDGPU/GlobalISel: Select buffer atomics
The cmpswap handling is incomplete and fails to select.
2020-01-27 15:16:44 -05:00
Matt Arsenault 533d650e94 AMDGPU/GlobalISel: Move llvm.amdgcn.raw.buffer.store handling
Treat this the same way as loads. There's less value to the
intermediate nodes, but it's good to be consistent.
2020-01-27 14:59:30 -05:00
Matt Arsenault 09ed0e44d9 AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.load 2020-01-27 13:40:37 -05:00
Matt Arsenault ce7ca2caf2 AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.load 2020-01-27 13:05:55 -05:00
Matt Arsenault fc90222a91 AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load
Use intermediate instructions, unlike with buffer stores. This is
necessary because of the need to have an internal way to distinguish
between signed and unsigned extloads. This introduces some duplication
and near duplication with the buffer store selection path. The store
handling should maybe be moved into legalization to match and
eliminate the duplication.
2020-01-27 12:49:23 -05:00
Matt Arsenault a1d33ce73a AMDGPU/GlobalISel: Custom legalize v2s16 G_SHUFFLE_VECTOR
Try to keep simple v2s16 cases as-is. This will more naturally map to
how the VOP3P op_sel modifiers work compared to the expansion
involving bitcasts and bitshifts.

This could maybe try harder with wider source vector types, although
that could be handled with a pre-legalize combine.
2020-01-27 08:28:05 -08:00
Matt Arsenault a722cbf77c AMDGPU/GlobalISel: Handle atomic_inc/atomic_dec
The intermediate instruction drops the extra volatile argument. We are
missing an atomic ordering on these.
2020-01-22 09:26:17 -05:00
Austin Kerbow f3225f2abe AMDGPU/GlobalISel: Legalize FDIV64
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70403
2019-11-19 21:02:27 -08:00
Austin Kerbow 2b88b344f2 AMDGPU/GlobalISel: Legalize FDIV32
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69581
2019-10-29 17:18:06 -07:00
Matt Arsenault 171cf5302f AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHG
Custom lower this to a target instruction with the merge operands. I
think it might be better to directly select this and emit a
REG_SEQUENCE, but this would be more work since it would require
splitting the tablegen patterns for these cases from the other
atomics.
2019-10-25 13:11:09 -07:00
Austin Kerbow c35b358b74 AMDGPU/GlobalISel: Legalize FDIV16
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69347
2019-10-25 11:07:17 -07:00