Commit Graph

331 Commits

Author SHA1 Message Date
Matt Arsenault 678da7b109 AMDGPU/GlobalISel: Remove leftover #if 0
The subtarget feature used to be missing from subtargets, but that was
fixed.
2020-03-19 20:07:05 -04:00
Matt Arsenault ea4597eef1 Reapply "AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize"
This reverts commit 9bca8fc4cf.

Rearrange handling to avoid changing the instruction in the case where
it's going to be erased and replaced with undef.
2020-03-18 12:01:22 -04:00
Vitaly Buka 9bca8fc4cf Revert "AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize"
The patch introduced use-after-poison.

This reverts commit d0fe13ecf9.
2020-03-17 22:04:14 -07:00
Matt Arsenault 039c917b43 AMDGPU/GlobalISel: Fix asserting on gather4 intrinsics 2020-03-17 11:07:30 -04:00
Matt Arsenault d0fe13ecf9 AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize
For normal loads, fully eliminate the load. For the TFE case, adjust
the dmask value in the instruction so the selector doesn't need to
handle it. For the TFE special case, I guess it would be possible to
replace the loaded data register with undef, but as-is this will start
treating it as a well defined value.
2020-03-17 10:15:30 -04:00
Matt Arsenault d9a012ed8a AMDGPU/GlobalISel: Adjust image load register type based on dmask
Trim elements that won't be written. The equivalent still needs to be
done for writes. Also start widening 3 elements to 4
elements. Selection will get the count from the dmask.
2020-03-17 10:09:18 -04:00
Matt Arsenault 83ffbf2618 AMDGPU/GlobalISel: Legalize non-a16 non-NSA images 2020-03-17 10:02:09 -04:00
Matt Arsenault 2aba9b6cf8 AMDGPU/GlobalISel: Legalize a16 images
Pack the address registers in the legalizer. Avoid introducing a huge
family of new intermediate operations by filling dead operands with
noreg.
2020-03-17 10:02:09 -04:00
Matt Arsenault 57d896e838 AMDGPU/GlobalISel: Make some large merges legal
We allow up to 1024-bit registers, so we should support merges all the
way to the maximum.
2020-03-16 10:49:10 -04:00
Matt Arsenault bb8622094d AMDGPU: Don't handle kernarg.segment.ptr in functions
Just lower this to null. Pass implicitarg.ptr in its place in the
argument list.
2020-03-13 12:51:12 -07:00
Matt Arsenault 1e0c540360 AMDGPU: Don't hard error on LDS globals in functions
Instead, emit a trap and a warning. We force inlining of this
situation, so any function where this happens should be dead as
indirect or external calls are not yet supported. This should avoid
erroring on dead code.
2020-03-11 15:34:11 -04:00
Matt Arsenault edd0dfca0d AMDGPU/GlobalISel: Refine G_TRUNC legality rules
Scalarize most truncates. Avoid touching cases that could end up in
unresolvable infinite loops.
2020-03-10 15:32:22 -07:00
Matt Arsenault ce8a1f7294 GlobalISel: Implement fewerElementsVector for G_TRUNC
Extend fewerElementsVectorBasic to handle operands with different
element types.
2020-03-10 15:17:20 -07:00
hsmahesha 3fda1fde8f AMDGPU/GlobalISel: Support llvm.trap and llvm.debugtrap intrinsics
Summary: Lower trap and debugtrap intrinsics to AMDGPU machine instruction(s).

Reviewers: arsenm, nhaehnle, kerbowa, cdevadas, t-tye, kzhuravl

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, dstuttard, tpr, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74688
2020-03-05 08:16:57 +05:30
Matt Arsenault 86e13ec194 AMDGPU/GlobalISel: Use packed for G_ADD/G_SUB/G_MUL v2s16 2020-02-25 11:20:35 -05:00
Jay Foad 33cbd5ee08 AMDGPU/GlobalISel: Legalize s64 min/max by lowering
Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75108
2020-02-25 16:00:43 +00:00
Jay Foad 0ed4744bb5 AMDGPU/GlobalISel: Lower 64-bit uaddo/usubo
Summary: Add more test cases for signed and unsigned add/sub with overflow.

Reviewers: arsenm, rampitec, kerbowa

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75051
2020-02-24 23:08:14 +00:00
Matt Arsenault 72eef820d5 AMDGPU/GlobalISel: Select G_SHUFFLE_VECTOR
G_SHUFFLE_VECTOR is legal since it theoretically may help match op_sel
for VOP3P instructions. Expand it in some other way in case it doesn't
fold into the use instructions.
2020-02-21 13:35:40 -05:00
Matt Arsenault 79ff188add AMDGPU/GlobalISel: Legalize G_FPOW
There are few differences from the DAG handling. First, the DAG
handling uses a primitive selection pattern instead of custom
legalizing it. Because of this, this makes use of source modifiers
while the DAG does not.

Also instead of promoting f16, try to use the f16 log/exp. There's no
f16 fmul_legacy, so widen just for the multiply, although I'm not sure
that's the best solution.
2020-02-21 10:31:13 -05:00
Matt Arsenault 37c452a289 AMDGPU/GlobalISel: Adjust branch target when lowering loop intrinsic
This needs to steal the branch target like the other control flow
intrinsics.
2020-02-18 06:35:40 -08:00
Matt Arsenault f742a28ae3 AMDGPU/GlobalISel: Custom lower 32-bit G_SDIV/G_SREM 2020-02-17 15:09:51 -05:00
Matt Arsenault e240b27d6d AMDGPU/GlobalISel: Allow arbitrary global values
Treat unknown address spaces as global
2020-02-17 11:32:28 -08:00
Matt Arsenault 96db12d507 AMDGPU/GlobalISel: Custom lower 32-bit G_UDIV/G_UREM
AMDGPUCodeGenPrepare expands this most of the time, but not always. We
will always at least need a fallback option here. This is the 3rd
implementation of the same expansion in the backend. Eventually I
would like to eliminate the IR expansion (and the DAG version
obviously).

Currently the new legalizer path produces a better result, since the
IR expansion results in extra operations which need to be combined
out. Notably, the IR expansion results in multiplies by 0.
2020-02-17 11:05:50 -08:00
Michael Liao 487fcc8d3d Fix `-Wpedantic` warning. NFC. 2020-02-17 00:18:01 -05:00
Matt Arsenault 295bbea3ed AMDGPU/GlobalISel: Fix non-power-of-2 G_SITOFP/G_UITOFP
This wouldn't work for s33-s63 sources.
2020-02-16 22:48:57 -05:00
Matt Arsenault 044d40ed46 AMDGPU/GlobalISel: Move lambdas to normal function
These aren't using any local state
2020-02-16 22:48:32 -05:00
Matt Arsenault 60fea2713d AMDGPU/GlobalISel: Improve 16-bit bswap
Match the new DAG behavior and use v_perm_b32 when available. Also
does better on SI/CI by expanding 16-bit swaps. Also fix
non-power-of-2 cases.
2020-02-14 15:57:39 -08:00
Matt Arsenault bfbfa18591 GlobalISel: Lower s64->s16 G_FPTRUNC
This is more or less directly ported from the AMDGPU custom lowering
for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES
instead of creating shift/trunc to extract the two halves, and zexting
an inverted compare instead of select_cc).

This also does not include the fast math expansion the DAG which
converts to f32 and then to f16. I think that belongs in a
pre-legalize combine instead.
2020-02-14 10:46:58 -08:00
Matt Arsenault a257bde420 AMDGPU/GlobalISel: Handle G_BSWAP 2020-02-14 09:09:44 -08:00
Matt Arsenault 5adbf7d57f AMDGPU/GlobalISel: Make G_TRUNC legal
This is required to be legal. I'm not sure how we were getting away
without defining any rules for it.
2020-02-13 15:25:52 -05:00
Matt Arsenault fa61e200e5 AMDGPU/GlobalISel: Widen non-power-of-2 load results
Load extra bits if suitably aligned. This allows using widened
3-vector loads on SI, and fixes legalization for <9 x s32> (which LSV
apparently forms frequently on lowered kernel argument lists).

Fix incorrectly treating these as legal on SI. This should emit a
64-bit store and a 32-bit store.

I think all of the load and store rules are just about complete, but
due for a rewrite.
2020-02-12 09:35:10 -05:00
Matt Arsenault f4a38c114e AMDGPU/GlobalISel: Look through casts when legalizing vector indexing
We were failing to find constants that were casted. I feel like the
artifact combiner should have folded the constant in the trunc before
the custom lowering, but that doesn't happen.
2020-02-09 18:02:10 -05:00
Petar Avramovic 7df5fc9e03 [GlobalISel] Add buildMerge with SrcOp initializer list
Allows more flexible use of buildMerge in places where
use operands are available as SrcOp since it does not
require explicit conversion to Register.
Simplify code with new buildMerge.

Differential Revision: https://reviews.llvm.org/D74223
2020-02-07 18:43:45 +01:00
Matt Arsenault 8de2dad9e0 GlobalISel: Fix lowering of G_CTLZ/G_CTTZ
The type passed to lower was invalid, so I'm not sure how this was
even working before. The source and destination type also do not have
to match, so make sure to use the right ones.
2020-02-07 06:54:12 -08:00
Matt Arsenault 6a570dc548 AMDGPU/GlobalISel: Fix non-pow-2 add/sub/mul for 16-bit insts
These wouldn't legalize between 16-bits and 32-bits on targets with
16-bit instructions.
2020-02-06 21:43:54 -05:00
Matt Arsenault baafe82b07 AMDGPU/GlobalISel: Remove bitcast legality hack 2020-02-05 16:24:24 -05:00
Matt Arsenault 364326ce66 AMDGPU/GlobalISel: Add mem operand to s.buffer.load intrinsic
Really the intrinsic definition is wrong, but work around this
here. The DAG lowering introduces an MMO. We have to introduce a new
operation to avoid the verifier complaining about the missing mayLoad.
2020-02-05 15:04:42 -05:00
Matt Arsenault 5aa6e246a1 AMDGPU/GlobalISel: Legalize f64 G_FFLOOR for SI
Use cmp ord instead of cmp_class compared to the DAG version for the
nan check, but mostly try to match the existsing pattern.

I think the sign doesn't matter for fract, so we could do a little
better with the source modifier matching.

I think this is also still broken as in D22898, but I'm leaving it
as-is for now while I don't have an SI system to test on.
2020-02-05 14:32:01 -05:00
Matt Arsenault 7bffa97285 AMDGPU/GlobalISel: Prefer merge/unmerge ops to legalize TFE
These have a better chance of combining with other operations and are
currently much better supported than G_EXTRACT.
2020-02-05 12:56:10 -05:00
Matt Arsenault e65e6d052e AMDGPU/GlobalISel: Legalize TFE image result loads
Rewrite the result register pair into the expected sinigle register
format in the legalizer.

I'm also operating under the assumption that TFE doesn't apply to
stores or atomics, but don't know if this is true or not.
2020-02-05 12:40:20 -05:00
Jordan Rupprecht 9f507bfd8d NFC: fix unused var warnings in no-assert builds 2020-02-05 09:26:59 -08:00
Matt Arsenault 69cc9f3046 AMDGPU/GlobalISel: Legalize llvm.amdgcn.s.buffer.load
The 96-bit results need to be widened.

I find the interaction between LegalizerHelper and MIRBuilder somewhat
awkward. The custom legalization is called by the LegalizerHelper, but
then does not have access to the helper. You have to construct a new
helper, which then does not own the MachineIRBuilder, but does modify
it. Maybe custom legalization should be passed the helper?
2020-02-05 12:01:34 -05:00
Matt Arsenault dfa9420f09 AMDGPU/GlobalISel: Don't use legal v2s16 G_BUILD_VECTOR
If we have s_pack_* instructions, legalize this to
G_BUILD_VECTOR_TRUNC from s32 elements. This is closer to how how the
s_pack_* instructions really behave.

If we don't have s_pack_ instructions, expand this by creating a merge
to s32 and bitcasting. This expands to the expected bit operations. I
think this eventually should go in a new bitcast legalize action type
in LegalizerHelper.

We already directly emit the shift operations in RegBankSelect for the
vector case. This could possibly be cleaned up, but I also may want to
defer doing this expansion to selection anyway. I'll see about that
when I try to actually match VOP3P instructions.

This breaks the selection of the build_vector since tablegen doesn't
know how to match G_BUILD_VECTOR_TRUNC yet, so just xfail it for now.
2020-02-05 11:52:18 -05:00
Matt Arsenault 05f2a04ba7 AMDGPU/GlobalISel: Legalize G_SEXT_INREG
Split the VALU 64-bit case in RegBankSelect.
2020-02-04 13:23:53 -08:00
Matt Arsenault 9b0ce8edfa AMDGPU/GlobalISel: Remove extension legality hacks
The legalization has improved since this was added, and the tests
relying on this no longer need it.
2020-02-04 12:50:47 -08:00
Matt Arsenault 5d2749938c AMDGPU/GlobalISel: Custom lower G_FEXP 2020-02-04 11:50:55 -08:00
Matt Arsenault b461436d01 AMDGPU/GlobalISel: Legalize s16 G_FEXP2 2020-02-04 11:50:55 -08:00
Matt Arsenault 1024b73ef5 AMDGPU: Split denormal mode tracking bits
Prepare to accurately track the future denormal-fp-math attribute
changes. The way to actually set these separately is not wired in yet.

This is just a mechanical change, and mostly still assumes the input
and output mode match. This should be refined for some cases. For
example, fcanonicalize lowering should use the flushing variant if
either input or output flushing is enabled
2020-02-04 10:44:21 -08:00
Matt Arsenault cd7650c186 GlobalISel: Implement fewerElementsVector for G_SEXT_INREG
Start using a new strategy with a combination of merge and unmerges.

This allows scalarizing before lowering, which in cases like
<2 x s128> avoids producing giant illegal shifts.
2020-02-03 11:47:33 -08:00
Matt Arsenault c0b12916a7 AMDGPU/GlobalISel: Use more wide vector load/stores
This improves the type breakdown for some large vectors. For example,
we now get a <4 x s32> and s32 store instead of 5 s32 stores for
<5 x s32>.
2020-02-01 10:47:21 -05:00