Commit Graph

1758 Commits

Author SHA1 Message Date
Matt Arsenault 9eb3dda0b2 MachineVerifier: Fix assert on implicit virtreg use
If the liveness of a physical register was invalid, this
was attempting to iterate the subregisters of all register
uses of the instruction, which would assert when it
encountered an implicit virtual register operand.

llvm-svn: 340763
2018-08-27 17:40:09 +00:00
Tim Renouf 904343f879 [AMDGPU] Add support for multi-dword s.buffer.load intrinsic
Summary:
Patch by Marek Olsak and David Stuttard, both of AMD.

This adds a new amdgcn intrinsic supporting s.buffer.load, in particular
multiple dword variants. These are convenient to use from some front-end
implementations.

Also modified the existing llvm.SI.load.const intrinsic to common up the
underlying implementation.

This modification also requires that we can lower to non-uniform loads correctly
by splitting larger dword variants into sizes supported by the non-uniform
versions of the load.

V2: Addressed minor review comments.
V3: i1 glc is now i32 cachepolicy for consistency with buffer and
    tbuffer intrinsics, plus fixed formatting issue.
V4: Added glc test.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D51098

Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b
llvm-svn: 340684
2018-08-25 14:53:17 +00:00
Matt Arsenault 5b9ef39bdd DAG: Allow matching fminnum/fmaxnum from vselect
llvm-svn: 340655
2018-08-24 21:24:18 +00:00
Tim Renouf 4be70ba94a [RegisterCoalescer] Fix for assert in removePartialRedundancy
Summary:
I got "Use not jointly dominated by defs" when removePartialRedundancy
attempted to prune then re-extend a subrange whose only liveness was a
dead def at the copy being removed.

V2: Removed junk from test. Improved comment.
V3: Addressed minor review comments.

Subscribers: MatzeB, qcolombet, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D50914

Change-Id: I6f894e9f517f71e921e0c6d81d28c5f344db8dad
llvm-svn: 340549
2018-08-23 17:28:33 +00:00
Samuel Pitoiset 7bd9dcffcd AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space
32-bit constant address space is declared as 6, so the
maximum number of address spaces is 6, not 5.

Fixes "LLVM ERROR: Pointer address space out of range".

v5: rename MAX_COMMON_ADDRESS to MAX_AMDGPU_ADDRESS
v4: - fix compilation issues
    - fix out of bounds access
v3: use static_assert()
v2: add a very simple test for 32-bit addr space

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630
llvm-svn: 340417
2018-08-22 16:08:48 +00:00
Samuel Pitoiset d81d6f7d58 AMDGPU: fix existing alias rules for constant and global
Constant and global may alias, also one rules table wasn't
ordered correctly.

Pinpointed by Matt.

v2: add a test with swapped parameters
llvm-svn: 340416
2018-08-22 16:08:43 +00:00
Matt Arsenault bb8e64e7f5 AMDGPU: Fix not respecting byval alignment in call frame setup
This was hackily adding in the 4-bytes reserved for the callee's
emergency stack slot. Treat it like a normal stack allocation
so we get the correct alignment padding behavior. This fixes
an inconsistency between the caller and callee.

llvm-svn: 340396
2018-08-22 11:09:45 +00:00
Farhana Aleen 3528c80378 [AMDGPU] Support idot2 pattern.
Summary: Transform add (mul ((i32)S0.x, (i32)S1.x),

         add( mul ((i32)S0.y, (i32)S1.y), (i32)S3) => i/udot2((v2i16)S0, (v2i16)S1, (i32)S3)

Author: FarhanaAleen

Reviewed By: arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D50024

llvm-svn: 340295
2018-08-21 16:21:15 +00:00
Tim Renouf bb5ee41ab4 [AMDGPU] Allow int types for MUBUF vdata
Summary:
Previously the new llvm.amdgcn.raw/struct.buffer.load/store intrinsics
only allowed float types for the data to be loaded or stored, which
sometimes meant the frontend needed to generate a bitcast. In this, the
new intrinsics copied the old buffer intrinsics.

This commit extends the new intrinsics to allow int types as well.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D50315

Change-Id: I8202af2d036455553681dcbb3d7d32ae273f8f85
llvm-svn: 340270
2018-08-21 11:08:12 +00:00
Tim Renouf 4f703f5e11 [AMDGPU] New buffer intrinsics
Summary:
This commit adds new intrinsics
  llvm.amdgcn.raw.buffer.load
  llvm.amdgcn.raw.buffer.load.format
  llvm.amdgcn.raw.buffer.load.format.d16
  llvm.amdgcn.struct.buffer.load
  llvm.amdgcn.struct.buffer.load.format
  llvm.amdgcn.struct.buffer.load.format.d16
  llvm.amdgcn.raw.buffer.store
  llvm.amdgcn.raw.buffer.store.format
  llvm.amdgcn.raw.buffer.store.format.d16
  llvm.amdgcn.struct.buffer.store
  llvm.amdgcn.struct.buffer.store.format
  llvm.amdgcn.struct.buffer.store.format.d16
  llvm.amdgcn.raw.buffer.atomic.*
  llvm.amdgcn.struct.buffer.atomic.*

with the following changes from the llvm.amdgcn.buffer.*
intrinsics:

* there are separate raw and struct versions: raw does not have an
  index arg and sets idxen=0 in the instruction, and struct always sets
  idxen=1 in the instruction even if the index is 0, to allow for the
  fact that gfx9 does bounds checking differently depending on whether
  idxen is set;

* there is a combined cachepolicy arg (glc+slc)

* there are now only two offset args: one for the offset that is
  included in bounds checking and swizzling, to be split between the
  instruction's voffset and immoffset fields, and one for the offset
  that is excluded from bounds checking and swizzling, to go into the
  instruction's soffset field.

The AMDISD::BUFFER_* SD nodes always have an index operand, all three
offset operands, combined cachepolicy operand, and an extra idxen
operand.

The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D50306

Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205
llvm-svn: 340269
2018-08-21 11:07:10 +00:00
Tim Renouf 35484c9d50 [AMDGPU] New tbuffer intrinsics
Summary:
This commit adds new intrinsics
  llvm.amdgcn.raw.tbuffer.load
  llvm.amdgcn.struct.tbuffer.load
  llvm.amdgcn.raw.tbuffer.store
  llvm.amdgcn.struct.tbuffer.store

with the following changes from the llvm.amdgcn.tbuffer.* intrinsics:

* there are separate raw and struct versions: raw does not have an index
  arg and sets idxen=0 in the instruction, and struct always sets
  idxen=1 in the instruction even if the index is 0, to allow for the
  fact that gfx9 does bounds checking differently depending on whether
  idxen is set;

* there is a combined format arg (dfmt+nfmt)

* there is a combined cachepolicy arg (glc+slc)

* there are now only two offset args: one for the offset that is
  included in bounds checking and swizzling, to be split between the
  instruction's voffset and immoffset fields, and one for the offset
  that is excluded from bounds checking and swizzling, to go into the
  instruction's soffset field.

The AMDISD::TBUFFER_* SD nodes always have an index operand, all three
offset operands, combined format operand, combined cachepolicy operand,
and an extra idxen operand.

The tbuffer pseudo- and real instructions now also have a combined
format operand.

The obsolescent llvm.amdgcn.tbuffer.* and llvm.SI.tbuffer.store
intrinsics continue to work.

V2: Separate raw and struct intrinsics.
V3: Moved extract_glc and extract_slc defs to a more sensible place.
V4: Rebased on D49995.
V5: Only two separate offset args instead of three.
V6: Pseudo- and real instructions have joint format operand.
V7: Restored optionality of dfmt and nfmt in assembler.
V8: Addressed minor review comments.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49026

Change-Id: If22ad77e349fac3a5d2f72dda53c010377d470d4
llvm-svn: 340268
2018-08-21 11:06:05 +00:00
Vitaly Buka 30b5ed3eb7 Revert "AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space"
As it introduces out of bound access.

This reverts commit r340172 and r340171

llvm-svn: 340202
2018-08-20 19:31:03 +00:00
Samuel Pitoiset c95ef77d37 AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space
32-bit constant address space is declared as 6, so the
maximum number of address spaces is 6, not 5.

Fixes "LLVM ERROR: Pointer address space out of range".

v3: use static_assert()
v2: add a very simple test for 32-bit addr space

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
llvm-svn: 340171
2018-08-20 13:18:59 +00:00
Matt Arsenault 7121bed210 AMDGPU: Custom lower fexp
This will allow the library to just use __builtin_expf directly
without expanding this itself. Note f64 still won't work because
there is no exp instruction for it.

llvm-svn: 339902
2018-08-16 17:07:52 +00:00
Matt Arsenault f533e6b0ed AMDGPU: Fold fneg into fmed3
llvm-svn: 339821
2018-08-15 21:46:27 +00:00
Matt Arsenault a816073764 AMDGPU: Improve extract_vector_elt reduction combine
Handle fmul, fsub and preserve flags.

Also really test minnum/maxnum reductions.
The existing tests were only checking from
minnum/maxnum matched from a fast math compare
and select which is not the same.

llvm-svn: 339820
2018-08-15 21:34:06 +00:00
Matt Arsenault b3a80e5397 AMDGPU: Implement llvm.amdgcn.icmp/fcmp for i16/f16
Also support these on targets without support for these,
since it will allow us to freely create these in instcombine.

llvm-svn: 339819
2018-08-15 21:25:20 +00:00
Matt Arsenault 6c7ba82900 AMDGPU: Address todo for handling 1/(2 pi)
llvm-svn: 339814
2018-08-15 21:03:55 +00:00
Scott Linder 35213793bc [CodeGen] Fix assert in SelectionDAG::computeKnownBits
Fix SelectionDAG::computeKnownBits asserting when handling EXTRACT_SUBVECTOR
when zero extending the demanded elements mask if it is already as long as the
source vector.

Differential Revision: https://reviews.llvm.org/D49574

llvm-svn: 339600
2018-08-13 18:44:21 +00:00
Matt Arsenault 3763f307bd AMDGPU: Cleanup min/max legacy tests
Also add some more tests in preparation for
a future patch.

llvm-svn: 339526
2018-08-12 19:29:53 +00:00
Matt Arsenault 1201301b94 DAG: Check no-signed-zeros instead of unsafe-fp-math
Addresses fixme, although this should still be checking individual
operand flags.

llvm-svn: 339525
2018-08-12 19:09:12 +00:00
Matt Arsenault 13b0db9285 AMDGPU: Check NSZ MI flag when folding omod
I'm not sure the exact nsz flag combination that
is OK. I think as long as it's on either, this is OK.
For now just check it on the omod multiply.

llvm-svn: 339513
2018-08-12 08:44:25 +00:00
Matt Arsenault b5acec1f79 AMDGPU: Use splat vectors for undefs when folding canonicalize
If one of the elements is undef, use the canonicalized constant
from the other element instead of 0.

Splat vectors are more useful for other optimizations, such
as matching vector clamps. This was breaking on clamps
of half3 from the undef 4th component.

llvm-svn: 339512
2018-08-12 08:42:54 +00:00
Matt Arsenault 3ead7d7389 AMDGPU: Fix packing undef parts of build_vector
llvm-svn: 339511
2018-08-12 08:42:46 +00:00
Tom Stellard 8adc86a7dc AMDGPU/GlobalISel: Define instruction mapping for G_INSERT
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49625

llvm-svn: 339491
2018-08-11 00:51:54 +00:00
Matt Arsenault 940e6075e4 AMDGPU: More canonicalized operations
llvm-svn: 339464
2018-08-10 19:20:17 +00:00
Matt Arsenault 3dcf4ce435 AMDGPU: Combine and of seto/setuo and fp_class
Clear the nan (or non-nan) test bits from the mask.

llvm-svn: 339462
2018-08-10 18:58:56 +00:00
Matt Arsenault 8ad00d30fa AMDGPU: Match isfinite pattern to class instructions
llvm-svn: 339460
2018-08-10 18:58:41 +00:00
Matt Arsenault 935f3b70fe AMDGPU: Error more gracefully on libcalls
I think this is the only situation where the callsite
will have a null instruction.

llvm-svn: 339271
2018-08-08 16:58:39 +00:00
Matt Arsenault e719139b10 AMDGPU: Fix shifts for i128
llvm-svn: 339270
2018-08-08 16:58:33 +00:00
Jan Vesely 7b2c98ab59 AMDGPU: Remove broken i16 ternary patterns
Fixup test to check for GCN prefix
These patterns always zero extend the result even though it might need sign extension.
This has been broken since the addition of i16 support.
It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence:
        v_mad_i16 v2, v10, v9, v11
        v_med3_i32 v2, v2, v8, v7

Fixes mad_sat(char) piglit on VI.

Differential Revision: https://reviews.llvm.org/D49836

llvm-svn: 339190
2018-08-07 21:54:37 +00:00
Matt Arsenault 08f3fe4fae AMDGPU: cvt_pk_rtz_f16 canonicalizes
llvm-svn: 339078
2018-08-06 23:01:31 +00:00
Matt Arsenault e94ee833f9 AMDGPU: Handle some vector operations in isCanonicalized
llvm-svn: 339077
2018-08-06 22:45:51 +00:00
Matt Arsenault a29e76244a AMDGPU: Push fcanonicalize through partially constant build_vector
This usually avoids some re-packing code, and may
help find canonical sources.

llvm-svn: 339072
2018-08-06 22:30:44 +00:00
Matt Arsenault d49ab0b214 AMDGPU: Treat more custom operations as canonicalizing
Everything should quiet, and I think everything should
flush.

I assume the min3/med3/max3 follow the same rules
as regular min/max for flushing, which should at
least be conservatively correct.

There are still more operations that need to
be handled.

llvm-svn: 339065
2018-08-06 21:58:11 +00:00
Matt Arsenault ce6d61fba8 AMDGPU: Conversions always produce canonical results
Not sure why this was checking for denormals for f16.
My interpretation of the IEEE standard is conversions
should produce a canonical result, and the ISA manual
says denormals are created when appropriate.

llvm-svn: 339064
2018-08-06 21:51:52 +00:00
Matt Arsenault f8768bfc84 AMDGPU: Fix implementation of isCanonicalized
If denormals are enabled, denormals are canonical.
Also fix a few other issues. minnum/maxnum are supposed
to canonicalize. Temporarily improve workaround for the
instruction behavior change in gfx9.

Handle selects and fcopysign.

The tests were also largely broken, since they were
checking for a flush used on some targets after the
store of the result.

llvm-svn: 339061
2018-08-06 21:38:27 +00:00
Matt Arsenault 0d1b3934e2 AMDGPU: Fold v_lshl_or_b32 with 0 src0
Appears from expansion of some packed cases.

llvm-svn: 339025
2018-08-06 15:40:20 +00:00
Matt Arsenault dbf77c5b41 AMDGPU: Rename check prefixes in test
Will avoid noisy diff in future change.

llvm-svn: 339022
2018-08-06 15:16:12 +00:00
Matt Arsenault c3dc8e65e2 DAG: Enhance isKnownNeverNaN
Add a parameter for testing specifically for
sNaNs - at least one instruction pattern on AMDGPU
needs to check specifically for this.

Also handle more cases, and add a target hook
for custom nodes, similar to the hooks for known
bits.

llvm-svn: 338910
2018-08-03 18:27:52 +00:00
Tim Renouf abd85fb1f5 [AMDGPU] Reworked SIFixWWMLiveness
Summary:
I encountered some problems with SIFixWWMLiveness when WWM is in a loop:

1. It sometimes gave invalid MIR where there is some control flow path
   to the new implicit use of a register on EXIT_WWM that does not pass
   through any def.

2. There were lots of false positives of registers that needed to have
   an implicit use added to EXIT_WWM.

3. Adding an implicit use to EXIT_WWM (and adding an implicit def just
   before the WWM code, which I tried in order to fix (1)) caused lots
   of the values to be spilled and reloaded unnecessarily.

This commit is a rework of SIFixWWMLiveness, with the following changes:

1. Instead of considering any register with a def that can reach the WWM
   code and a def that can be reached from the WWM code, it now
   considers three specific cases that need to be handled.

2. A register that needs liveness over WWM to be synthesized now has it
   done by adding itself as an implicit use to defs other than the
   dominant one.

Also added the following fixmes:

FIXME: We should detect whether a register in one of the above
categories is already live at the WWM code before deciding to add the
implicit uses to synthesize its liveness.

FIXME: I believe this whole scheme may be flawed due to the possibility
of the register allocator doing live interval splitting.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46756

Change-Id: Ie7fba0ede0378849181df3f1a9a7a39ed1a94a94
llvm-svn: 338783
2018-08-02 23:31:32 +00:00
Tim Renouf f1c7b92a6a [AMDGPU] Avoid using divergent value in mubuf addr64 descriptor
Summary:
This fixes a problem where a load from global+idx generated incorrect
code on <=gfx7 when the index is divergent.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47383

Change-Id: Ib4d177d6254b1dd3f8ec0203fdddec94bd8bc5ed
llvm-svn: 338779
2018-08-02 22:53:57 +00:00
Matt Arsenault 1f3977a856 DAG: Fix vector widening fcanonicalize
llvm-svn: 338715
2018-08-02 13:43:53 +00:00
Matt Arsenault 36cdcfadcf AMDGPU: Fix scalarizing v4f16 fcanonicalize
llvm-svn: 338714
2018-08-02 13:43:42 +00:00
Matt Arsenault 709374d186 AMDGPU: Improve hack for packing conversion ops
Mutate the node type during selection when it
doesn't matter. This avoids an intermediate bitcast
node on targets with legal i16/f16.

Also fixes missing output modifiers on v_cvt_pkrtz_f32_f16,
which I assume are OK.

llvm-svn: 338619
2018-08-01 20:13:58 +00:00
Matt Arsenault 55ab9213d3 AMDGPU: Partially fix handling of packed amdgpu_ps arguments
Fixes annoying limitations when writing tests.
Also remove more leftover code for manually scalarizing arguments
and return values.

llvm-svn: 338618
2018-08-01 19:57:34 +00:00
Jan Vesely 93b252799b AMDGPU/R600: Convert kernel param loads to use PARAM_I_ADDRESS
Non ext aligned i32 loads are still optimized to use CONSTANT_BUFFER (AS 8)

llvm-svn: 338610
2018-08-01 18:36:07 +00:00
Ryan Taylor 894c8fd0e2 [AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero
Summary:
Add _L to _LZ image intrinsic table mapping to table gen.
In ISelLowering check if image intrinsic has lod and if it's equal
to zero, if so remove lod and change opcode to equivalent mapped _LZ.

Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71

Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49483

llvm-svn: 338523
2018-08-01 12:12:01 +00:00
Konstantin Zhuravlyov bb30ef7af4 AMDGPU: Add clamp bit to dot intrinsics
Differential Revision: https://reviews.llvm.org/D49874

llvm-svn: 338470
2018-08-01 01:31:30 +00:00
Matt Arsenault 118c47b6d1 AMDGPU: Split amdgcn/r600 fminnum/fmaxnum tests
R600 breaks on too many things to usefully test changes
with ieee_mode on vs. off.

llvm-svn: 338435
2018-07-31 20:38:42 +00:00