Commit Graph

202 Commits

Matt Arsenault 61f8ba8b79 AMDGPU: s_setpc_b64 should be an indirect branch
llvm-svn: 278278
2016-08-10 19:20:02 +00:00
Matt Arsenault c6b1350039 AMDGPU: Set sizes on control flow pseudos
llvm-svn: 278276
2016-08-10 19:11:51 +00:00
Matt Arsenault 57431c9680 AMDGPU: Change insertion point of si_mask_branch
Insert before the skip branch if one is created.
This is a somewhat more natural placement relative
to the skip branches, and makes it possible to implement
analyzeBranch for skip blocks.

The test changes are mostly due to a quirk where
the block label is not emitted if there is a terminator
that is not also a branch.

llvm-svn: 278273
2016-08-10 19:11:42 +00:00
Nicolai Haehnle 8a482b33fe AMDGPU: Stay in WQM for non-intrinsic stores
Summary:
Two types of stores are possible in pixel shaders: stores to memory that are
explicitly requested at the API level, and stores that are an implementation
detail of register spilling or lowering of arrays.

For the first kind of store, we must ensure that helper pixels have no effect
and hence WQM must be disabled. The second kind of store must always be
executed, because the written value may be loaded again in a way that is
relevant for helper pixels as well -- and there are no externally visible
effects anyway.

This is a candidate for the 3.9 release branch.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D22675

llvm-svn: 277504
2016-08-02 19:31:14 +00:00
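A rough IR sketch of the two kinds of store described above; the buffer.store signature is only an approximation of the intrinsic as it existed at the time, and the function itself is hypothetical:

  define amdgpu_ps void @two_kinds_of_store(<4 x i32> inreg %rsrc, float %v, float* %spill.slot) {
    ; explicitly requested API-level store: helper pixels must not write, so WQM is disabled around it
    call void @llvm.amdgcn.buffer.store.f32(float %v, <4 x i32> %rsrc, i32 0, i32 0, i1 false, i1 false)
    ; store that is an implementation detail (spill / lowered array): must execute in WQM,
    ; because the value may be reloaded in a way that feeds derivative computations
    store float %v, float* %spill.slot
    ret void
  }

  declare void @llvm.amdgcn.buffer.store.f32(float, <4 x i32>, i32, i32, i1, i1)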
Valery Pykhtin 902db3101b [AMDGPU] refactor DS instruction definitions. NFC.
Differential revision: https://reviews.llvm.org/D22522

llvm-svn: 277344
2016-08-01 14:21:30 +00:00
Matt Arsenault d2141b6030 AMDGPU: Set s_setpc_b64 as a terminator
llvm-svn: 277259
2016-07-30 01:40:34 +00:00
Wei Ding 07e03712d3 AMDGPU : Add intrinsics for compare with the full wavefront result
Differential Revision: http://reviews.llvm.org/D22482

llvm-svn: 276998
2016-07-28 16:42:13 +00:00
Nicolai Haehnle 3b572002a2 AMDGPU: add execfix flag to SI_ELSE
Summary:
SI_ELSE is lowered into two parts:

s_or_saveexec_b64 dst, src (at the start of the basic block)

s_xor_b64 exec, exec, dst (at the end of the basic block)

The idea is that dst contains the exec mask of the preceding IF block. It can
happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
the basic block that contains SI_ELSE, in which case it introduces an instruction

s_and_b64 exec, exec, s[...]

which masks out bits that can correspond to both the IF and the ELSE paths.
So the resulting sequence must be:

s_or_saveexec_b64 dst, src

s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow

s_xor_b64 exec, exec, dst

Whether to add the additional s_and_b64 dst, dst, exec is currently determined
via the ExecModified tracking. With this change, it is instead determined by
an additional flag on SI_ELSE which is set by SIWholeQuadMode.

Finally: it also occurred to me that an alternative approach for the long run
is for SILowerControlFlow to unconditionally emit

s_or_saveexec_b64 dst, src

...

s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst

and have a pass that detects and cleans up the "redundant AND with exec"
pattern where possible. This could be useful anyway, because we also add
instructions

s_and_b64 vcc, exec, vcc

before s_cbranch_scc (in moveToVALU), and those are often redundant. I have
some pending changes to how KILL is lowered that could also benefit from
such a cleanup pass.

In any case, this current patch could help in the short term with the whole
ExecModified business.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22846

llvm-svn: 276972
2016-07-28 11:39:24 +00:00
Matt Arsenault c6b69a9114 AMDGPU: Use implicit_def for selecting anyext
llvm-svn: 276819
2016-07-26 23:06:33 +00:00
Matt Arsenault 32fc527c65 AMDGPU: Add fp legacy instruction intrinsics
This could use some additional optimization work
to use mad/mac legacy.

llvm-svn: 276764
2016-07-26 16:45:45 +00:00
Matt Arsenault f9245b75c0 AMDGPU: Delete more dead code
Remove dead code from r600 intrinsic removal.
Remove unset members, rename StackSize to be less ambiguous.

llvm-svn: 276436
2016-07-22 17:01:25 +00:00
Matt Arsenault 7fb961f3e6 AMDGPU: Fix i1 fp_to_int
R600's i1 fp_to_uint selected, but the result did not match
what instcombine constant folds to.

llvm-svn: 276435
2016-07-22 17:01:21 +00:00
Matt Arsenault 03006fd3c4 AMDGPU: Only use legal inline immediates with kill pseudo
Only the sign of the value matters, so use a constant that
doesn't require an instruction to materialize.

These should really just emit the write to exec directly,
but for now stick with the kill pseudo-terminator.

llvm-svn: 275988
2016-07-19 16:27:56 +00:00
Matt Arsenault cb540bc03c AMDGPU: Expand register indexing pseudos in custom inserter
This is to help move SILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

llvm-svn: 275934
2016-07-19 00:35:03 +00:00
Matt Arsenault c96e1deffa AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32
llvm-svn: 275871
2016-07-18 18:35:05 +00:00
Matt Arsenault 4c519d3518 AMDGPU/R600: Replace barrier intrinsics
llvm-svn: 275870
2016-07-18 18:34:59 +00:00
Matt Arsenault 786724a22e AMDGPU: Follow up to r275203
I meant to squash this into it.

llvm-svn: 275220
2016-07-12 21:41:32 +00:00
Wei Ding 5b2636a152 AMDGPU: Add LLVM IR Intrinsic for v_lerp_u8
Differential Revision: http://reviews.llvm.org/D22239

llvm-svn: 275197
2016-07-12 18:02:14 +00:00
Nicolai Haehnle 7968c34586 AMDGPU: Unify MOVRELSOffset and MOVRELDOffset
Summary:
Previously, constant index insertelements would be turned into SI_INDIRECT_DST,
which is bound to prevent some optimization opportunities. Worse, it misled
the heuristic that decides whether immediates should be lowered to S_MOV_B32
or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22217

llvm-svn: 275160
2016-07-12 08:12:16 +00:00
Matt Arsenault fc7e6a0a0e AMDGPU: Cleanup pseudoinstructions
llvm-svn: 275133
2016-07-12 00:23:17 +00:00
Matt Arsenault 840593e19d AMDGPU: Fix missing scc def on control flow pseudos
These are all expanded to instructions that include an scc def.

llvm-svn: 275132
2016-07-12 00:08:14 +00:00
Matt Arsenault 48d70cb486 Revert "AMDGPU: Remove unused control flow intrinsic"
llvm-svn: 274978
2016-07-09 17:18:39 +00:00
Matt Arsenault 1322b6f8bb AMDGPU: Improve offset folding for register indexing
llvm-svn: 274954
2016-07-09 01:13:56 +00:00
Matt Arsenault 8f0a92f0ba AMDGPU: Remove unused control flow intrinsic
llvm-svn: 274939
2016-07-08 21:39:44 +00:00
Valery Pykhtin 68853ab2c5 [AMDGPU] fix ds_swizzle_b32 opcode for VI (bz 28371)
Differential Revision: http://reviews.llvm.org/D22049

llvm-svn: 274852
2016-07-08 15:12:46 +00:00
Matt Arsenault a74374a86b AMDGPU: Move si_mask_branch register operand to be a use
llvm-svn: 274818
2016-07-08 00:55:44 +00:00
Valery Pykhtin af8b1bddbd [AMDGPU] fix ds_write_src2 encoding (bz26027)
Differential revision: http://reviews.llvm.org/D22041

llvm-svn: 274756
2016-07-07 14:23:38 +00:00
Valery Pykhtin e65b39ec09 [AMDGPU] rename DS_1A1D_Off8_NORET to DS_1A2D_Off8_NORET as ds_write2xx use 2 source registers. NFC.
llvm-svn: 274556
2016-07-05 15:15:28 +00:00
Tom Stellard 17a0ec5400 AMDGPU/SI: Remove hack for selecting < 32-bit loads to MUBUF instructions
Summary:
The isGlobalLoad() query was returning true for constant address space loads
with memory types smaller than 32 bits, which is wrong. This logic has been
replaced with a PatFrag in the TableGen files to provide the same functionality.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21696

llvm-svn: 274521
2016-07-04 20:41:48 +00:00
Matt Arsenault 43e92fe306 AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.

llvm-svn: 273652
2016-06-24 06:30:11 +00:00
Matt Arsenault 529cf25e60 AMDGPU: readlane/writelane do not read exec
llvm-svn: 273525
2016-06-23 01:26:16 +00:00
Matt Arsenault 3cb4ddeb4e AMDGPU: Fix liveness when expanding m0 loop
llvm-svn: 273514
2016-06-22 23:40:57 +00:00
Changpeng Fang 47efe1f6db AMDGPU/SI: Define an intrinsic to expose ds_swizzle_b32
Reviewers: tstellarAMD, arsenm

Differential Revision: http://reviews.llvm.org/D21533

llvm-svn: 273496
2016-06-22 21:33:49 +00:00
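A minimal usage sketch of the resulting intrinsic (the parameter order is as described in the review, but treat the exact signature as an assumption):

  declare i32 @llvm.amdgcn.ds.swizzle(i32, i32)

  define i32 @swizzle_example(i32 %v) {
    ; the second operand is the swizzle pattern and must be a compile-time constant
    %r = call i32 @llvm.amdgcn.ds.swizzle(i32 %v, i32 31)
    ret i32 %r
  }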
Matt Arsenault 9babdf4265 AMDGPU: Fix verifier errors in SILowerControlFlow
The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.

Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.

llvm-svn: 273467
2016-06-22 20:15:28 +00:00
Matt Arsenault 0bb294b224 AMDGPU: Temporarily select trap to s_endpgm
This should select to s_trap, but that requires
additional work to set up and enable the trap handler.
For now emit s_endpgm so bugpoint stops getting stuck
on the unsupported call to abort.

Emit a warning that this will only terminate the wave and
not really trap.

llvm-svn: 273062
2016-06-17 22:27:03 +00:00
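For reference, the IR this affects is just a call to the generic trap intrinsic; with no trap handler support yet, it is selected to s_endpgm (ending the wave) and the warning is emitted:

  declare void @llvm.trap()

  define void @abort_path(i1 %bad) {
    br i1 %bad, label %trap, label %done
  trap:
    ; selected to s_endpgm for now instead of s_trap
    call void @llvm.trap()
    unreachable
  done:
    ret void
  }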
Matt Arsenault 8885910f8e AMDGPU: Remove llvm.SI.tid intrinsic
Mesa doesn't emit this for llvm >= 3.8 anymore.

llvm-svn: 273050
2016-06-17 21:18:41 +00:00
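One plausible replacement sequence (an assumption, not stated in the commit) is computing the lane id within the wave from the mbcnt intrinsics:

  declare i32 @llvm.amdgcn.mbcnt.lo(i32, i32)
  declare i32 @llvm.amdgcn.mbcnt.hi(i32, i32)

  define i32 @lane_id() {
    ; mbcnt counts set bits below the current lane in the given mask;
    ; with an all-ones mask this yields the lane index
    %lo = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0)
    %id = call i32 @llvm.amdgcn.mbcnt.hi(i32 -1, i32 %lo)
    ret i32 %id
  }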
Tom Stellard bf3e6e5bb4 AMDGPU/SI: Refactor fixup handling for constant addrspace variables
Summary:
We now use a standard fixup type to apply the pc-relative address of
constant address space variables, and we have the GlobalAddress lowering
code add the required 4-byte offset to the global address rather than
doing it as part of the fixup.

This refactoring will make it easier to use the same code for global
address space variables and also simplifies the code.

Re-commit this after fixing a bug where we were trying to use a
reference to a Triple object that had already been destroyed.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21154

llvm-svn: 272705
2016-06-14 20:29:59 +00:00
Tom Stellard b1a523fa68 Revert "AMDGPU/SI: Refactor fixup handling for constant addrspace variables"
This reverts commit r272675.

llvm-svn: 272677
2016-06-14 15:16:35 +00:00
Tom Stellard 5e6298b0f2 AMDGPU/SI: Refactor fixup handling for constant addrspace variables
Summary:
We now use a standard fixup type to apply the pc-relative address of
constant address space variables, and we have the GlobalAddress lowering
code add the required 4-byte offset to the global address rather than
doing it as part of the fixup.

This refactoring will make it easier to use the same code for global
address space variables and also simplifies the code.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21154

llvm-svn: 272675
2016-06-14 15:11:01 +00:00
Matt Arsenault 887018179a AMDGPU: Fix i64 global cmpxchg
This was using extract_subreg sub0 to extract the low register
of the result instead of sub0_sub1, producing an invalid copy.

There doesn't seem to be a way to use the compound subreg indices
in tablegen since those are generated, so manually select it.

llvm-svn: 272344
2016-06-09 23:42:48 +00:00
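The IR shape the fix covers, written with the typed-pointer syntax of the time: a 64-bit cmpxchg on a global pointer, where the old value must be extracted with the 64-bit sub-register pair rather than sub0 alone:

  define i64 @cmpxchg_global_i64(i64 addrspace(1)* %ptr, i64 %cmp, i64 %new) {
    ; returns { old value, success flag }
    %pair = cmpxchg i64 addrspace(1)* %ptr, i64 %cmp, i64 %new seq_cst seq_cst
    %old = extractvalue { i64, i1 } %pair, 0
    ret i64 %old
  }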
Artem Tamazov 135487767b [AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when src2 == VCC.
Another step toward unifying the llvm assembler/disassembler with sp3.
As a side effect, CodeGen output is slightly improved, hence the changes in CodeGen tests.
Assembler/Disassembler tests updated/added.

Differential Revision: http://reviews.llvm.org/D20796

llvm-svn: 271900
2016-06-06 15:23:43 +00:00
Matt Arsenault 7401516985 AMDGPU: Add fract intrinsic
Remove broken patterns matching it. This was matching the
unsafe math pattern and expanding the fix for the buggy instruction
from the pattern. The problems are also on CI. Remove the workarounds
and only use fract with unsafe math or from the intrinsic.

llvm-svn: 271078
2016-05-28 00:19:52 +00:00
Matt Arsenault e57206d81b AMDGPU: Fix v2i64/v2f64 bitcasts
These operations tend to get promoted away to v4i32 so
this doesn't happen often.

llvm-svn: 270740
2016-05-25 18:07:36 +00:00
Jan Vesely fbcb754e66 AMDGPU: Make CONST_DATA_PTR available to R600
Rename to AMDGPUconstdata_ptr

Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19786

llvm-svn: 269474
2016-05-13 20:39:18 +00:00
Matt Arsenault 4234542503 AMDGPU: Make some instructions convergent
llvm-svn: 269147
2016-05-11 00:32:31 +00:00
Artem Tamazov f0b6b40fa4 [AMDGPU][llvm-mc] Some refactoring of .td files
Some custom Operands and AsmOperandClasses moved to proper place.
No functional changes.

Differential Revision: http://reviews.llvm.org/D20012

llvm-svn: 268780
2016-05-06 19:32:38 +00:00
Artem Tamazov ebe71ce36a [AMDGPU][llvm-mc] Add support for sendmsg(...) syntax.
Added support for sendmsg(MSG[, OP[, STREAM_ID]]) syntax
in s_sendmsg and s_sendmsghalt instructions.
The syntax matches the SP3 assembler/disassembler rules.
That is why implicit inputs (like M0 and EXEC) are not printed
to disassembly output anymore.

sendmsg(...) allows only known message types and attributes,
even if literals are used instead of symbolic names.
However, a raw literal (without "sendmsg") can still be used,
which allows any 16-bit value.

Tests updated/added.

Differential Revision: http://reviews.llvm.org/D19596

llvm-svn: 268762
2016-05-06 17:48:48 +00:00
Artem Tamazov 38e496b175 Fixed/Recommitted r267733 "[AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD."
Previously reverted by r267752.

r267733 review:
Differential Revision: http://reviews.llvm.org/D19342

llvm-svn: 268066
2016-04-29 17:04:50 +00:00
Chad Rosier 03e1647d19 Revert "[AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD."
This reverts commit r267733 due to a -Werror,-Wunused-function error.

llvm-svn: 267752
2016-04-27 18:29:11 +00:00
Artem Tamazov 3896f8f83d [AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD.
Added support of TTMP quads.
Reworked M0 exclusion machinery for SMRD and similar instructions
to enable usage of TTMP registers in those instructions as destinations.
Tests added.

Differential Revision: http://reviews.llvm.org/D19342

llvm-svn: 267733
2016-04-27 16:20:23 +00:00
Nicolai Haehnle f66bdb5ea8 AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsic
Summary:
So it appears that to guarantee some of the ordering requirements of a GLSL
memoryBarrier() executed in the shader, we need to emit an s_waitcnt.

(We can't use an s_barrier, because memoryBarrier() may appear anywhere in
the shader, in particular it may appear in non-uniform control flow.)

Reviewers: arsenm, mareko, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19203

llvm-svn: 267729
2016-04-27 15:46:01 +00:00
Sam Kolton 3025e7f25f [AMDGPU] Assembler: basic support for SDWA instructions
Support for SDWA instructions for VOP1 and VOP2 encoding.
Not done yet:
  - converters for support optional operands and modifiers
  - VOPC
  - sext() modifier
  - intrinsics
  - VOP2b (see vop_dpp.s)
  - V_MAC_F32 (see vop_dpp.s)

Differential Revision: http://reviews.llvm.org/D19360

llvm-svn: 267553
2016-04-26 13:33:56 +00:00
Artem Tamazov d6468666b5 [AMDGPU][llvm-mc] s_getreg/setreg* - Add hwreg(...) syntax.
Added hwreg(reg[,offset,width]) syntax.
Default offset = 0, default width = 32.
The option to specify a raw 16-bit immediate is kept.
Added out-of-range checks.
Disassembly always uses the hwreg(...) format.
Tests updated/added.

Differential Revision: http://reviews.llvm.org/D19329

llvm-svn: 267410
2016-04-25 14:13:51 +00:00
Nicolai Haehnle b0c9748709 AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic
Summary:
This intrinsic returns true if the current thread belongs to a live pixel
and false if it belongs to a pixel that we are executing only for derivative
computation. It will be used by Mesa to implement gl_HelperInvocation.

Note that for pixels that are killed during the shader, this implementation
also returns true, but it doesn't matter because those pixels are always
disabled in the EXEC mask.

This unearthed a corner case in the instruction verifier, which complained
about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but
correct code, so make the verifier accept it as such.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19191

llvm-svn: 267102
2016-04-22 04:04:08 +00:00
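A sketch of how a frontend could use the new intrinsic to implement gl_HelperInvocation; the shader below is a made-up example, not taken from the patch:

  declare i1 @llvm.amdgcn.ps.live()

  define amdgpu_ps float @helper_invocation() {
    %live = call i1 @llvm.amdgcn.ps.live()
    ; gl_HelperInvocation is the inverse: true for helper (derivative-only) pixels
    %helper = xor i1 %live, true
    %r = select i1 %helper, float 1.0, float 0.0
    ret float %r
  }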
Nicolai Haehnle b48275f134 Add IntrWrite[Arg]Mem intrinsic property
Summary:
This property is used to mark an intrinsic that only writes to memory, but
neither reads from memory nor has other side effects.

An example where this is useful is the llvm.amdgcn.buffer.store.format.*
intrinsic, which corresponds to a store instruction that goes through a special
buffer descriptor rather than through a plain pointer.

With this property, the intrinsic should still be handled as having side
effects at the LLVM IR level, but machine scheduling can make smarter
decisions.

Reviewers: tstellarAMD, arsenm, joker.eph, reames

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18291

llvm-svn: 266826
2016-04-19 21:58:33 +00:00
Artem Tamazov e2762423c2 [AMDGPU][llvm-mc] s_setreg* - Fix order of operands
The order should match the sp3 syntax, where the destination (the simm16 denoting the hwreg) comes first.

Differential Revision: http://reviews.llvm.org/D19161

llvm-svn: 266617
2016-04-18 14:54:26 +00:00
Matt Arsenault 4ac341c8b3 AMDGPU: Directly emit m0 initialization with s_mov_b32
Currently what comes out of instruction selection is a
register initialized to -1, and then copied to m0.
MachineCSE doesn't consider copies, but we want these
to be CSEed. This isn't much of a problem currently,
because SIFoldOperands is run immediately after.

This avoids regressions from leaving all the copies to m0
when SIFoldOperands is run later.

llvm-svn: 266377
2016-04-14 21:58:15 +00:00
Matt Arsenault 9cd90712f0 AMDGPU: Implement canonicalize
Also add generic DAG node for it.

llvm-svn: 266272
2016-04-14 01:42:16 +00:00
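The corresponding generic intrinsic in IR form, as a trivial usage sketch:

  declare float @llvm.canonicalize.f32(float)

  define float @canon(float %x) {
    ; returns the canonical encoding of %x (e.g. quieting signaling NaNs,
    ; flushing denormals according to the float mode)
    %c = call float @llvm.canonicalize.f32(float %x)
    ret float %c
  }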
Nicolai Haehnle df77c9ada4 AMDGPU: add llvm.amdgcn.buffer.load/store intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like
llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and
atomic counters at least when robust buffer access behavior is desired.
(These instructions perform no format conversion and do buffer range checking
per component.)

As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format,
it has become trivial to add support for the f32 and v2f32 variants of that
intrinsic, so the patch does so.

Also DAG-ify (and fix) some tests that I noticed intermittent failures in
while developing this patch.

Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects
changes to the BUFFER_STORE_DWORD* instructions. See also
http://reviews.llvm.org/D18291.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18292

llvm-svn: 266126
2016-04-12 21:18:10 +00:00
Matt Arsenault 64fa2f4513 AMDGPU: Implement i64 global atomics
llvm-svn: 266075
2016-04-12 14:05:11 +00:00
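Plain IR like the following (typed-pointer syntax of the time) now selects to the 64-bit global atomic instructions:

  define i64 @global_atomic_add_i64(i64 addrspace(1)* %ptr, i64 %v) {
    ; 64-bit read-modify-write atomic on a global (addrspace 1) pointer
    %old = atomicrmw add i64 addrspace(1)* %ptr, i64 %v seq_cst
    ret i64 %old
  }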
Matt Arsenault a9dbdcae04 AMDGPU: Add atomic_inc + atomic_dec intrinsics
These are different than atomicrmw add 1 because they have
an additional input value to clamp the result.

llvm-svn: 266074
2016-04-12 14:05:04 +00:00
Jan Vesely 43b7b5b846 AMDGPU/SI: Implement atomic load/store for i32 and i64
Standard load/store instructions with GLC bit set.

Reviewers: tstellarAMD, arsenm

Differential Revision: http://reviews.llvm.org/D18760

llvm-svn: 265709
2016-04-07 19:23:11 +00:00
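The IR this covers, again in the typed-pointer syntax of the time; per the commit, both are selected to standard load/store instructions with the GLC bit set:

  define i32 @atomic_load_global_i32(i32 addrspace(1)* %p) {
    %v = load atomic i32, i32 addrspace(1)* %p seq_cst, align 4
    ret i32 %v
  }

  define void @atomic_store_global_i64(i64 addrspace(1)* %p, i64 %v) {
    store atomic i64 %v, i64 addrspace(1)* %p seq_cst, align 8
    ret void
  }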
Valery Pykhtin e23b6deb01 [AMDGPU] fix readlane/readfirstlane src vgpr operand type.
For a VGPR_32 operand, the disassembler expects a VGPR register encoded as 0..255 (an enum8 src operand).
readfirstlane/readlane actually take an enum9 operand, so this change switches the operand class from VGPR_32 to VS_32 (enum9 encoding).

Differential Revision: http://reviews.llvm.org/D18696

llvm-svn: 265670
2016-04-07 13:41:51 +00:00
Sam Kolton ff90c60a78 [AMDGPU] AsmParser: disable DPP for unsupported instructions. New dpp tests. Fix v_nop_dpp.
Summary:
1. Disable DPP encoding for instructions that do not support it:
    - VOP1:
        - v_readfirstlane_b32
        - v_clrexcp
        - v_movreld_b32
        - v_movrels_b32
        - v_movrelsd_b32
    - VOP2:
        - v_madmk_f16/32
        - v_madak_f16/32
    - VOPC, VINTRP, VOP3
2. Fix DPP for v_nop
3. New DPP tests for VOP1 and VOP2 instructions

Reviewers: nhaustov, tstellarAMD, vpykhtin

Subscribers: tstellarAMD, arsenm

Differential Revision: http://reviews.llvm.org/D18552

llvm-svn: 265538
2016-04-06 13:29:59 +00:00
Tom Stellard 354a43c7bc AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2}
Summary:
Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+.

32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý.

Patch by: Vedran Miletić

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: jvesely, scchan, kanarayan, arsenm

Differential Revision: http://reviews.llvm.org/D17280

llvm-svn: 265170
2016-04-01 18:27:37 +00:00
Matt Arsenault 2fe4fbc184 AMDGPU: Add frexp_exp intrinsic
llvm-svn: 264944
2016-03-30 22:28:52 +00:00
Matt Arsenault 30d37a74da AMDGPU: Remove atomic inc/dec patterns
There is no benefit to these since materializing the constant 1
requires the same number of instructions as materializing uint_max

llvm-svn: 264215
2016-03-23 23:23:38 +00:00
Matt Arsenault b96b57347a AMDGPU: Add frexp_mant intrinsic
llvm-svn: 263948
2016-03-21 16:11:05 +00:00
Nicolai Haehnle 95e8ffd398 AMDGPU: Overload return type of llvm.amdgcn.buffer.load.format
Summary:
Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before
the frontend patches land in Mesa. Eventually, we may want to automatically
reduce the size of loads at the LLVM IR level, which requires such overloads,
and in some cases Mesa can generate them directly.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18255

llvm-svn: 263792
2016-03-18 16:24:40 +00:00
Nicolai Haehnle ad63638f6d AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics
Summary:
These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used
by Mesa to implement atomics with buffer semantics. The intrinsic interface
matches that of buffer.load.format and buffer.store.format, except that the
GLC bit is not exposed (it is automatically deduced based on whether the
return value is used).

The change of hasSideEffects is required for TableGen to accept the pattern
that matches the intrinsic.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, rivanvx, llvm-commits

Differential Revision: http://reviews.llvm.org/D18151

llvm-svn: 263791
2016-03-18 16:24:31 +00:00
Nicolai Haehnle 3003ba00a3 AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format
Summary:
We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend
cannot easily make use of an explicit soffset parameter either. Furthermore,
it is likely that in the future, LLVM will be in a better position than the
frontend to choose an SGPR offset if possible.

Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18218

llvm-svn: 263790
2016-03-18 16:24:20 +00:00
Changpeng Fang 01f6062227 AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS
Summary:
Static LDS size is saved in MachineFunctionInfo::LDSSize.
We define a pseudo instruction with the usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter,
we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize.

Reviewers: arsenm, tstellarAMD

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D18064

llvm-svn: 263563
2016-03-15 17:28:44 +00:00
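A minimal usage sketch; the intrinsic name is assumed to be llvm.amdgcn.groupstaticsize, following the in-tree naming convention:

  declare i32 @llvm.amdgcn.groupstaticsize()

  define i32 @lds_static_size() {
    ; folded to MachineFunctionInfo::LDSSize by the custom inserter described above
    %sz = call i32 @llvm.amdgcn.groupstaticsize()
    ret i32 %sz
  }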
Nikolay Haustov 79af6b33e0 [AMDGPU] Assembler: SOP* instruction fixes
s_bitset0_b64 and s_bitset1_b64 have a 32-bit src0, not 64-bit.
s_rfe_b64 has just one destination operand and no source.
Uncomment S_BITCMP* and S_SETVSKIP, adjust SOPC_* classes for that.
Add s_memrealtime test and change comments in smem.s to follow common style.
Change the test for s_memtime to use a non-zero register so it really tests the encoding.
Add tests for s_buffer_load*.
Add tests for SOPC instructions (same for SI and VI).

Differential Revision: http://reviews.llvm.org/D18040

llvm-svn: 263420
2016-03-14 11:17:19 +00:00
Nikolay Haustov 6560781c4f [AMDGPU] Assembler: change v_madmk operands to have same order as mad.
The constant is now at source operand 1 (previously at 2).
This is also how it is in legacy AMD sp3 assembler.
Update tests.

Differential Revision: http://reviews.llvm.org/D17984

llvm-svn: 263212
2016-03-11 09:27:25 +00:00
Nicolai Haehnle b142770bfe AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.

The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.

For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.

BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17277

llvm-svn: 263140
2016-03-10 18:43:50 +00:00
Changpeng Fang 278a5b31a5 AMDGPU/SI: Define S_GETREG Intrinsic
Summary:
Define the s_getreg intrinsic to generate the s_getreg instruction to read
hardware registers.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17892

llvm-svn: 263124
2016-03-10 16:47:15 +00:00
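A hedged usage sketch; the i32 argument is the same simm16 field the instruction takes (hwreg id plus offset/width packed into it), and the particular constant below is illustrative only:

  declare i32 @llvm.amdgcn.s.getreg(i32)

  define i32 @read_hw_reg() {
    ; the argument is the simm16 operand of s_getreg_b32 selecting hwreg, offset and width,
    ; and must be a constant
    %v = call i32 @llvm.amdgcn.s.getreg(i32 1)
    ret i32 %v
  }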
Valery Pykhtin a4db224d54 [AMDGPU] Fix SMEM instructions encoding/operand namings
Differential Revision: http://reviews.llvm.org/D17651

llvm-svn: 263108
2016-03-10 13:06:08 +00:00
Nikolay Haustov 8e3f099497 [AMDGPU] Assembler: Fix s_setpc_b64
s_setpc_b64 has just one 64-bit source, which is the address of the instruction to jump to.

Differential Revision: http://reviews.llvm.org/D17888

llvm-svn: 263005
2016-03-09 10:56:19 +00:00
Matt Arsenault c89f2919a4 AMDGPU: Match more med3 integer patterns
llvm-svn: 262864
2016-03-07 21:54:48 +00:00
Valery Pykhtin 0c6293da68 [AMDGPU] SOPxx instructions operand naming fixed in td files.
dst -> sdst
ssrcN -> srcN

Differential Revision: http://reviews.llvm.org/D17646

llvm-svn: 262801
2016-03-06 10:31:44 +00:00
Tom Stellard 649b5db557 AMDGPU/SI: Add support for spilling SGPRs to scratch buffer
Summary:
This is necessary for when we run out of VGPRs and can no
longer use v_{read,write}_lane for spilling SGPRs.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17592

llvm-svn: 262732
2016-03-04 18:31:18 +00:00
Nikolay Haustov 5bf46ac150 AMDGPU/SI: add llvm.amdgcn.image.atomic.* intrinsics
These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the
GL_ARB_shader_image_load_store extension.

Initial change by Nicolai Hähnle

Differential Revision: http://reviews.llvm.org/D17401

llvm-svn: 262701
2016-03-04 10:39:50 +00:00
Matt Arsenault 274d34e725 AMDGPU: Add s_sleep intrinsic
llvm-svn: 262120
2016-02-27 08:53:52 +00:00
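Usage is a single call with a constant sleep count, as in this minimal sketch:

  declare void @llvm.amdgcn.s.sleep(i32)

  define void @short_pause() {
    ; the operand must be an immediate; it scales the number of wait cycles
    call void @llvm.amdgcn.s.sleep(i32 1)
    ret void
  }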
Matt Arsenault 61738cbcb6 AMDGPU: Implement readcyclecounter
This matches the behavior of the HSAIL clock instruction.
s_memrealtime is used if the subtarget supports it, and falls
back to s_memtime if not.

Also introduces new intrinsics for each of s_memtime / s_memrealtime.

llvm-svn: 262119
2016-02-27 08:53:46 +00:00
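In IR terms (a minimal sketch), this wires up the generic counter plus the two new target intrinsics:

  declare i64 @llvm.readcyclecounter()
  declare i64 @llvm.amdgcn.s.memtime()
  declare i64 @llvm.amdgcn.s.memrealtime()

  define i64 @elapsed() {
    ; llvm.readcyclecounter selects to s_memrealtime when available, else s_memtime
    %t0 = call i64 @llvm.readcyclecounter()
    %t1 = call i64 @llvm.readcyclecounter()
    %dt = sub i64 %t1, %t0
    ret i64 %dt
  }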
Nikolay Haustov 2f684f1347 [AMDGPU] Assembler: Basic support for MIMG
Add parsing and printing of image operands. Matches legacy sp3 assembler.
Change the image instruction operand order to have the data/image/sampler operands at the beginning. This is needed because optional operands in MC are always last.
Update SITargetLowering for the new order.
Add basic MC test.
Update CodeGen tests.

Review: http://reviews.llvm.org/D17574
llvm-svn: 261995
2016-02-26 09:51:05 +00:00
Nikolay Haustov 2e4c72977c [AMDGPU] Assembler: Simplify handling of optional operands
Resubmit with index problem fixed. Verified with valgrind.

Prepare to support DPP encodings.

For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands.
However this means that when parsing an instruction which has no mnemonic prefix,
we cannot add both default values for VOP3 and for DPP optional operands
to OperandVector - neither instruction would match. So add default values
for optional operands to MCInst during conversion instead.

Mark more operands as IsOptional = 1 in .td files.
Do not add default values for optional operands to OperandVector in AMDGPUAsmParser.
Add default values for optional operands during conversion using new helper addOptionalImmOperand.
Change cvtVOP3_2_mod to check an instruction flag instead of the presence of modifiers. In the future, the cvtVOP3* functions can be combined into one.
Separate cvtFlat and cvtFlatAtomic.
Fix CNDMASK_B32 definition to have no modifiers.

Review: http://reviews.llvm.org/D17445
llvm-svn: 261856
2016-02-25 10:58:54 +00:00
NAKAMURA Takumi 3d3d0f4151 Revert r261742, "[AMDGPU] Assembler: Simplify handling of optional operands"
It brought undefined behavior.

llvm-svn: 261839
2016-02-25 08:35:27 +00:00
Nikolay Haustov 4f073ca7fa [AMDGPU] Assembler: Simplify handling of optional operands
Prepare to support DPP encodings.

For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands. However this means that when parsing an instruction which has no mnemonic prefix, we cannot add both default values for VOP3 and for DPP optional operands to OperandVector - neither instruction would match. So add default values for optional operands to MCInst during conversion instead.

Mark more operands as IsOptional = 1 in .td files.
Do not add default values for optional operands to OperandVector in AMDGPUAsmParser.
Add default values for optional operands during conversion using new helper addOptionalImmOperand.
Change cvtVOP3_2_mod to check an instruction flag instead of the presence of modifiers. In the future, the cvtVOP3* functions can be combined into one.
Separate cvtFlat and cvtFlatAtomic.
Fix CNDMASK_B32 definition to have no modifiers.

Review: http://reviews.llvm.org/D17445

Reviewers: tstellarAMD
llvm-svn: 261742
2016-02-24 14:22:47 +00:00
Nikolay Haustov 2a62b3c244 [AMDGPU] Fix operands of S_BFE_U64 and S_BFM_B64
src1 of s_bfe_u64 is 32-bit (same as s_bfe_i64).
src0 and src1 of s_bfm_b64 are 32-bit.
Update tests.

Review: http://reviews.llvm.org/D17480

Reviewers: arsenm
llvm-svn: 261621
2016-02-23 09:19:14 +00:00
Nicolai Haehnle f2c64db55a AMDGPU/SI: add llvm.amdgcn.image.load/store[.mip] intrinsics
Summary:
These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa
for the GL_ARB_shader_image_load_store extension.

IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has
a legacy name and pretends not to read memory.

Differential Revision: http://reviews.llvm.org/D17276

llvm-svn: 261224
2016-02-18 16:44:18 +00:00
Tom Stellard e1818af8c5 [AMDGPU] Disassembler: Added basic disassembler for AMDGPU target
Changes:

- Added disassembler project
- Fixed all decoding conflicts in .td files
- Added a DecoderMethod=“NONE” option to Target.td that allows disabling
  decoder generation for an instruction.
- Created decoding functions for VS_32 and VReg_32 register classes.
- Added stubs for decoding all register classes.
- Added several tests for disassembler

Disassembler only supports:

- VI subtarget
- VOP1 instruction encoding
- 32-bit register operands and inline constants

[Valery]

One of the points that requires attention is how decoder
conflicts were resolved:

- Groups of target instructions were separated by using different
  DecoderNamespace values (SICI, VI, CI), following an approach similar
  to AssemblerPredicate.

- There were conflicts in IMAGE_<> instructions caused by two
  different reasons:

1. dmask wasn’t specified for the output (fixed)
2. There are image instructions that differ only by the number of
   the address components but have the same encoding by the HW spec. The
   actual number of address components is determined by the HW at runtime
   using the image resource descriptor, starting from the VGPR encoded in an
   IMAGE instruction. This means that we should choose only one instruction
   from the conflicting group to be the rule for the decoder. I didn’t find a
   way to disable decoder generation for an arbitrary instruction and
   therefore made a one-line fix to the tablegen generator that suppresses
   decoder generation when DecoderMethod is set to “NONE”. This is a change
   that should be reviewed and submitted first. Otherwise I would need to
   specify a different DecoderNamespace for every instruction in the
   conflicting group. I haven’t checked yet that DecoderMethod=“NONE” is not
   used in other targets.
3. IMAGE_GATHER decoder generation is for now disabled and to be
   done later.

[/Valery]

Patch By: Sam Kolton

Differential Revision: http://reviews.llvm.org/D16723

llvm-svn: 261185
2016-02-18 03:42:32 +00:00
Tom Stellard cc4c8718ed [AMDGPU] Rename $dst operand to $vdst for VOP instructions.
Summary: This change renames the output operand for VOP instructions from dst to vdst. This is needed to enable decoding of named operands for the disassembler.

Reviewers: vpykhtin, tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, nhaustov

Projects: #llvm-amdgpu-spb

Differential Revision: http://reviews.llvm.org/D16920

llvm-svn: 260986
2016-02-16 18:14:56 +00:00
Matt Arsenault ce56a0ef54 AMDGPU: Add intrinsics for sin/cos
These provide direct access to the hardware instruction, without
the unit conversion that the llvm.sin/llvm.cos lowering requires.

llvm-svn: 260782
2016-02-13 01:19:56 +00:00
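A minimal sketch of the new intrinsics; unlike llvm.sin/llvm.cos, the input is assumed to already be scaled into the units the hardware expects (the conversion mentioned above is the caller's job):

  declare float @llvm.amdgcn.sin.f32(float)
  declare float @llvm.amdgcn.cos.f32(float)

  define float @sin_hw(float %x) {
    ; %x must already be in the hardware's input units
    %s = call float @llvm.amdgcn.sin.f32(float %x)
    ret float %s
  }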
Matt Arsenault 79963e80b8 AMDGPU: Rename intrinsic to better match instruction name
Also fixes missing f32 test.

llvm-svn: 260780
2016-02-13 01:03:00 +00:00
Tom Stellard bc4497b13c AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions
Reviewers: arsenm

Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16603

llvm-svn: 260765
2016-02-12 23:45:29 +00:00
Tom Stellard e993451f5c [AMDGPU] Fix for "v_div_scale_f64 reg, vcc, ..." parsing
Summary:
Added support for "VOP3Only" attribute in VOP3bInst encoding.
Set VOP3Only=1 for V_DIV_SCALE_F64/32 insns.
Added support for multi-dest instructions in AMDGPUAs::cvt*().
Added lit test for "V_DIV_SCALE_F64|F32 vreg,vcc|sreg,vreg,vreg,vreg".

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, SamWot, nhaustov, vpykhtin

Differential Revision: http://reviews.llvm.org/D16995

Patch By: Artem Tamazov

llvm-svn: 260560
2016-02-11 18:25:26 +00:00
Tom Stellard a90b9526df [AMDGPU] Assembler: Fix VOP3 only instructions
Separate methods to convert parsed instructions to MCInst:

  - VOP3 only instructions (always create modifiers as operands in MCInst)
  - VOP2 instructions with modifiers (create modifiers as operands
    in MCInst when e64 encoding is forced or modifiers are parsed)
  - VOP2 instructions without modifiers (do not create modifiers
    as operands in MCInst)
  - Add VOP3Only flag. Pass HasMods flag to VOP3Common.
  - Simplify code that deals with modifiers (-1 is now the same as
    0, so the distinction is no longer needed).
  - Add a few tests (more will be added separately).
    Update the expected error message, which is now correct.

Patch By: Nikolay Haustov

Differential Revision: http://reviews.llvm.org/D16778

llvm-svn: 260483
2016-02-11 03:28:15 +00:00
Matt Arsenault f639c32739 AMDGPU: Match some med3 patterns
llvm-svn: 259089
2016-01-28 20:53:42 +00:00
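One shape involved is the usual select-based clamp, sketched here in generic IR; the exact set of patterns matched is in the patch and is not reproduced here:

  define i32 @clamp_i8_range(i32 %x) {
    ; clamp %x to [-128, 127]; a min/max-of-constants shape like this can map to v_med3_i32
    %lt = icmp slt i32 %x, -128
    %lo = select i1 %lt, i32 -128, i32 %x
    %gt = icmp sgt i32 %lo, 127
    %r  = select i1 %gt, i32 127, i32 %lo
    ret i32 %r
  }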
Matt Arsenault 382d945d16 AMDGPU: Tidy minor td file issues
Make comments and indentation more consistent.

Rearrange a few things to be in a more consistent order,
such as separating subtarget features that describe an actual
device property from those that are used as options.

llvm-svn: 258789
2016-01-26 04:49:22 +00:00
Matt Arsenault c5f6152911 AMDGPU: Make v32i8/v64i8 illegal types
Old intrinsics were forcing these types to be legal, but they have now
all been removed. This fixes large i8 vector operations, which were
generally broken.

llvm-svn: 258788
2016-01-26 04:43:48 +00:00