Commit Graph

82 Commits

Author SHA1 Message Date
Nicolai Haehnle 3003ba00a3 AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format
Summary:
We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend
cannot easily make use of an explicit soffset parameter either. Furthermore,
it is likely that in the future, LLVM will be in a better position than the
frontend to choose an SGPR offset if possible.

Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18218

llvm-svn: 263790
2016-03-18 16:24:20 +00:00
Changpeng Fang 01f6062227 AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS
Summary:
  Static LDS size is saved in MachineFunctionInfo::LDSSize,
We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter,
we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize.

Reviewers:
    arsenm
    tstellarAMD

Subscribers
    llvm-commits, arsenm

Differential Revision:
   http://reviews.llvm.org/D18064

llvm-svn: 263563
2016-03-15 17:28:44 +00:00
Nikolay Haustov 79af6b33e0 [AMDGPU] Assembler: SOP* instruction fixes
s_bitset0_b64, s_bitset1_b64 has 32-bit src0, not 64-bit.
s_rfe_b64 has just one destination operand and no source.
Uncomment S_BITCMP* and S_SETVSKIP, adjust SOPC_* classes for that.
Add s_memrealtime test and change comments in smem.s to follow common style.
Change test for s_memtime to use non-zero register to make it really test encoding.
Add tests for s_buffer_load*.
Add tests for SOPC instructions (same for SI and VI)

Differential Revision: http://reviews.llvm.org/D18040

llvm-svn: 263420
2016-03-14 11:17:19 +00:00
Nikolay Haustov 6560781c4f [AMDGPU] Assembler: change v_madmk operands to have same order as mad.
The constant is now at source operand 1 (previously at 2).
This is also how it is in legacy AMD sp3 assembler.
Update tests.

Differential Revision: http://reviews.llvm.org/D17984

llvm-svn: 263212
2016-03-11 09:27:25 +00:00
Nicolai Haehnle b142770bfe AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.

The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.

For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.

BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17277

llvm-svn: 263140
2016-03-10 18:43:50 +00:00
Changpeng Fang 278a5b31a5 AMDGPU/SI: Define S_GETREG Intrinsic
Summary:
 Define s_getreg intrinsic to generate s_getreg instruction to read
hardware registers.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17892

llvm-svn: 263124
2016-03-10 16:47:15 +00:00
Valery Pykhtin a4db224d54 [AMDGPU] Fix SMEM instructions encoding/operand namings
Differential Revision: http://reviews.llvm.org/D17651

llvm-svn: 263108
2016-03-10 13:06:08 +00:00
Nikolay Haustov 8e3f099497 [AMDGPU] Assembler: Fix s_setpc_b64
s_setpc_b64 has just one 64-bit source which is the address of instruction to jump to.

Differential Revision: http://reviews.llvm.org/D17888

llvm-svn: 263005
2016-03-09 10:56:19 +00:00
Matt Arsenault c89f2919a4 AMDGPU: Match more med3 integer patterns
llvm-svn: 262864
2016-03-07 21:54:48 +00:00
Valery Pykhtin 0c6293da68 [AMDGPU] SOPxx instructions operand naming fixed in td files.
dst -> sdst
ssrcN -> srcN

Differential Revision: http://reviews.llvm.org/D17646

llvm-svn: 262801
2016-03-06 10:31:44 +00:00
Tom Stellard 649b5db557 AMDGPU/SI: Add support for spiling SGPRs to scratch buffer
Summary:
This is necessary for when we run out of VGPRs and can no
longer use v_{read,write}_lane for spilling SGPRs.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17592

llvm-svn: 262732
2016-03-04 18:31:18 +00:00
Nikolay Haustov 5bf46ac150 AMDGPU/SI: add llvm.amdgcn.image.atomic.* intrinsics
These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the
GL_ARB_shader_image_load_store extension.

Initial change by Nicolai H.hnle

Differential Revision: http://reviews.llvm.org/D17401

llvm-svn: 262701
2016-03-04 10:39:50 +00:00
Matt Arsenault 274d34e725 AMDGPU: Add s_sleep intrinsic
llvm-svn: 262120
2016-02-27 08:53:52 +00:00
Matt Arsenault 61738cbcb6 AMDGPU: Implement readcyclecounter
This matches the behavior of the HSAIL clock instruction.
s_realmemtime is used if the subtarget supports it, and falls
back to s_memtime if not.

Also introduces new intrinsics for each of s_memtime / s_memrealtime.

llvm-svn: 262119
2016-02-27 08:53:46 +00:00
Nikolay Haustov 2f684f1347 [AMDGPU] Assembler: Basic support for MIMG
Add parsing and printing of image operands. Matches legacy sp3 assembler.
Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last.
Update SITargetLowering for new order.
Add basic MC test.
Update CodeGen tests.

Review: http://reviews.llvm.org/D17574
llvm-svn: 261995
2016-02-26 09:51:05 +00:00
Nikolay Haustov 2e4c72977c [AMDGPU] Assembler: Simplify handling of optional operands
Resubmit with index problem fixed. Verified with valgrind.

Prepare to support DPP encodings.

For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands.
However this means that when parsing instruction which has no mnemonic prefix,
we cannot add both default values for VOP3 and for DPP optional operands
to OperandVector - neither instructions would match. So add default values
for optional operands to MCInst during conversion instead.

Mark more operands as IsOptional = 1 in .td files.
Do not add default values for optional operands to OperandVector in AMDGPUAsmParser.
Add default values for optional operands during conversion using new helper addOptionalImmOperand.
Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one.
Separate cvtFlat and cvtFlatAtomic.
Fix CNDMASK_B32 definition to have no modifiers.

Review: http://reviews.llvm.org/D17445
llvm-svn: 261856
2016-02-25 10:58:54 +00:00
NAKAMURA Takumi 3d3d0f4151 Revert r261742, "[AMDGPU] Assembler: Simplify handling of optional operands"
It brought undefined behavior.

llvm-svn: 261839
2016-02-25 08:35:27 +00:00
Nikolay Haustov 4f073ca7fa [AMDGPU] Assembler: Simplify handling of optional operands
Prepare to support DPP encodings.

For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands. However this means that when parsing instruction which has no mnemonic prefix, we cannot add both default values for VOP3 and for DPP optional operands to OperandVector - neither instructions would match. So add default values for optional operands to MCInst during conversion instead.

Mark more operands as IsOptional = 1 in .td files.
Do not add default values for optional operands to OperandVector in AMDGPUAsmParser.
Add default values for optional operands during conversion using new helper addOptionalImmOperand.
Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one.
Separate cvtFlat and cvtFlatAtomic.
Fix CNDMASK_B32 definition to have no modifiers.

Review: http://reviews.llvm.org/D17445

Reviewers: tstellarAMD
llvm-svn: 261742
2016-02-24 14:22:47 +00:00
Nikolay Haustov 2a62b3c244 [AMDGPU] Fix operands of S_BFE_U64 and S_BFM_B64
src1 of s_bfe_u64 is 32-bit (same as s_bfe_i64).
src0 and src1 of s_bfm_b64 are 32-bit.
Update tests.

Review: http://reviews.llvm.org/D17480

Reviewers: arsenm
llvm-svn: 261621
2016-02-23 09:19:14 +00:00
Nicolai Haehnle f2c64db55a AMDGPU/SI: add llvm.amdgcn.image.load/store[.mip] intrinsics
Summary:
These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa
for the GL_ARB_shader_image_load_store extension.

IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has
a legacy name and pretends not to read memory.

Differential Revision: http://reviews.llvm.org/D17276

llvm-svn: 261224
2016-02-18 16:44:18 +00:00
Tom Stellard e1818af8c5 [AMDGPU] Disassembler: Added basic disassembler for AMDGPU target
Changes:

- Added disassembler project
- Fixed all decoding conflicts in .td files
- Added DecoderMethod=“NONE” option to Target.td that allows to
  disable decoder generation for an instruction.
- Created decoding functions for VS_32 and VReg_32 register classes.
- Added stubs for decoding all register classes.
- Added several tests for disassembler

Disassembler only supports:

- VI subtarget
- VOP1 instruction encoding
- 32-bit register operands and inline constants

[Valery]

One of the point that requires to pay attention to is how decoder
conflicts were resolved:

- Groups of target instructions were separated by using different
  DecoderNamespace (SICI, VI, CI) using similar to AssemblerPredicate
  approach.

- There were conflicts in IMAGE_<> instructions caused by two
  different reasons:

1. dmask wasn’t specified for the output (fixed)
2. There are image instructions that differ only by the number of
   the address components but have the same encoding by the HW spec. The
   actual number of address components is determined by the HW at runtime
   using image resource descriptor starting from the VGPR encoded in an
   IMAGE instruction. This means that we should choose only one instruction
   from conflicting group to be the rule for decoder. I didn’t find the way
   to disable decoder generation for an arbitrary instruction and therefore
   made a onelinear fix to tablegen generator that would suppress decoder
   generation when DecoderMethod is set to “NONE”. This is a change that
   should be reviewed and submitted first. Otherwise I would need to
   specify different DecoderNamespace for every instruction in the
   conflicting group. I haven’t checked yet if DecoderMethod=“NONE” is not
   used in other targets.
3. IMAGE_GATHER decoder generation is for now disabled and to be
   done later.

[/Valery]

Patch By: Sam Kolton

Differential Revision: http://reviews.llvm.org/D16723

llvm-svn: 261185
2016-02-18 03:42:32 +00:00
Tom Stellard cc4c8718ed [AMDGPU] Rename $dst operand to $vdst for VOP instructions.
Summary: This change renames output operand for VOP instructions from dst to vdst. This is needed to enable decoding named operands for disassembler.

Reviewers: vpykhtin, tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, nhaustov

Projects: #llvm-amdgpu-spb

Differential Revision: http://reviews.llvm.org/D16920

llvm-svn: 260986
2016-02-16 18:14:56 +00:00
Matt Arsenault ce56a0ef54 AMDGPU: Add intrinsics for sin/cos
These provide direct access to the hardware instruction without
the unit version required like llvm.sin/llvm.cos lowering requires.

llvm-svn: 260782
2016-02-13 01:19:56 +00:00
Matt Arsenault 79963e80b8 AMDGPU: Rename intrinsic to better match instruction name
Also fixes missing f32 test.

llvm-svn: 260780
2016-02-13 01:03:00 +00:00
Tom Stellard bc4497b13c AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions
Reviewers: arsenm

Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16603

llvm-svn: 260765
2016-02-12 23:45:29 +00:00
Tom Stellard e993451f5c [AMDGPU] Fix for "v_div_scale_f64 reg, vcc, ..." parsing
Summary:
Added support for "VOP3Only" attribute in VOP3bInst encoding.
Set VOP3Only=1 for V_DIV_SCALE_F64/32 insns.
Added support for multi-dest instructions in AMDGPUAs::cvt*().
Added lit test for "V_DIV_SCALE_F64|F32 vreg,vcc|sreg,vreg,vreg,vreg".

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, SamWot, nhaustov, vpykhtin

Differential Revision: http://reviews.llvm.org/D16995

Patch By: Artem Tamazov

llvm-svn: 260560
2016-02-11 18:25:26 +00:00
Tom Stellard a90b9526df [AMDGPU] Assembler: Fix VOP3 only instructions
Separate methods to convert parsed instructions to MCInst:

  - VOP3 only instructions (always create modifiers as operands in MCInst)
  - VOP2 instrunctions with modifiers (create modifiers as operands
    in MCInst when e64 encoding is forced or modifiers are parsed)
  - VOP2 instructions without modifiers (do not create modifiers
    as operands in MCInst)
  - Add VOP3Only flag. Pass HasMods flag to VOP3Common.
  - Simplify code that deals with modifiers (-1 is now same as
    0). This is no longer needed.
  - Add few tests (more will be added separately).
    Update error message now correct.

Patch By: Nikolay Haustov

Differential Revision: http://reviews.llvm.org/D16778

llvm-svn: 260483
2016-02-11 03:28:15 +00:00
Matt Arsenault f639c32739 AMDGPU: Match some med3 patterns
llvm-svn: 259089
2016-01-28 20:53:42 +00:00
Matt Arsenault 382d945d16 AMDGPU: Tidy minor td file issues
Make comments and indentation more consistent.

Rearrange a few things to be in a more consistent order,
such as organizing subtarget features from those describing
an actual device property, and those used as options.

llvm-svn: 258789
2016-01-26 04:49:22 +00:00
Matt Arsenault c5f6152911 AMDGPU: Make v32i8/v64i8 illegal types
Old intrinsics were forcing these, but they have now all
been removed. This fixes large i8 vector operations generally
being broken.

llvm-svn: 258788
2016-01-26 04:43:48 +00:00
Matt Arsenault 018179fc46 AMDGPU: Remove old sample intrinsics
I did my best to try to update all the uses in tests that
just happened to use the old ones to the newer intrinsics.

I'm not sure I got all of the immediate operand conversions
correct, since the value seems to have been ignored by the
old pattern but I don't think it really matters.

llvm-svn: 258787
2016-01-26 04:38:08 +00:00
Matt Arsenault 051d6f9fde AMDGPU: Add new amdgcn intrinsics for cube instructions
More cleanup to try to get all intrinsics using the correct
amdgcn prefix that are as close to the instruction as possible.

llvm-svn: 258786
2016-01-26 04:29:56 +00:00
Matt Arsenault 7713162c32 AMDGPU: Remove more unused intrinsics
Replace tests with lrp with basic IR expansion

llvm-svn: 258612
2016-01-23 05:42:38 +00:00
Matt Arsenault 10ca39ca8b AMDGPU: Add new name for barrier intrinsic
llvm-svn: 258558
2016-01-22 21:30:43 +00:00
Matt Arsenault 7898b90ee1 AMDGPU: Change control flow intrinsics to use amdgcn prefix
These aren't supposed to be used outside of the backend,
so there aren't any users to worry about.

llvm-svn: 258516
2016-01-22 18:42:55 +00:00
Matt Arsenault 8d903029e8 AMDGPU: Don't use separate mulhu/mulhs Pats
llvm-svn: 258515
2016-01-22 18:42:49 +00:00
Matt Arsenault de5fbe9c60 AMDGPU: Pattern match ffbh pattern to instruction.
The hardware instruction's output on 0 is -1 rather than 32.
Eliminate a test and select to -1. This removes an extra instruction
from the compatability function with HSAIL's firstbit instruction.

llvm-svn: 257352
2016-01-11 17:02:00 +00:00
Matt Arsenault 905042774d AMDGPU: Remove redundant let mayLoad = 1
This is already set on the SMRD format class.

llvm-svn: 256813
2016-01-05 04:50:28 +00:00
Tom Stellard a6f24c6565 AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF instructions
Summary:
We were previously selecting all constant loads to SMRD instructions and legalizing
the SMRDs with non-uniform addresses during the SIFixSGPRCopesPass.

This new solution is more simple and also generates much better code, because
the instruction selector is able to take advantage of all the MUBUF addressing
modes that are legalization pass wasn't able to.

We also no longer need to generate v_add_* instructions when we
have a uniform pointer and a non-uniform offset, as this is now folded into the
MUBUF instruction during instruction selection.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15425

llvm-svn: 255672
2015-12-15 20:55:55 +00:00
Tom Stellard 8f307217c3 AMDGPU/SI: Fix bitcast between v2f32 and f64
The radeonsi fp64 support can hit these now that some redundant bitcasts
are folded.

Patch by: Michel Dänzer

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 255657
2015-12-15 17:11:17 +00:00
Tom Stellard 43f52df0b5 AMDGPU/SI: Add llvm.amdgcn.mbcnt.* intrinsics
Summary:
These are meant to be used instead of the llvm.SI.tid intrinsic which will
be deprecated at some point.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15475

llvm-svn: 255652
2015-12-15 17:02:52 +00:00
Matt Arsenault d079285e05 AMDGPU: Use generic bitreverse intrinsic
Also fix bug in vector legalization for bitreverse.

llvm-svn: 255512
2015-12-14 17:25:38 +00:00
Matt Arsenault fbd9bbfda3 Start replacing vector_extract/vector_insert with extractelt/insertelt
These are redundant pairs of nodes defined for
INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
insertelement/extractelement are slightly closer to the corresponding
C++ node name, and has stricter type checking so prefer it.

Update targets to only use these nodes where it is trivial to do so.
AArch64, ARM, and Mips all have various type errors on simple replacement,
so they will need work to fix.

Example from AArch64:

def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
          (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;

Which is trying to do sext_inreg i8, i8.

llvm-svn: 255359
2015-12-11 19:20:16 +00:00
Tom Stellard c93fc11f36 AMDGPU/SI: Emit constant arrays in the .text section
Summary:
This allows us to remove the END_OF_TEXT_LABEL hack we had been using
and simplifies the fixups used to compute the address of constant
arrays.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15257

llvm-svn: 255204
2015-12-10 02:13:01 +00:00
Tom Stellard 38b7cbe3e0 AMDGPU/SI: Remove REGISTER_STORE/REGISTER_LOAD code which is now dead
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15050

llvm-svn: 254427
2015-12-01 17:45:22 +00:00
Marek Olsak 7ed6b2f414 AMDGPU/SI: select S_ABS_I32 when possible (v2)
v2: added more tests, moved the SALU->VALU conversion to a separate function

It looks like it's not possible to get subregisters in the S_ABS lowering
code, and I don't feel like guessing without testing what the correct code
would look like.

llvm-svn: 254095
2015-11-25 21:22:45 +00:00
Matt Arsenault 61001bbc03 AMDGPU: Make v2i64/v2f64 legal types.
They can be loaded and stored, so count them as legal. This is
mostly to fix a number of common cases for load/store merging.

llvm-svn: 254086
2015-11-25 19:58:34 +00:00
Tom Stellard 41b7e63040 AMDGPU/SI: Refactor VOP[12C] tablegen definitions
Summary:
Pass the VOPProfile object all the through to *_m multiclasses.  This will
allow us to do more simplifications in the future.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13437

llvm-svn: 252339
2015-11-06 20:56:18 +00:00
Matt Arsenault 08f14de244 AMDGPU: Remove unused scratch resource operands
The SGPR spill pseudos don't actually use them.

llvm-svn: 252324
2015-11-06 18:07:53 +00:00
Marek Olsak 74d084f466 AMDGPU/SI: use S_OR for fneg (fabs f32)
llvm-svn: 251631
2015-10-29 15:29:05 +00:00