Commit Graph

561 Commits

Author SHA1 Message Date
Nicolai Haehnle 3003ba00a3 AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format
Summary:
We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend
cannot easily make use of an explicit soffset parameter either. Furthermore,
it is likely that in the future, LLVM will be in a better position than the
frontend to choose an SGPR offset if possible.

Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18218

llvm-svn: 263790
2016-03-18 16:24:20 +00:00
Sam Kolton a74cd526e9 [AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3
Review: http://reviews.llvm.org/D18267
llvm-svn: 263789
2016-03-18 15:35:51 +00:00
Changpeng Fang 234fcb81d3 AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute
Symmary:
  ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed.

Reviewers
  arsenm, tstellarAMD

Subscribers
  llvm-commits, arsenm

Differential Revision:
  http://reviews.llvm.org/D18197

llvm-svn: 263720
2016-03-17 16:43:50 +00:00
Nicolai Haehnle 79cad857a0 AMDGPU: mark atomic instructions as sources of divergence
Summary:
As explained by the comment, threads will typically see different values
returned by atomic instructions even if the arguments are equal.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18156

llvm-svn: 263719
2016-03-17 16:21:59 +00:00
Nicolai Haehnle ef160de3e5 AMDGPU: Prevent uniform loops from becoming infinite
Summary:
Uniform loops where the branch leaving the loop is predicated on VCCNZ
must be skipped if EXEC = 0, otherwise they will be infinite.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18137

llvm-svn: 263658
2016-03-16 20:14:33 +00:00
Michel Danzer 302f83ac4e AMDGPU: Verify instructions in non-debug builds as well
And emit an error if it fails.

This prevents illegal instructions from getting sent to the GPU, which
would potentially result in a hang.

This is a candidate for the stable branch(es).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 263627
2016-03-16 09:10:42 +00:00
Michel Danzer beb79ceb19 AMDGPU/SI: Clean up indentation in SIInstrInfo::getDefaultRsrcDataFormat
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 263626
2016-03-16 09:10:35 +00:00
Changpeng Fang 01f6062227 AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS
Summary:
  Static LDS size is saved in MachineFunctionInfo::LDSSize,
We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter,
we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize.

Reviewers:
    arsenm
    tstellarAMD

Subscribers
    llvm-commits, arsenm

Differential Revision:
   http://reviews.llvm.org/D18064

llvm-svn: 263563
2016-03-15 17:28:44 +00:00
Sanjay Patel 5719584129 [DAG] use isUndef() ; NFCI
llvm-svn: 263448
2016-03-14 17:28:46 +00:00
Tom Stellard 331f981cc9 AMDGPU/SI: Handle wait states required for DPP instructions
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17543

llvm-svn: 263447
2016-03-14 17:05:56 +00:00
Marek Olsak ed2213e6ef AMDGPU/SI: Incomplete shader binaries need to finish execution at the end
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D18058

llvm-svn: 263441
2016-03-14 15:57:14 +00:00
Nicolai Haehnle 74127fe8d7 AMDGPU: mark llvm.amdgcn.image.atomic.* as a source of divergence
Summary:
When multiple threads perform an atomic op with the same arguments, they
will usually see different return values.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18101

llvm-svn: 263440
2016-03-14 15:37:18 +00:00
Nikolay Haustov 79af6b33e0 [AMDGPU] Assembler: SOP* instruction fixes
s_bitset0_b64, s_bitset1_b64 has 32-bit src0, not 64-bit.
s_rfe_b64 has just one destination operand and no source.
Uncomment S_BITCMP* and S_SETVSKIP, adjust SOPC_* classes for that.
Add s_memrealtime test and change comments in smem.s to follow common style.
Change test for s_memtime to use non-zero register to make it really test encoding.
Add tests for s_buffer_load*.
Add tests for SOPC instructions (same for SI and VI)

Differential Revision: http://reviews.llvm.org/D18040

llvm-svn: 263420
2016-03-14 11:17:19 +00:00
Valery Pykhtin 0f97f17152 [AMDGPU] AsmParser: Factor out parseRegister. NFC.
llvm-svn: 263411
2016-03-14 07:43:42 +00:00
Valery Pykhtin 9e33c7f5d3 [AMDGPU] AsmParser: refactor post push_back vector access. NFC.
llvm-svn: 263409
2016-03-14 05:25:44 +00:00
Valery Pykhtin f91911c3ae [AMDGPU] AsmParser: remove redundant isReg checks. NFC.
llvm-svn: 263407
2016-03-14 05:01:45 +00:00
Valery Pykhtin a7f480b4e9 [AMDGPU] Fix VOPC instruction operand namings
Differential Revision: http://reviews.llvm.org/D17966

llvm-svn: 263242
2016-03-11 14:53:28 +00:00
Nikolay Haustov 6560781c4f [AMDGPU] Assembler: change v_madmk operands to have same order as mad.
The constant is now at source operand 1 (previously at 2).
This is also how it is in legacy AMD sp3 assembler.
Update tests.

Differential Revision: http://reviews.llvm.org/D17984

llvm-svn: 263212
2016-03-11 09:27:25 +00:00
Matt Arsenault bafc9dc591 AMDGPU: Don't use InstVisitor for AMDGPUPromoteAlloca
Frontend authors are strongly encouraged to keep allocas
in the entry block, so don't bother visiting every instruction
in the other blocks of the function.

llvm-svn: 263206
2016-03-11 08:20:50 +00:00
Matt Arsenault 6b6a2c37bc AMDGPU: R600 code splitting cleanup
Move a few functions only used by R600 to R600 specific code,
fix header macros to stop using R600, mark classes as final.

llvm-svn: 263204
2016-03-11 08:00:27 +00:00
Matt Arsenault 9a19c240c0 AMDGPU: Materialize sign bits with bfrev
If a constant is the same as the reverse of an inline immediate,
this is 4 bytes smaller than having to embed a 32-bit literal.

llvm-svn: 263201
2016-03-11 07:42:49 +00:00
Nicolai Haehnle b142770bfe AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.

The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.

For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.

BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17277

llvm-svn: 263140
2016-03-10 18:43:50 +00:00
Changpeng Fang 278a5b31a5 AMDGPU/SI: Define S_GETREG Intrinsic
Summary:
 Define s_getreg intrinsic to generate s_getreg instruction to read
hardware registers.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17892

llvm-svn: 263124
2016-03-10 16:47:15 +00:00
Valery Pykhtin a4db224d54 [AMDGPU] Fix SMEM instructions encoding/operand namings
Differential Revision: http://reviews.llvm.org/D17651

llvm-svn: 263108
2016-03-10 13:06:08 +00:00
Chad Rosier c27a18f39f [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC.
http://reviews.llvm.org/D17967

llvm-svn: 263021
2016-03-09 16:00:35 +00:00
Teresa Johnson e50b23c67f Fix build error due to unsigned compare >= 0 in r263008 (NFC)
Fixes error from building with clang:

/usr/local/google/home/tejohnson/llvm/llvm_15/lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp:407:12:
error: comparison of unsigned expression >= 0 is always true
[-Werror,-Wtautological-compare]
  if ((Imm >= 0x000) && (Imm <= 0x0ff)) {
         ~~~ ^  ~~~~~

llvm-svn: 263014
2016-03-09 14:58:23 +00:00
Sam Kolton dfa29f7c5b [AMDGPU] Assembler: Support DPP instructions.
Supprot DPP syntax as used in SP3 (except several operands syntax).
Added dpp-specific operands in td-files.
Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter.
Support for VOP2 DPP instructions in td-files.
Some tests for DPP instructions.

ToDo:
  - VOP2bInst:
    - vcc is considered as operand
    - AsmMatcher doesn't apply mnemonic aliases when parsing operands
  - v_mac_f32
  - v_nop
  - disable instructions with 64-bit operands
  - change dpp_ctrl assembler representation to conform sp3

Review: http://reviews.llvm.org/D17804
llvm-svn: 263008
2016-03-09 12:29:31 +00:00
Nikolay Haustov 9b7577ed22 [AMDGPU] Assembler: Support abs() syntax.
Support legacy SP3 abs(v1) syntax. InstPrinter still uses |v1|.
Add tests.

Differential Revision: http://reviews.llvm.org/D17887

llvm-svn: 263006
2016-03-09 11:03:21 +00:00
Nikolay Haustov 8e3f099497 [AMDGPU] Assembler: Fix s_setpc_b64
s_setpc_b64 has just one 64-bit source which is the address of instruction to jump to.

Differential Revision: http://reviews.llvm.org/D17888

llvm-svn: 263005
2016-03-09 10:56:19 +00:00
Matt Arsenault c89f2919a4 AMDGPU: Match more med3 integer patterns
llvm-svn: 262864
2016-03-07 21:54:48 +00:00
Matt Arsenault 56356c8a9c AMDGPU: Remove a fixme for ptrrtoint handling
llvm-svn: 262854
2016-03-07 21:12:46 +00:00
Matt Arsenault 81d06015c6 AMDGPU: Move function only used by R600
llvm-svn: 262853
2016-03-07 21:10:13 +00:00
Valery Pykhtin dc11054f20 [AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler.
Engages code from r262804.

Differential Revision: http://reviews.llvm.org/D17151

llvm-svn: 262808
2016-03-06 20:25:36 +00:00
Valery Pykhtin 50cd3c4ec7 fix sanitizer-ppc64be-linux failure for r262804
error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]

http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/930

llvm-svn: 262805
2016-03-06 15:13:54 +00:00
Valery Pykhtin 499a5c6323 [AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields
Differential Revision: http://reviews.llvm.org/D17150

llvm-svn: 262804
2016-03-06 13:27:13 +00:00
Valery Pykhtin 0c6293da68 [AMDGPU] SOPxx instructions operand naming fixed in td files.
dst -> sdst
ssrcN -> srcN

Differential Revision: http://reviews.llvm.org/D17646

llvm-svn: 262801
2016-03-06 10:31:44 +00:00
Tom Stellard 649b5db557 AMDGPU/SI: Add support for spiling SGPRs to scratch buffer
Summary:
This is necessary for when we run out of VGPRs and can no
longer use v_{read,write}_lane for spilling SGPRs.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17592

llvm-svn: 262732
2016-03-04 18:31:18 +00:00
Tom Stellard ebef6f9771 AMDGPU/SI: Enable frame index scavenging during PrologEpilogueInserter
Summary:
This allows us to use virtual registers when we need extra registers
for inserting spill instructions in SIRegisterInfo:eliminateFrameIndex().

Once all the frame indices have been eliminated, the
PrologEpilogueInserter does an extra pass over the program to replace
all virtual registers with physical ones.

This allows us to make more efficient use of our emergency spill slots,
so we only need to create one.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17591

llvm-svn: 262728
2016-03-04 18:02:01 +00:00
Sam Kolton f51f4b8370 Test commit access
llvm-svn: 262714
2016-03-04 12:29:14 +00:00
Valery Pykhtin 824e804bf6 test commit
llvm-svn: 262709
2016-03-04 10:59:50 +00:00
Nikolay Haustov 5bf46ac150 AMDGPU/SI: add llvm.amdgcn.image.atomic.* intrinsics
These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the
GL_ARB_shader_image_load_store extension.

Initial change by Nicolai H.hnle

Differential Revision: http://reviews.llvm.org/D17401

llvm-svn: 262701
2016-03-04 10:39:50 +00:00
Tom Stellard cc7067a668 AMDGPU: Insert two S_NOP instructions for every high level source statement.
Patch by: Konstantin Zhuravlyov

Summary: Tools, such as debugger, need to pause execution based on user input (i.e. breakpoint). In order to do this, two S_NOP instructions are inserted for each high level source statement: one before first isa instruction of high level source statement, and one after last isa instruction of high level source statement. Further, debugger may replace S_NOP instructions with S_TRAP instructions based on user input.

Reviewers: tstellarAMD, arsenm

Subscribers: echristo, dblaikie, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17454

llvm-svn: 262579
2016-03-03 03:53:29 +00:00
Tom Stellard 600ca6fd39 AMDGPU/SI: Don't try to move scratch wave offset when there are no free SGPRs
Summary:
When there were no free SGPRs, we were trying to move this value into
some of the reserved registers which was causing a segmentation fault.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17590

llvm-svn: 262577
2016-03-03 03:45:09 +00:00
Matt Arsenault 8226fc4829 AMDGPU: Simplify boolean conditional return statements
Patch by Richard Thomson

llvm-svn: 262536
2016-03-02 23:00:21 +00:00
Nikolay Haustov f2fbabe9c1 Revert "[AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields"
Build failure with clang.

llvm-svn: 262477
2016-03-02 11:16:56 +00:00
Nikolay Haustov f0f24628cb Revert "[AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler."
Build failure with clang.

llvm-svn: 262475
2016-03-02 10:54:21 +00:00
Nikolay Haustov 73447a9714 [AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler.
complementary patch to table-driven amd_kernel_code_t field parser/printer utility. lit tests passed.

Patch by: Valery Pykhtin

Differential Revision: http://reviews.llvm.org/D17151

llvm-svn: 262474
2016-03-02 10:36:30 +00:00
Nikolay Haustov 6c8c74969a [AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields
This is going to be used in .hsatext disassembler and can be used
in current assembler parser (lit tests passed on parsing).
Code using this helpers isn't included in this patch.

Benefits:

unified approach
fast field name lookup on parsing
Later I would like to enhance some of the field naming/syntax using this code.

Patch by: Valery Pykhtin

Differential Revision: http://reviews.llvm.org/D17150

llvm-svn: 262473
2016-03-02 10:36:25 +00:00
Matt Arsenault f2dcb4737b AMDGPU: Fix bug 26659.
Fix checking the same instruction twice instead of the
second branch that uses vccz. I don't think this matters
currently because s_branch_vccnz is always used currently.

llvm-svn: 262457
2016-03-02 04:12:39 +00:00
Matt Arsenault a266bd8760 AMDGPU: Cleanup suggested in bug 23960
llvm-svn: 262456
2016-03-02 04:05:14 +00:00