Commit Graph

3582 Commits

Author SHA1 Message Date
Matt Arsenault 224d8cd987 AMDGPU: Remove mubuf specific PatFrags
These are identical to the *_global PatFrag, and will only create more
work to get the GlobalISel importer to handle them.

llvm-svn: 365350
2019-07-08 16:53:53 +00:00
Matt Arsenault 430b0497e7 AMDGPU: Move waitcnt intrinsic to instruction definition pattern
llvm-svn: 365349
2019-07-08 16:53:48 +00:00
Dmitry Preobrazhensky 2eff0318c6 [AMDGPU][MC] Corrected parsing of FLAT offset modifier
Summary of changes:

- simplified handling of FLAT offset: offset_s13 and offset_u12 have been replaced with flat_offset;
- provided information about error position for pre-gfx9 targets;
- improved errors handling.

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D64244

llvm-svn: 365321
2019-07-08 14:27:37 +00:00
Jay Foad 38902350ef [AMDGPU] Use a named predicate instead of a magic number.
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64201

llvm-svn: 365294
2019-07-08 07:04:58 +00:00
Matt Arsenault 5e9610a3f5 AMDGPU: Fix assert in clang test
llvm-svn: 365245
2019-07-05 21:09:53 +00:00
Matt Arsenault e7e23e3e91 AMDGPU: Make AMDGPUPerfHintAnalysis an SCC pass
Add a string attribute instead of directly setting
MachineFunctionInfo. This avoids trying to get the analysis in the
MachineFunctionInfo in a way that doesn't work with the new pass
manager.

This will also avoid re-visiting the call graph for every single
function.

llvm-svn: 365241
2019-07-05 20:26:13 +00:00
Michael Liao 8d6ea2d48c [CodeGen] Enhance `MachineInstrSpan` to allow the end of MBB to be used.
Summary:
- Explicitly specify the parent MBB to allow the end iterator to be
  used.

Reviewers: aprantl, MatzeB, craig.topper, qcolombet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64261

llvm-svn: 365240
2019-07-05 20:23:59 +00:00
Christudasan Devadasan 652ad423bb [NFC] A test commit to check the access permission. Removed a blank line.
llvm-svn: 365223
2019-07-05 17:07:42 +00:00
Yaxun Liu a62413526d [AMDGPU] Added a new metadata for multi grid sync implicit argument
Patch by Christudasan Devadasan.

Differential Revision: https://reviews.llvm.org/D63886

llvm-svn: 365217
2019-07-05 16:05:17 +00:00
Jay Foad 7e0c10b55f [AMDGPU] DPP combiner: recognize identities for more opcodes
Summary:
This allows the DPP combiner to kick in more often. For example the
exclusive scan generated by the atomic optimizer for a divergent atomic
add used to look like this:

        v_mov_b32_e32 v3, v1
        v_mov_b32_e32 v5, v1
        v_mov_b32_e32 v6, v1
        v_mov_b32_dpp v3, v2  wave_shr:1 row_mask:0xf bank_mask:0xf
        s_nop 1
        v_add_u32_dpp v4, v3, v3  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
        v_mov_b32_dpp v5, v3  row_shr:2 row_mask:0xf bank_mask:0xf
        v_mov_b32_dpp v6, v3  row_shr:3 row_mask:0xf bank_mask:0xf
        v_add3_u32 v3, v4, v5, v6
        v_mov_b32_e32 v4, v1
        s_nop 1
        v_mov_b32_dpp v4, v3  row_shr:4 row_mask:0xf bank_mask:0xe
        v_add_u32_e32 v3, v3, v4
        v_mov_b32_e32 v4, v1
        s_nop 1
        v_mov_b32_dpp v4, v3  row_shr:8 row_mask:0xf bank_mask:0xc
        v_add_u32_e32 v3, v3, v4
        v_mov_b32_e32 v4, v1
        s_nop 1
        v_mov_b32_dpp v4, v3  row_bcast:15 row_mask:0xa bank_mask:0xf
        v_add_u32_e32 v3, v3, v4
        s_nop 1
        v_mov_b32_dpp v1, v3  row_bcast:31 row_mask:0xc bank_mask:0xf
        v_add_u32_e32 v1, v3, v1
        v_add_u32_e32 v1, v2, v1
        v_readlane_b32 s0, v1, 63

But now most of the dpp movs are combined into adds:

        v_mov_b32_e32 v3, v1
        v_mov_b32_e32 v5, v1
        s_nop 0
        v_mov_b32_dpp v3, v2  wave_shr:1 row_mask:0xf bank_mask:0xf
        s_nop 1
        v_add_u32_dpp v4, v3, v3  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
        v_mov_b32_dpp v5, v3  row_shr:2 row_mask:0xf bank_mask:0xf
        v_mov_b32_dpp v1, v3  row_shr:3 row_mask:0xf bank_mask:0xf
        v_add3_u32 v1, v4, v5, v1
        s_nop 1
        v_add_u32_dpp v1, v1, v1  row_shr:4 row_mask:0xf bank_mask:0xe
        s_nop 1
        v_add_u32_dpp v1, v1, v1  row_shr:8 row_mask:0xf bank_mask:0xc
        s_nop 1
        v_add_u32_dpp v1, v1, v1  row_bcast:15 row_mask:0xa bank_mask:0xf
        s_nop 1
        v_add_u32_dpp v1, v1, v1  row_bcast:31 row_mask:0xc bank_mask:0xf
        v_add_u32_e32 v1, v2, v1
        v_readlane_b32 s0, v1, 63

Reviewers: arsenm, vpykhtin

Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64207

llvm-svn: 365211
2019-07-05 14:52:48 +00:00
Tim Renouf 5816889c74 [AMDGPU] Custom lower INSERT_SUBVECTOR v3, v4, v5, v8
Summary:
Since the changes to introduce vec3 and vec5, INSERT_VECTOR for these
sizes has been marked "expand", which made LegalizeDAG lower it to loads
and stores via a stack slot. The code got optimized a bit later, but the
now-unused stack slot was never deleted.

This commit avoids that problem by custom lowering INSERT_SUBVECTOR into
an EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT for each element in the
subvector to insert.

V2: Addressed review comments re test.

Differential Revision: https://reviews.llvm.org/D63160

Change-Id: I9e3c13e36f68cfa3431bb9814851cc1f673274e1
llvm-svn: 365148
2019-07-04 17:38:24 +00:00
Jay Foad 0cd50b2a95 Fix typos in comments and debug output.
llvm-svn: 365146
2019-07-04 15:04:29 +00:00
Michael Liao 7a9ad430fe [AMDGPU] Correct the setting of `FlatScratchInit`.
Summary: - That flag setting should skip spilling stack slot.

Reviewers: arsenm, rampitec

Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64143

llvm-svn: 365137
2019-07-04 13:29:45 +00:00
Matt Arsenault 5b0922fe1f AMDGPU: Add pass to lower SGPR spills
This is split out from my patches to split register allocation into a
separate SGPR and VGPR phase, and has some parts that aren't yet used
(like maintaining LiveIntervals).

This simplifies making the frame pointer register callee saved. As it
is now, the code to determine callee saves needs to predict all the
possible SGPR spills and how many callee saved VGPRs are needed. By
handling this before PrologEpilogInserter, it's possible to just check
the spill objects that already exist.

Change-Id: I29e6df4034afcf949e06f8ef44206acb94696f04
llvm-svn: 365095
2019-07-03 23:32:29 +00:00
Matt Arsenault c96c174557 Revert "[AMDGPU] Kernel arg metadata: added support for "__hip_texture" type."
This reverts commit r365073.

This is crashing, and is improperly relying on IR type names.

llvm-svn: 365087
2019-07-03 21:34:34 +00:00
Konstantin Pyzhov 6f419a3370 [AMDGPU] Kernel arg metadata: added support for "__hip_texture" type.
Summary:
Hip texture type is equivalent to OpenCL image. So, we need to set the Image type for kernel arguments with __hip_texture type.

Differential revision: https://reviews.llvm.org/D63850

llvm-svn: 365073
2019-07-03 19:11:35 +00:00
Michael Liao 80177ca5a9 [AMDGPU] Enable serializing of argument info.
Summary:
- Support serialization of all arguments in machine function info. This
  enables fabricating MIR tests depending on argument info.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64096

llvm-svn: 364995
2019-07-03 02:00:21 +00:00
Matt Arsenault c04aab9c06 AMDGPU: Look through bundles for existing waitcnts
These aren't produced now, but will be in a future patch.

llvm-svn: 364983
2019-07-03 00:30:44 +00:00
Matt Arsenault 5fe851b6cd AMDGPU: Custom lower vector_shuffle for v4i16/v4f16
Ordinarily it is lowered as a build_vector of each extract_vector_elt,
which in turn get lowered to bitcasts and bit shifts. Very little
understand the lowered extract pattern, resulting in much worse
code. We treat concat_vectors of v2i16 as legal, so prefer that.

llvm-svn: 364959
2019-07-02 19:15:45 +00:00
Alexander Timofeev 66ac6b409d [AMDGPU] LCSSA pass added in preISel. Fixing typo in previous commit
llvm-svn: 364952
2019-07-02 18:16:42 +00:00
Alexander Timofeev 2ce560f029 [AMDGPU] LCSSA pass added in preISel. Uniform values defined in the divergent loop and used outside
Differential Revision: https://reviews.llvm.org/D63953

Reviewers: rampitec, nhaehnle, arsenm
llvm-svn: 364950
2019-07-02 17:59:44 +00:00
Matt Arsenault 50be3481d4 AMDGPU/GlobalISel: Try generated matcher with intrinsics
llvm-svn: 364933
2019-07-02 14:52:16 +00:00
Matt Arsenault a8bff4b963 AMDGPU/GlobalISel: Select mul
llvm-svn: 364932
2019-07-02 14:52:14 +00:00
Matt Arsenault 70a4d3f67c AMDGPU/GlobalISel: Fix G_GEP with mixed SGPR/VGPR operands
The register bank for the destination of the sample argument copy was
wrong. We shouldn't be constraining each source to the result register
bank. Allow constraining the original register to the right size.

llvm-svn: 364928
2019-07-02 14:40:22 +00:00
Matt Arsenault ed63399244 AMDGPU/GlobalISel: Select G_FENCE
Manually select to workaround tablegen emitter emitting checks for
G_CONSTANT.

llvm-svn: 364927
2019-07-02 14:17:38 +00:00
Matt Arsenault 40c08052a5 AMDGPU: Correct properties for adjcallstack* pseudos
These should be SALU writes, and these are lowered to instructions
that def SCC.

llvm-svn: 364859
2019-07-01 22:01:05 +00:00
Matt Arsenault bae3636f96 AMDGPU/GlobalISel: Handle more input argument intrinsics
llvm-svn: 364836
2019-07-01 18:50:50 +00:00
Matt Arsenault 9e8e8c60fa AMDGPU/GlobalISel: Lower kernarg segment ptr intrinsics
llvm-svn: 364835
2019-07-01 18:49:01 +00:00
Matt Arsenault 756d81905f AMDGPU/GlobalISel: Legalize workgroup ID intrinsics
llvm-svn: 364834
2019-07-01 18:47:22 +00:00
Matt Arsenault e2c86cce3a AMDGPU/GlobalISel: Legalize workitem ID intrinsics
Tests don't cover the masked input path since non-kernel arguments
aren't lowered yet.

Test is copied directly from the existing test, with 2 additions.

llvm-svn: 364833
2019-07-01 18:45:36 +00:00
Matt Arsenault e15770aec4 AMDGPU/GlobalISel: Custom lower control flow intrinsics
Replace the brcond for the 2 cases that act as branches. For now
follow how the current system works, although I think we can
eventually get rid of the pseudos.

llvm-svn: 364832
2019-07-01 18:40:23 +00:00
Matt Arsenault 4073b33786 AMDGPU/GlobalISel: Handle 16-bit SALU min/max
This needs to be extended to s32, and expanded into cmp+select.  This
is relying on the fact that widenScalar happens to leave the
instruction in place, but this isn't a guaranteed property of
LegalizerHelper.

llvm-svn: 364831
2019-07-01 18:33:37 +00:00
Matt Arsenault 5a7d5111e5 AMDGPU/GlobalISel: Lower SALU min/max to cmp+select
Use a change observer to apply a register bank to the newly created
intermediate result register.

llvm-svn: 364830
2019-07-01 18:30:45 +00:00
Matt Arsenault ef59cb6982 AMDGPU/GlobalISel: Legalize s16 add/sub/mul
If this is scalar, promote to s32. Use a new observer class to assign
the register bank of newly created registers.

llvm-svn: 364827
2019-07-01 18:18:55 +00:00
Matt Arsenault 9470bb262b AMDGPU/GlobalISel: Fix allowing non-boolean conditions for G_SELECT
The condition register bank must be scc or vcc so that a copy will be
inserted, which will be lowered to a compare.

Currently greedy unnecessarily forces using a VCC select.

llvm-svn: 364825
2019-07-01 18:13:12 +00:00
Matt Arsenault b2ea20eedd AMDGPU/GlobalISel: RegBankSelect for sendmsg/sendmsghalt
llvm-svn: 364819
2019-07-01 17:40:18 +00:00
Matt Arsenault 40d1faf38f AMDGPU/GlobalISel: Legalize s16 fcmp
llvm-svn: 364817
2019-07-01 17:35:53 +00:00
Nicolai Haehnle 10c911db63 AMDGPU/GFX10: implement ds_ordered_count changes
Summary:
ds_ordered_count can now simultaneously operate on up to 4 dwords
in a single instruction, which are taken from (and returned to)
lanes 0..3 of a single VGPR.

Change-Id: I19b6e7b0732b617c10a779a7f9c0303eec7dd276

Reviewers: mareko, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63716

llvm-svn: 364815
2019-07-01 17:17:52 +00:00
Nicolai Haehnle 4dc3b2bf95 AMDGPU: Support GDS atomics
Summary:
Original patch by Marek Olšák

Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab

Reviewers: mareko, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63452

llvm-svn: 364814
2019-07-01 17:17:45 +00:00
Matt Arsenault 1094e6a814 AMDGPU/GlobalISel: RegBankSelect for DS ordered add/swap
llvm-svn: 364811
2019-07-01 17:04:57 +00:00
Matt Arsenault 265059eaf6 AMDGPU/GlobalISel: RegBankSelect for amdgcn.writelane
llvm-svn: 364808
2019-07-01 16:41:36 +00:00
Matt Arsenault a310727830 AMDGPU/GlobalISel: Fail instead of assert when selecting loads
llvm-svn: 364807
2019-07-01 16:36:39 +00:00
Matt Arsenault 0a52e9d026 AMDGPU/GlobalISel: Complete implementation of G_GEP
Also works around tablegen defect in selecting add with unused carry,
but if we have to manually select GEP, might as well handle add
manually.

llvm-svn: 364806
2019-07-01 16:34:48 +00:00
Matt Arsenault e1006259d8 AMDGPU/GlobalISel: Select G_PHI
llvm-svn: 364805
2019-07-01 16:32:47 +00:00
Matt Arsenault d810ff2588 AMDGPU/GlobalISel: Try to select VOP3 form of add
There are several things broken, but at least emit the right thing for
gfx9.

The import of the pattern with the unused carry out seems to not
work. Needs a special class for clamp, because OperandWithDefaultOps
doesn't really work.

llvm-svn: 364804
2019-07-01 16:27:32 +00:00
Matt Arsenault 62d64b0c30 AMDGPU/GlobalISel: RegBankSelect for readlane/readfirstlane
llvm-svn: 364801
2019-07-01 16:19:39 +00:00
Tom Stellard 9e9dd30de3 AMDGPU/GlobalISel: Implement select for 32-bit G_ADD
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58804

llvm-svn: 364797
2019-07-01 16:09:33 +00:00
Matt Arsenault 2ab25f9ceb AMDGPU/GlobalISel: Select G_BRCOND for vcc
llvm-svn: 364795
2019-07-01 16:06:02 +00:00
Matt Arsenault cda82f0bb6 AMDGPU/GlobalISel: Select G_FRAME_INDEX
llvm-svn: 364789
2019-07-01 15:48:18 +00:00
Nicolai Haehnle 7cfd99ab15 AMDGPU/GFX10: fix scratch resource descriptor
Summary:
The stride should depend on the wave size, not the hardware generation.

Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be
relevant.

Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6

Reviewers: arsenm, rampitec, mareko

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63808

llvm-svn: 364788
2019-07-01 15:43:00 +00:00