Commit Graph

18 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin 4fa44f989e [AMDGPU] Fixed dpp test. NFC. 2019-11-13 16:38:54 -08:00
Stanislav Mekhanoshin 1184c27fa5 [AMDGPU] Support mov dpp with 64 bit operands
We define mov/update dpp intrinsics as overloaded but do not
support i64, which is a practically useful type. Fix the
selection and lowering.

Differential Revision: https://reviews.llvm.org/D68673

llvm-svn: 374910
2019-10-15 16:41:15 +00:00
Alexander Timofeev c4d256a590 [AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.'
Detailed description:

    After https://reviews.llvm.org/D59990 submit several issues were discovered.
    Changes in common code were preserved but AMDGPU specific part was reverted to keep the backend working correctly.

    Discovered issues were addressed in the following commits:

    https://reviews.llvm.org/D67662
    https://reviews.llvm.org/D67101
    https://reviews.llvm.org/D63953
    https://reviews.llvm.org/D63731

    This change brings back AMDGPU specific changes.

  Reviewed by: rampitec, arsenm

  Differential Revision: https://reviews.llvm.org/D68635

llvm-svn: 374767
2019-10-14 12:01:10 +00:00
Stanislav Mekhanoshin 245b5ba344 [AMDGPU] gfx1010 dpp16 and dpp8
Differential Revision: https://reviews.llvm.org/D63203

llvm-svn: 363186
2019-06-12 18:02:41 +00:00
Alexander Timofeev 37bd9bd137 [AMDGPU] Partial revert for the ba447bae74
"Divergence driven ISel. Assign register class for cross block values
       according to the divergence."
       that discovered the design flaw leading to several issues that
       required to be solved before.

       This change reverts AMDGPU specific changes and keeps common part
       unaffected.

llvm-svn: 362749
2019-06-06 21:13:02 +00:00
Alexander Timofeev ba447bae74 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
             the correct register classes to the cross block values beforehand. For the divergent targets
             same value type requires different register classes dependent on the value divergence.

    Reviewers: rampitec, nhaehnle

    Differential Revision: https://reviews.llvm.org/D59990

    This commit was reverted because of the build failure.
    The reason was mlformed patch.
    Build failure fixed.

llvm-svn: 361741
2019-05-26 20:33:26 +00:00
Peter Collingbourne 3b93737446 Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio

llvm-svn: 361688
2019-05-25 01:52:38 +00:00
Alexander Timofeev dffedea014 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
         the correct register classes to the cross block values beforehand. For the divergent targets
         same value type requires different register classes dependent on the value divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

llvm-svn: 361644
2019-05-24 15:32:18 +00:00
Mark Searles 72da47df25 run post-RA hazard recognizer pass late
Memory legalizer, waitcnt, and shrink  passes can perturb the instructions,
which means that the post-RA hazard recognizer pass should run after them.
Otherwise, one of those passes may invalidate the work done by the hazard
recognizer. Note that this has adverse side-effect that any consecutive
S_NOP 0's, emitted by the hazard recognizer, will not be shrunk into a
single S_NOP <N>. This should be addressed in a follow-on patch.

Differential Revision: https://reviews.llvm.org/D49288

llvm-svn: 337154
2018-07-16 10:02:41 +00:00
Connor Abbott 79f3ade51a [AMDGPU] Add pseudo "old" source to all DPP instructions
Summary:
All instructions with the DPP modifier may not write to certain lanes of
the output if bound_ctrl=1 is set or any bits in bank_mask or row_mask
aren't set, so the destination register may be both defined and modified.
The right way to handle this is to add a constraint that the destination
register is the same as one of the inputs. We could tie the destination
to the first source, but that would be too restrictive for some use-cases
where we want the destination to be some other value before the
instruction executes. Instead, add a fake "old" source and tie it to the
destination. Effectively, the "old" source defines what value unwritten
lanes will get. We'll expose this functionality to users with a new
intrinsic later.

Also, we want to use DPP instructions for computing derivatives, which
means we need to set WQM for them. We also need to enable the entire
wavefront when using DPP intrinsics to implement nonuniform subgroup
reductions, since otherwise we'll get incorrect results in some cases.
To accomodate this, add a new operand to all DPP instructions which will
be interpreted by the SI WQM pass. This will be exposed with a new
intrinsic later. We'll also add support for Whole Wavefront Mode later.

I also fixed llvm.amdgcn.mov.dpp to overwrite the source and fixed up
the test. However, I could also keep the old behavior (where lanes that
aren't written are undefined) if people want it.

Reviewers: tstellar, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34716

llvm-svn: 310283
2017-08-07 19:10:56 +00:00
Connor Abbott 82267a553d test commit
llvm-svn: 309981
2017-08-03 20:22:30 +00:00
Matt Arsenault 3dbeefa978 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Matt Arsenault 7aad8fd8f4 Enable FeatureFlatForGlobal on Volcanic Islands
This switches to the workaround that HSA defaults to
for the mesa path.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>

llvm-svn: 292982
2017-01-24 22:02:15 +00:00
Tom Stellard a27007eb4f AMDGPU/SI: Use hazard recognizer to detect DPP hazards
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18603

llvm-svn: 268247
2016-05-02 16:23:09 +00:00
Sam Kolton a74cd526e9 [AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3
Review: http://reviews.llvm.org/D18267
llvm-svn: 263789
2016-03-18 15:35:51 +00:00
Tom Stellard 331f981cc9 AMDGPU/SI: Handle wait states required for DPP instructions
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17543

llvm-svn: 263447
2016-03-14 17:05:56 +00:00
Sam Kolton dfa29f7c5b [AMDGPU] Assembler: Support DPP instructions.
Supprot DPP syntax as used in SP3 (except several operands syntax).
Added dpp-specific operands in td-files.
Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter.
Support for VOP2 DPP instructions in td-files.
Some tests for DPP instructions.

ToDo:
  - VOP2bInst:
    - vcc is considered as operand
    - AsmMatcher doesn't apply mnemonic aliases when parsing operands
  - v_mac_f32
  - v_nop
  - disable instructions with 64-bit operands
  - change dpp_ctrl assembler representation to conform sp3

Review: http://reviews.llvm.org/D17804
llvm-svn: 263008
2016-03-09 12:29:31 +00:00
Tom Stellard 4409051d00 AMDGPU/SI: Add llvm.amdgcn.mov.dpp intrinsic
This intrinsic will be used to expose dpp functionality to higher-level
languages. It will map to the dpp version of v_mov_b32.

llvm-svn: 260792
2016-02-13 02:09:49 +00:00