[AMDGPU] Add support for Whole Wavefront Mode
Summary:
Whole Wavefront Mode (WWM) is similar to WQM, except that all of the
lanes are always enabled, regardless of control flow. This is required
for implementing wavefront reductions in non-uniform control flow, where
we need to use the inactive lanes to propagate intermediate results, so
they need to be enabled. We need to propagate WWM to uses (unless
they're explicitly marked as exact) so that they also propagate
intermediate results correctly. We do the analysis and exec mask munging
during the WQM pass, since there are interactions with WQM for things
that require both WQM and WWM. For simplicity, WWM is entirely
block-local -- blocks are never WWM on entry or exit of a block, and WWM
is not propagated to the block level. This means that computations
involving WWM cannot involve control flow, but we only ever plan to use
WWM for a few limited purposes (none of which involve control flow)
anyways.
Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There
isn't yet a way to turn WWM off -- that will be added in a future
change.
Finally, it turns out that turning on inactive lanes causes a number of
problems with register allocation. While the best long-term solution
seems like teaching LLVM's register allocator about predication, for now
we need to add some hacks to prevent ourselves from getting into trouble
due to constraints that aren't currently expressed in LLVM. For the gory
details, see the comments at the top of SIFixWWMLiveness.cpp.
Reviewers: arsenm, nhaehnle, tpr
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D35524
llvm-svn: 310087
2017-08-05 02:36:52 +08:00
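To make the exec-mask handling described in the commit message concrete, here is a small Python model. It is a sketch under stated assumptions, not LLVM code: it assumes a 64-lane wavefront, treats `exec` as a plain integer bit mask, and uses hypothetical helpers `enter_wwm`, `exit_wwm` and `wave_sum` that only loosely mirror the ENTER_WWM / EXIT_WWM pseudos checked in this test (save the old mask, enable every lane, restore afterwards).

```python
WAVE_SIZE = 64                    # assumption: wave64
ALL_LANES = (1 << WAVE_SIZE) - 1  # exec mask with every lane enabled


def enter_wwm(exec_mask):
    """Hypothetical helper: return (saved mask, new exec with all lanes on)."""
    return exec_mask, ALL_LANES


def exit_wwm(saved_mask):
    """Hypothetical helper: restore the exec mask saved by enter_wwm."""
    return saved_mask


def wave_sum(values, exec_mask, identity=0):
    """Sum `values` over the active lanes with a whole-wavefront tree reduction."""
    # Give inactive lanes the identity value so they cannot disturb the result
    # (in the spirit of V_SET_INACTIVE_B32; the real semantics are not modelled).
    lanes = [v if (exec_mask >> i) & 1 else identity
             for i, v in enumerate(values)]

    saved, exec_mask = enter_wwm(exec_mask)
    assert exec_mask == ALL_LANES  # every lane takes part in the reduction

    # The tree reduction stores intermediate results in lanes that were
    # inactive on entry, which is why they must be enabled here.
    step = 1
    while step < WAVE_SIZE:
        for i in range(0, WAVE_SIZE, 2 * step):
            lanes[i] += lanes[i + step]
        step *= 2

    exec_mask = exit_wwm(saved)
    return lanes[0], exec_mask
```

For example, with `values = list(range(64))` and only lanes 1 and 3 active (`exec_mask = 0b1010`), `wave_sum` returns the sum 4 together with the original mask, so the code after the reduction again runs only on the lanes that were active before it.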
# RUN: llc -march=amdgcn -verify-machineinstrs -run-pass si-wqm -o - %s | FileCheck %s

---
# Check for awareness that s_or_saveexec_b64 clobbers SCC
#
2019-04-01 23:19:52 +08:00
#CHECK: ENTER_WWM
#CHECK: S_CMP_LT_I32
#CHECK: S_CSELECT_B32
name: test_wwm_scc
[Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing
Summary:
This catches malformed mir files which specify alignment as log2 instead of pow2.
See https://reviews.llvm.org/D65945 for reference.
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67433
llvm-svn: 371608
2019-09-11 19:16:48 +08:00
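The check described in this commit message amounts to requiring that an `alignment:` value is a power of two rather than its log2. A minimal Python sketch of that idea follows. The helpers `is_valid_alignment` and `from_log2` are hypothetical and are not LLVM's MIR parser or the llvm::Align API; treating 0 as invalid is an assumption of the sketch.

```python
def is_valid_alignment(value):
    """True if `value` is a positive power of two (1, 2, 4, 8, ...)."""
    # Assumption of this sketch: 0 and negative values are rejected.
    return value > 0 and (value & (value - 1)) == 0


def from_log2(log2_value):
    """Convert a log2-encoded alignment to the power-of-two byte value."""
    return 1 << log2_value
```

A check of this shape can only flag log2 values that are not themselves powers of two (3, 5, 6, ...). A value such as 2 is accepted either way, so the fix for an old-style file is to rewrite it with `from_log2`, for example 2 becomes 4.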
alignment: 1
exposesReturnsTwice: false
legalized: false
regBankSelected: false
selected: false
tracksRegLiveness: true
registers:
  - { id: 0, class: sgpr_32, preferred-register: '' }
  - { id: 1, class: sgpr_32, preferred-register: '' }
  - { id: 2, class: sgpr_32, preferred-register: '' }
  - { id: 3, class: vgpr_32, preferred-register: '' }
  - { id: 4, class: vgpr_32, preferred-register: '' }
  - { id: 5, class: sgpr_32, preferred-register: '' }
  - { id: 6, class: vgpr_32, preferred-register: '' }
  - { id: 7, class: vgpr_32, preferred-register: '' }
  - { id: 8, class: sreg_32_xm0, preferred-register: '' }
  - { id: 9, class: sreg_32, preferred-register: '' }
  - { id: 10, class: sreg_32, preferred-register: '' }
  - { id: 11, class: vgpr_32, preferred-register: '' }
  - { id: 12, class: vgpr_32, preferred-register: '' }
liveins:
2018-02-01 06:04:26 +08:00
  - { reg: '$sgpr0', virtual-reg: '%0' }
  - { reg: '$sgpr1', virtual-reg: '%1' }
  - { reg: '$sgpr2', virtual-reg: '%2' }
  - { reg: '$vgpr0', virtual-reg: '%3' }
body: |
  bb.0:
    liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr0

    %3 = COPY $vgpr0
    %2 = COPY $sgpr2
    %1 = COPY $sgpr1
    %0 = COPY $sgpr0
    S_CMP_LT_I32 0, %0, implicit-def $scc
    %12 = V_ADD_I32_e32 %3, %3, implicit-def $vcc, implicit $exec
    %5 = S_CSELECT_B32 %2, %1, implicit $scc
    %11 = V_ADD_I32_e32 %5, %12, implicit-def $vcc, implicit $exec
    $vgpr0 = WWM %11, implicit $exec
    SI_RETURN_TO_EPILOG $vgpr0

...
[AMDGPU] Remove unnecessary v_mov from a register to itself in WQM lowering.
Summary:
- The SI Whole Quad Mode phase replaces WQM pseudo instructions with v_mov instructions.
While this is necessary for the special handling of moving results out of WWM live ranges,
it is not necessary for WQM live ranges. The result is a v_mov from a register to itself after every
WQM operation. This change uses a COPY pseudo in these cases, which allows the register
allocator to coalesce the moves away.
Reviewers: tpr, dstuttard, foad, nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71386
2019-12-12 11:31:32 +08:00

---
# V_SET_INACTIVE, when its second operand is undef, is replaced by a
# COPY by si-wqm. Ensure the instruction is removed.
#CHECK-NOT: V_SET_INACTIVE
name: no_cfg
alignment: 1
exposesReturnsTwice: false
legalized: false
regBankSelected: false
selected: false
failedISel: false
tracksRegLiveness: true
hasWinCFI: false
registers:
  - { id: 0, class: sgpr_32, preferred-register: '' }
  - { id: 1, class: sgpr_32, preferred-register: '' }
  - { id: 2, class: sgpr_32, preferred-register: '' }
  - { id: 3, class: sgpr_32, preferred-register: '' }
  - { id: 4, class: sgpr_32, preferred-register: '' }
  - { id: 5, class: sgpr_128, preferred-register: '' }
  - { id: 6, class: sgpr_128, preferred-register: '' }
  - { id: 7, class: sreg_32, preferred-register: '' }
  - { id: 8, class: vreg_64, preferred-register: '' }
  - { id: 9, class: sreg_32, preferred-register: '' }
  - { id: 10, class: vgpr_32, preferred-register: '' }
  - { id: 11, class: vgpr_32, preferred-register: '' }
  - { id: 12, class: sreg_32, preferred-register: '' }
  - { id: 13, class: vgpr_32, preferred-register: '' }
  - { id: 14, class: vgpr_32, preferred-register: '' }
  - { id: 15, class: vgpr_32, preferred-register: '' }
  - { id: 16, class: vgpr_32, preferred-register: '' }
liveins:
  - { reg: '$sgpr0', virtual-reg: '%0' }
  - { reg: '$sgpr1', virtual-reg: '%1' }
  - { reg: '$sgpr2', virtual-reg: '%2' }
  - { reg: '$sgpr3', virtual-reg: '%3' }
body: |
  bb.0:
    liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3

    %3:sgpr_32 = COPY $sgpr3
    %2:sgpr_32 = COPY $sgpr2
    %1:sgpr_32 = COPY $sgpr1
    %0:sgpr_32 = COPY $sgpr0
    %6:sgpr_128 = REG_SEQUENCE %0, %subreg.sub0, %1, %subreg.sub1, %2, %subreg.sub2, %3, %subreg.sub3
    %5:sgpr_128 = COPY %6
    %7:sreg_32 = S_MOV_B32 0
    %8:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %6, %7, 0, 0, 0, 0, 0, 0, implicit $exec
    %16:vgpr_32 = COPY %8.sub1
    %11:vgpr_32 = COPY %16
    %10:vgpr_32 = V_SET_INACTIVE_B32 %11, undef %12:sreg_32, implicit $exec
    %14:vgpr_32 = COPY %7
    %13:vgpr_32 = V_MOV_B32_dpp %14, killed %10, 323, 12, 15, 0, implicit $exec
    early-clobber %15:vgpr_32 = WWM killed %13, implicit $exec
    BUFFER_STORE_DWORD_OFFSET_exact killed %15, %6, %7, 4, 0, 0, 0, 0, 0, implicit $exec
    S_ENDPGM 0

...
2020-03-16 21:33:32 +08:00

---
# Ensure that wwm is not put around an EXEC copy
#CHECK-LABEL: name: copy_exec
#CHECK: %7:sreg_64 = COPY $exec
#CHECK-NEXT: %14:sreg_64 = ENTER_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
#CHECK-NEXT: %8:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
#CHECK-NEXT: $exec = EXIT_WWM %14
#CHECK-NEXT: %9:vgpr_32 = V_MBCNT_LO_U32_B32_e64 %7.sub0, 0, implicit $exec
name: copy_exec
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3

    %3:sgpr_32 = COPY $sgpr3
    %2:sgpr_32 = COPY $sgpr2
    %1:sgpr_32 = COPY $sgpr1
    %0:sgpr_32 = COPY $sgpr0
    %4:sgpr_128 = REG_SEQUENCE %0, %subreg.sub0, %1, %subreg.sub1, %2, %subreg.sub2, %3, %subreg.sub3
    %5:sreg_32 = S_MOV_B32 0
    %6:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET %4, %5, 0, 0, 0, 0, 0, 0, implicit $exec

    %8:sreg_64 = COPY $exec
    %9:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
    %10:vgpr_32 = V_MBCNT_LO_U32_B32_e64 %8.sub0:sreg_64, 0, implicit $exec
    %11:vgpr_32 = V_MOV_B32_dpp %9:vgpr_32, %10:vgpr_32, 312, 15, 15, 0, implicit $exec
    %12:sreg_32 = V_READLANE_B32 %11:vgpr_32, 63
    early-clobber %13:sreg_32 = WWM %9:vgpr_32, implicit $exec

    %14:vgpr_32 = COPY %13
    BUFFER_STORE_DWORD_OFFSET_exact killed %14, %4, %5, 4, 0, 0, 0, 0, 0, implicit $exec
    S_ENDPGM 0

...