2018-12-19 18:17:49 +08:00
|
|
|
# RUN: llc -march=amdgcn -mcpu=gfx803 -verify-machineinstrs -run-pass si-insert-waitcnts -o - %s | FileCheck -check-prefixes=GCN,GFX8 %s
|
|
|
|
# RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -run-pass si-insert-waitcnts -o - %s | FileCheck -check-prefixes=GCN,GFX9 %s
|
AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo
Summary:
MachineLoopInfo cannot be relied on for correctness, because it cannot
properly recognize loops in irreducible control flow which can be
introduced by late machine basic block optimization passes. See the new
test case for the reduced form of an example that occurred in practice.
Use a simple fixpoint iteration instead.
In order to facilitate this change, refactor WaitcntBrackets so that it
only tracks pending events and registers, rather than also maintaining
state that is relevant for the high-level algorithm. Various accessor
methods can be removed or made private as a consequence.
Affects (in radv):
- dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex}
Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU")
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54231
llvm-svn: 347853
2018-11-29 19:06:26 +08:00
|
|
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
define amdgpu_ps void @irreducible_loop() {
|
2018-12-19 18:17:49 +08:00
|
|
|
ret void
|
|
|
|
}
|
|
|
|
define amdgpu_ps void @irreducible_loop_extended() {
|
AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo
Summary:
MachineLoopInfo cannot be relied on for correctness, because it cannot
properly recognize loops in irreducible control flow which can be
introduced by late machine basic block optimization passes. See the new
test case for the reduced form of an example that occurred in practice.
Use a simple fixpoint iteration instead.
In order to facilitate this change, refactor WaitcntBrackets so that it
only tracks pending events and registers, rather than also maintaining
state that is relevant for the high-level algorithm. Various accessor
methods can be removed or made private as a consequence.
Affects (in radv):
- dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex}
Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU")
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54231
llvm-svn: 347853
2018-11-29 19:06:26 +08:00
|
|
|
ret void
|
|
|
|
}
|
|
|
|
|
|
|
|
...
|
|
|
|
---
|
2018-12-19 18:17:49 +08:00
|
|
|
|
|
|
|
# GCN-LABEL: name: irreducible_loop{{$}}
|
|
|
|
# GCN: S_LOAD_DWORDX4_IMM
|
|
|
|
# GFX8: S_WAITCNT 127{{$}}
|
|
|
|
# GFX9: S_WAITCNT 49279{{$}}
|
|
|
|
# GCN: S_BUFFER_LOAD_DWORD_IMM
|
|
|
|
# GFX8: S_WAITCNT 127{{$}}
|
|
|
|
# GFX9: S_WAITCNT 49279{{$}}
|
|
|
|
# GCN: S_CMP_GE_I32
|
AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo
Summary:
MachineLoopInfo cannot be relied on for correctness, because it cannot
properly recognize loops in irreducible control flow which can be
introduced by late machine basic block optimization passes. See the new
test case for the reduced form of an example that occurred in practice.
Use a simple fixpoint iteration instead.
In order to facilitate this change, refactor WaitcntBrackets so that it
only tracks pending events and registers, rather than also maintaining
state that is relevant for the high-level algorithm. Various accessor
methods can be removed or made private as a consequence.
Affects (in radv):
- dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex}
Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU")
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54231
llvm-svn: 347853
2018-11-29 19:06:26 +08:00
|
|
|
name: irreducible_loop
|
|
|
|
body: |
|
|
|
|
bb.0:
|
|
|
|
successors: %bb.3, %bb.2
|
|
|
|
|
|
|
|
S_CBRANCH_VCCZ %bb.2, implicit $vcc
|
|
|
|
S_BRANCH %bb.3
|
|
|
|
|
|
|
|
bb.1:
|
|
|
|
successors: %bb.3, %bb.2
|
|
|
|
|
|
|
|
S_CBRANCH_VCCNZ %bb.3, implicit $vcc
|
|
|
|
|
|
|
|
bb.2:
|
|
|
|
successors: %bb.3
|
|
|
|
|
2019-05-01 06:08:23 +08:00
|
|
|
renamable $sgpr4_sgpr5_sgpr6_sgpr7 = S_LOAD_DWORDX4_IMM renamable $sgpr0_sgpr1, 0, 0, 0
|
|
|
|
renamable $sgpr3 = S_BUFFER_LOAD_DWORD_IMM killed renamable $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0, 0
|
AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo
Summary:
MachineLoopInfo cannot be relied on for correctness, because it cannot
properly recognize loops in irreducible control flow which can be
introduced by late machine basic block optimization passes. See the new
test case for the reduced form of an example that occurred in practice.
Use a simple fixpoint iteration instead.
In order to facilitate this change, refactor WaitcntBrackets so that it
only tracks pending events and registers, rather than also maintaining
state that is relevant for the high-level algorithm. Various accessor
methods can be removed or made private as a consequence.
Affects (in radv):
- dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex}
Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU")
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54231
llvm-svn: 347853
2018-11-29 19:06:26 +08:00
|
|
|
|
|
|
|
bb.3:
|
|
|
|
successors: %bb.1, %bb.4
|
|
|
|
|
|
|
|
S_CMP_GE_I32 renamable $sgpr2, renamable $sgpr3, implicit-def $scc
|
|
|
|
S_CBRANCH_SCC0 %bb.1, implicit killed $scc
|
|
|
|
|
|
|
|
bb.4:
|
|
|
|
|
[AMDGPU] Add support for immediate operand for S_ENDPGM
Summary:
Add support for immediate operand in S_ENDPGM
Change-Id: I0c56a076a10980f719fb2a8f16407e9c301013f6
Reviewers: alexshap
Subscribers: qcolombet, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, eraman, arphaman, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59213
llvm-svn: 355902
2019-03-12 17:52:58 +08:00
|
|
|
S_ENDPGM 0
|
AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo
Summary:
MachineLoopInfo cannot be relied on for correctness, because it cannot
properly recognize loops in irreducible control flow which can be
introduced by late machine basic block optimization passes. See the new
test case for the reduced form of an example that occurred in practice.
Use a simple fixpoint iteration instead.
In order to facilitate this change, refactor WaitcntBrackets so that it
only tracks pending events and registers, rather than also maintaining
state that is relevant for the high-level algorithm. Various accessor
methods can be removed or made private as a consequence.
Affects (in radv):
- dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex}
Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU")
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54231
llvm-svn: 347853
2018-11-29 19:06:26 +08:00
|
|
|
|
|
|
|
...
|
2018-12-19 18:17:49 +08:00
|
|
|
|
|
|
|
# GCN-LABEL: name: irreducible_loop_extended
|
|
|
|
|
|
|
|
# GCN: S_LOAD_DWORDX4_IMM
|
|
|
|
# GFX8: S_WAITCNT 127{{$}}
|
|
|
|
# GFX9: S_WAITCNT 49279{{$}}
|
|
|
|
# GCN: BUFFER_STORE_DWORD_OFFEN_exact
|
|
|
|
# GFX8: S_WAITCNT 127{{$}}
|
|
|
|
# GFX9: S_WAITCNT 49279{{$}}
|
|
|
|
# GCN: BUFFER_STORE_DWORD_OFFEN_exact
|
|
|
|
# GCN: S_LOAD_DWORDX4_IMM
|
|
|
|
# GFX8: S_WAITCNT 127{{$}}
|
|
|
|
# GFX9: S_WAITCNT 49279{{$}}
|
|
|
|
# GCN: BUFFER_ATOMIC_ADD_OFFSET_RTN
|
|
|
|
# GCN: S_WAITCNT 3952
|
|
|
|
# GCN: FLAT_STORE_DWORD
|
[AMDGPU] Add support for immediate operand for S_ENDPGM
Summary:
Add support for immediate operand in S_ENDPGM
Change-Id: I0c56a076a10980f719fb2a8f16407e9c301013f6
Reviewers: alexshap
Subscribers: qcolombet, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, eraman, arphaman, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59213
llvm-svn: 355902
2019-03-12 17:52:58 +08:00
|
|
|
# GCN: S_ENDPGM 0
|
2018-12-19 18:17:49 +08:00
|
|
|
name: irreducible_loop_extended
|
|
|
|
|
|
|
|
body: |
|
|
|
|
bb.0:
|
|
|
|
successors: %bb.1, %bb.2
|
2019-05-01 06:08:23 +08:00
|
|
|
$sgpr4_sgpr5_sgpr6_sgpr7 = S_LOAD_DWORDX4_IMM renamable $sgpr2_sgpr3, 0, 0, 0
|
2018-12-19 18:17:49 +08:00
|
|
|
S_CBRANCH_VCCZ %bb.2, implicit $vcc
|
|
|
|
|
|
|
|
bb.1:
|
|
|
|
successors: %bb.2
|
[AMDGPU] Extend buffer intrinsics with swizzling
Summary:
Extend cachepolicy operand in the new VMEM buffer intrinsics
to supply information whether the buffer data is swizzled.
Also, propagate this information to MIR.
Intrinsics updated:
int_amdgcn_raw_buffer_load
int_amdgcn_raw_buffer_load_format
int_amdgcn_raw_buffer_store
int_amdgcn_raw_buffer_store_format
int_amdgcn_raw_tbuffer_load
int_amdgcn_raw_tbuffer_store
int_amdgcn_struct_buffer_load
int_amdgcn_struct_buffer_load_format
int_amdgcn_struct_buffer_store
int_amdgcn_struct_buffer_store_format
int_amdgcn_struct_tbuffer_load
int_amdgcn_struct_tbuffer_store
Furthermore, disable merging of VMEM buffer instructions
in SI Load/Store optimizer, if the "swizzled" bit on the instruction
is on.
The default value of the bit is 0, meaning that data in buffer
is linear and buffer instructions can be merged.
There is no difference in the generated code with this commit.
However, in the future it will be expected that front-ends
use buffer intrinsics with correct "swizzled" bit set.
Reviewers: arsenm, nhaehnle, tpr
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68200
llvm-svn: 373491
2019-10-03 01:22:36 +08:00
|
|
|
BUFFER_STORE_DWORD_OFFEN_exact killed renamable $vgpr3, renamable $vgpr2, renamable $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0, 0, 0, 0, 0, 0, implicit $exec
|
2018-12-19 18:17:49 +08:00
|
|
|
|
|
|
|
bb.2:
|
|
|
|
successors: %bb.3, %bb.6
|
|
|
|
S_CBRANCH_VCCNZ %bb.6, implicit $vcc
|
|
|
|
|
|
|
|
bb.3:
|
|
|
|
successors: %bb.4, %bb.5
|
[AMDGPU] Extend buffer intrinsics with swizzling
Summary:
Extend cachepolicy operand in the new VMEM buffer intrinsics
to supply information whether the buffer data is swizzled.
Also, propagate this information to MIR.
Intrinsics updated:
int_amdgcn_raw_buffer_load
int_amdgcn_raw_buffer_load_format
int_amdgcn_raw_buffer_store
int_amdgcn_raw_buffer_store_format
int_amdgcn_raw_tbuffer_load
int_amdgcn_raw_tbuffer_store
int_amdgcn_struct_buffer_load
int_amdgcn_struct_buffer_load_format
int_amdgcn_struct_buffer_store
int_amdgcn_struct_buffer_store_format
int_amdgcn_struct_tbuffer_load
int_amdgcn_struct_tbuffer_store
Furthermore, disable merging of VMEM buffer instructions
in SI Load/Store optimizer, if the "swizzled" bit on the instruction
is on.
The default value of the bit is 0, meaning that data in buffer
is linear and buffer instructions can be merged.
There is no difference in the generated code with this commit.
However, in the future it will be expected that front-ends
use buffer intrinsics with correct "swizzled" bit set.
Reviewers: arsenm, nhaehnle, tpr
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68200
llvm-svn: 373491
2019-10-03 01:22:36 +08:00
|
|
|
BUFFER_STORE_DWORD_OFFEN_exact killed renamable $vgpr3, killed renamable $vgpr2, killed renamable $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0, 0, 0, 0, 0, 0, implicit $exec
|
2018-12-19 18:17:49 +08:00
|
|
|
S_CBRANCH_VCCNZ %bb.5, implicit $vcc
|
|
|
|
|
|
|
|
bb.4:
|
|
|
|
successors: %bb.5
|
2019-05-01 06:08:23 +08:00
|
|
|
renamable $sgpr12_sgpr13_sgpr14_sgpr15 = S_LOAD_DWORDX4_IMM killed renamable $sgpr2_sgpr3, 64, 0, 0
|
2018-12-19 18:17:49 +08:00
|
|
|
renamable $vgpr2 = BUFFER_ATOMIC_ADD_OFFSET_RTN killed renamable $vgpr2, killed renamable $sgpr12_sgpr13_sgpr14_sgpr15, 0, 0, 0, implicit $exec
|
|
|
|
|
|
|
|
bb.5:
|
|
|
|
successors: %bb.6
|
|
|
|
|
|
|
|
bb.6:
|
2019-05-01 06:08:23 +08:00
|
|
|
FLAT_STORE_DWORD $vgpr3_vgpr4, $vgpr2, 0, 0, 0, 0, implicit $exec, implicit $flat_scr
|
[AMDGPU] Add support for immediate operand for S_ENDPGM
Summary:
Add support for immediate operand in S_ENDPGM
Change-Id: I0c56a076a10980f719fb2a8f16407e9c301013f6
Reviewers: alexshap
Subscribers: qcolombet, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, eraman, arphaman, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59213
llvm-svn: 355902
2019-03-12 17:52:58 +08:00
|
|
|
S_ENDPGM 0
|
2018-12-19 18:17:49 +08:00
|
|
|
...
|