forked from OSchip/llvm-project
[AMDGPU]: Fix failing assertion in SIMachineScheduler
This fixes the assertion failure "Loop in the Block Graph!". SIMachineScheduler groups instructions into blocks (also referred to as coloring or groups) and then performs a two-level scheduling: inter-block scheduling, and intra-block scheduling. This approach requires that the dependency graph on the blocks which is obtained by contracting the blocks in the original dependency graph is acyclic. In other words: Whenever A and B end up in the same block, all vertices on a path from A to B must be in the same block. When compiling an example consisting of an export followed by a buffer store, we see a dependency between these two. This dependency may be false, but that is a different issue. This dependency was not correctly accounted for by SiMachineScheduler. A new test case si-scheduler-exports.ll demonstrating this is also added in this commit. The problematic part of SiMachineScheduler was a post-optimization of the block assignment that tried to group all export instructions into a separate export block for better execution performance. This routine correctly checked that any paths from exports to exports did not contain any non-exports, but not vice-versa: In case of an export with a non-export successor dependency, that single export was moved to a separate block, which could then be both a successor and a predecessor block of a non-export block. As fix, we now skip export grouping if there are exports with direct non-export successor dependencies. This fixes the issue at hand, but is slightly pessimistic: We *could* group all exports into a separate block that have neither direct nor indirect export successor dependencies. We will review the potential performance impact and potentially revisit with a more sophisticated implementation. Note that just grouping all exports without direct non-export successor dependencies could still lead to illegal blocks, since non-export A could depend on export B that depends on export C. In that case, export C has no non-export successor, but still may not be grouped into an export block.
This commit is contained in:
parent
fa4347261e
commit
607f8ced39
|
@ -1123,36 +1123,26 @@ void SIScheduleBlockCreator::colorExports() {
|
|||
for (unsigned SUNum : DAG->TopDownIndex2SU) {
|
||||
const SUnit &SU = DAG->SUnits[SUNum];
|
||||
if (SIInstrInfo::isEXP(*SU.getInstr())) {
|
||||
// Check the EXP can be added to the group safely,
|
||||
// ie without needing any other instruction.
|
||||
// The EXP is allowed to depend on other EXP
|
||||
// (they will be in the same group).
|
||||
for (unsigned j : ExpGroup) {
|
||||
bool HasSubGraph;
|
||||
std::vector<int> SubGraph;
|
||||
// By construction (topological order), if SU and
|
||||
// DAG->SUnits[j] are linked, DAG->SUnits[j] is necessary
|
||||
// in the parent graph of SU.
|
||||
#ifndef NDEBUG
|
||||
SubGraph = DAG->GetTopo()->GetSubGraph(SU, DAG->SUnits[j],
|
||||
HasSubGraph);
|
||||
assert(!HasSubGraph);
|
||||
#endif
|
||||
SubGraph = DAG->GetTopo()->GetSubGraph(DAG->SUnits[j], SU,
|
||||
HasSubGraph);
|
||||
if (!HasSubGraph)
|
||||
continue; // No dependencies between each other
|
||||
// SU is an export instruction. Check whether one of its successor
|
||||
// dependencies is a non-export, in which case we skip export grouping.
|
||||
for (const SDep &SuccDep : SU.Succs) {
|
||||
const SUnit *SuccSU = SuccDep.getSUnit();
|
||||
if (SuccDep.isWeak() || SuccSU->NodeNum >= DAG->SUnits.size()) {
|
||||
// Ignore these dependencies.
|
||||
continue;
|
||||
}
|
||||
assert(SuccSU->isInstr() &&
|
||||
"SUnit unexpectedly not representing an instruction!");
|
||||
|
||||
// SubGraph contains all the instructions required
|
||||
// between EXP SUnits[j] and EXP SU.
|
||||
for (unsigned k : SubGraph) {
|
||||
if (!SIInstrInfo::isEXP(*DAG->SUnits[k].getInstr()))
|
||||
// Other instructions than EXP would be required in the group.
|
||||
// Abort the grouping.
|
||||
return;
|
||||
if (!SIInstrInfo::isEXP(*SuccSU->getInstr())) {
|
||||
// A non-export depends on us. Skip export grouping.
|
||||
// Note that this is a bit pessimistic: We could still group all other
|
||||
// exports that are not depended on by non-exports, directly or
|
||||
// indirectly. Simply skipping this particular export but grouping all
|
||||
// others would not account for indirect dependencies.
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
ExpGroup.push_back(SUNum);
|
||||
}
|
||||
}
|
||||
|
|
|
@ -0,0 +1,25 @@
|
|||
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
|
||||
; RUN: llc -march=amdgcn -mcpu=gfx1010 --misched=si -mattr=si-scheduler < %s | FileCheck %s
|
||||
|
||||
define amdgpu_gs void @_amdgpu_gs_main() {
|
||||
; CHECK-LABEL: _amdgpu_gs_main:
|
||||
; CHECK: ; %bb.0: ; %entry
|
||||
; CHECK-NEXT: v_mov_b32_e32 v0, 0
|
||||
; CHECK-NEXT: s_mov_b32 s0, 0
|
||||
; CHECK-NEXT: s_mov_b32 s1, s0
|
||||
; CHECK-NEXT: s_mov_b32 s2, s0
|
||||
; CHECK-NEXT: v_mov_b32_e32 v1, v0
|
||||
; CHECK-NEXT: v_mov_b32_e32 v2, v0
|
||||
; CHECK-NEXT: v_mov_b32_e32 v3, v0
|
||||
; CHECK-NEXT: s_mov_b32 s3, s0
|
||||
; CHECK-NEXT: exp mrt0 off, off, off, off
|
||||
; CHECK-NEXT: buffer_store_dwordx4 v[0:3], v0, s[0:3], 0 idxen
|
||||
; CHECK-NEXT: s_endpgm
|
||||
entry:
|
||||
call void @llvm.amdgcn.exp.f32(i32 0, i32 0, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, i1 false, i1 false)
|
||||
call void @llvm.amdgcn.struct.buffer.store.v4f32(<4 x float> zeroinitializer, <4 x i32> zeroinitializer, i32 0, i32 0, i32 0, i32 0)
|
||||
ret void
|
||||
}
|
||||
|
||||
declare void @llvm.amdgcn.exp.f32(i32 immarg, i32 immarg, float, float, float, float, i1 immarg, i1 immarg)
|
||||
declare void @llvm.amdgcn.struct.buffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32 immarg)
|
Loading…
Reference in New Issue