forked from OSchip/llvm-project
[AMDGPU] PHI node cost should not be counted for the size and latency.
Details: https://reviews.llvm.org/D96805 changed GCNTTIImpl::getCFInstrCost to return 1 for PHI nodes under TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency. This is incorrect because the value moves that result from PHI lowering are inserted into the predecessors of the basic block, not into the block itself. As a result of that change, LoopRotate and LoopUnroll were broken by incorrect size/cost estimates for the loop header and loop body.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D105104
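To illustrate the effect described in the commit message, here is a minimal toy model of a code-size estimate for a block. It is a sketch only: `Op`, `sizeCost`, and `blockSizeCost` are hypothetical names and are not LLVM APIs. The per-instruction costs are the SIZE values from the test expectations in this commit, and the PHI cost is a parameter so the old value (1) and the new value (0) can be compared.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for control-flow opcodes; these are NOT LLVM types.
enum class Op { Ret, CondBr, Br, Phi };

// Code-size cost of one instruction, using the SIZE values from the
// test expectations in this commit. phiCost is a parameter (an
// assumption of this sketch) so both behaviours can be compared.
int sizeCost(Op op, int phiCost) {
  switch (op) {
  case Op::Ret:    return 1;
  case Op::CondBr: return 5;
  case Op::Br:     return 1;
  case Op::Phi:    return phiCost;
  }
  return 0;
}

// Sum of the per-instruction costs over a block, roughly how a
// size-based heuristic would see it.
int blockSizeCost(const std::vector<Op> &block, int phiCost) {
  int total = 0;
  for (Op op : block)
    total += sizeCost(op, phiCost);
  return total;
}
```

For a hypothetical loop header made of three PHIs and one conditional branch, `blockSizeCost(header, 1)` gives 8 while `blockSizeCost(header, 0)` gives 5. The difference comes entirely from PHIs, whose lowered copies would not be placed in the header at all.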
parent dd4d3f7406
commit e585b332e4
@@ -837,10 +837,6 @@ InstructionCost GCNTTIImpl::getCFInstrCost(unsigned Opcode,
   }
   case Instruction::Ret:
     return SCost ? 1 : 10;
-  case Instruction::PHI:
-    // TODO: 1. A prediction phi won't be eliminated?
-    //       2. Estimate data copy instructions in this case.
-    return 1;
   }
   return BaseT::getCFInstrCost(Opcode, CostKind, I);
 }

@@ -8,7 +8,7 @@
 ; SPEED: estimated cost of 10 for instruction: ret void
 ; SIZE: estimated cost of 5 for instruction: br i1
 ; SIZE: estimated cost of 1 for instruction: br label
-; SIZE: estimated cost of 1 for instruction: %phi = phi i32 [
+; SIZE: estimated cost of 0 for instruction: %phi = phi i32 [
 ; SIZE: estimated cost of 1 for instruction: ret void
 define amdgpu_kernel void @test_br_cost(i32 addrspace(1)* %out, i32 addrspace(1)* %vaddr, i32 %b) #0 {
 bb0: