[AMDGPU] PHI node cost should not be counted for the size and latency.

Details: https://reviews.llvm.org/D96805 changed GCNTTIImpl::getCFInstrCost to return 1 for PHI nodes
  under TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency. This is incorrect: the value moves produced by
  PHI lowering are inserted into the predecessor blocks, not into the block that contains the PHI.
  As a result, LoopRotate and LoopUnroll were broken, because they estimated the size/cost of the
  loop header and the loop body incorrectly.
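A simplified model can make this argument concrete. The sketch below follows the commit message's description of PHI lowering: each incoming value of a PHI becomes a copy placed before the terminator of the corresponding predecessor, so the block holding the PHI gains no instructions. The `Phi` and `Block` types and the `lowerPhis` function are hypothetical; they are not LLVM's data structures and do not attempt to reproduce LLVM's actual PHI elimination pass.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified IR model (not LLVM's data structures).
// A PHI records, for each incoming edge, the predecessor's index and the value.
struct Phi {
  std::string result;
  std::vector<std::pair<std::size_t, std::string>> incoming;
};

// A block keeps its PHIs apart from its other instructions; the last entry
// of `insts` is assumed to be the terminator.
struct Block {
  std::vector<Phi> phis;
  std::vector<std::string> insts;
};

// Replace every PHI in blocks[target] with one copy per incoming edge,
// placed just before the terminator of that edge's predecessor.
// Unless the block is its own predecessor, nothing is added to blocks[target].
void lowerPhis(std::vector<Block>& blocks, std::size_t target) {
  for (const Phi& phi : blocks[target].phis) {
    for (const auto& [pred, value] : phi.incoming) {
      std::vector<std::string>& insts = blocks[pred].insts;
      assert(!insts.empty() && "predecessor must end in a terminator");
      insts.insert(insts.end() - 1, phi.result + " = COPY " + value);
    }
  }
  blocks[target].phis.clear();
}
```

For a loop header with two PHIs fed by a preheader and a latch, each of those two predecessors ends up with two extra copies, while the header keeps only its branch. Charging the header for its PHIs therefore overstates its size.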

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D105104
alex-t 2021-06-30 15:48:02 +03:00
parent dd4d3f7406
commit e585b332e4
2 changed files with 1 addition and 5 deletions


@@ -837,10 +837,6 @@ InstructionCost GCNTTIImpl::getCFInstrCost(unsigned Opcode,
   }
   case Instruction::Ret:
     return SCost ? 1 : 10;
-  case Instruction::PHI:
-    // TODO: 1. A prediction phi won't be eliminated?
-    //       2. Estimate data copy instructions in this case.
-    return 1;
   }
   return BaseT::getCFInstrCost(Opcode, CostKind, I);
 }
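To see why the removed `return 1` mattered, here is a rough sketch of how per-instruction size costs add up for a loop header. It assumes, hypothetically, that a size-based heuristic simply sums the cost of every instruction in a block. The per-opcode values are the SIZE numbers from the updated test (`br i1` 5, `br label` 1, `ret void` 1, `phi` 1 before this change and 0 after it). The `Op` enum, `sizeCost`, and `blockSize` are illustrative names, and costing every other opcode at 1 is an assumption of the sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical opcode subset, enough to cost a small loop header.
enum class Op { Phi, CondBr, UncondBr, Ret, Other };

// Size-kind cost per opcode, using the SIZE values from the test diff.
// countPhiAsOne selects the behaviour before this commit (PHI = 1)
// versus after it (PHI = 0). Op::Other = 1 is an assumption.
int sizeCost(Op op, bool countPhiAsOne) {
  switch (op) {
  case Op::Phi:
    return countPhiAsOne ? 1 : 0;
  case Op::CondBr:
    return 5;
  case Op::UncondBr:
  case Op::Ret:
  case Op::Other:
    return 1;
  }
  assert(false && "unhandled opcode");
  return 0;
}

// Sum the per-instruction size costs of one block.
int blockSize(const std::vector<Op>& insts, bool countPhiAsOne) {
  int total = 0;
  for (Op op : insts)
    total += sizeCost(op, countPhiAsOne);
  return total;
}
```

With three PHIs, one ordinary instruction, and a conditional branch, the header is estimated at 9 before the fix and 6 after it. Each PHI inflated the estimate by one unit even though, per the commit message, none of its lowering code lands in the header.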


@@ -8,7 +8,7 @@
 ; SPEED: estimated cost of 10 for instruction: ret void
 ; SIZE: estimated cost of 5 for instruction: br i1
 ; SIZE: estimated cost of 1 for instruction: br label
-; SIZE: estimated cost of 1 for instruction: %phi = phi i32 [
+; SIZE: estimated cost of 0 for instruction: %phi = phi i32 [
 ; SIZE: estimated cost of 1 for instruction: ret void
 define amdgpu_kernel void @test_br_cost(i32 addrspace(1)* %out, i32 addrspace(1)* %vaddr, i32 %b) #0 {
 bb0: