forked from OSchip/llvm-project
[MachineCombiner] Add up latencies of all instructions in new pattern.
Summary:
When calculating RootLatency, we add up the latencies of all deleted instructions. For NewRootLatency, however, we only add the latency of the new root instruction and ignore the latencies of the other inserted instructions. This leads the combiner to underestimate the cost of patterns that insert multiple instructions.

This patch fixes that by summing up the latencies of all new instructions. For the new root instruction (NewRoot), the more complex getLatency function is used.

Note that we could be slightly more precise than just summing up all latencies. For example, consider a pattern like

    r1 = INS1 ..
    r2 = INS2 ..
    r3 = INS3 r1, r2

I think in some other places, the total latency of the pattern would be estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider that worth changing, I think it would be best to do in a follow-up patch.

Reviewers: Gerolf, sebpop, spop, fhahn
Reviewed By: fhahn
Subscribers: evandro, llvm-commits
Differential Revision: https://reviews.llvm.org/D40307
llvm-svn: 319951
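To make the difference between the two estimates in the summary concrete, here is a minimal standalone sketch. It is not LLVM code: the function names and the latency values are hypothetical, and latencies are plain cycle counts rather than values from a scheduling model.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: treats INS1, INS2 and INS3 as one dependent chain,
// which is the assumption this patch makes when it sums latencies.
constexpr unsigned sumOfLatencies(unsigned Lat1, unsigned Lat2, unsigned Lat3) {
  return Lat1 + Lat2 + Lat3;
}

// Hypothetical helper: INS1 and INS2 are independent and both feed INS3,
// so only the slower of the two lies on the critical path.
constexpr unsigned criticalPathLatency(unsigned Lat1, unsigned Lat2,
                                       unsigned Lat3) {
  return Lat3 + std::max(Lat1, Lat2);
}

// Usage sketch with made-up latencies: INS1 = 3, INS2 = 1, INS3 = 2.
inline void latencyEstimateExample() {
  assert(sumOfLatencies(3, 1, 2) == 6);      // 3 + 1 + 2
  assert(criticalPathLatency(3, 1, 2) == 5); // 2 + max(3, 1)
}
```

With these made-up numbers the plain sum is one cycle more pessimistic than the max-based estimate; the two agree whenever the inserted instructions really do form a single dependent chain.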
parent 9e776fb0dc
commit 001c3dd202
@@ -282,9 +282,16 @@ bool MachineCombiner::improvesCriticalPathLen(
   // of the original code sequence. This may allow the transform to proceed
   // even if the instruction depths (data dependency cycles) become worse.
 
-  unsigned NewRootLatency = getLatency(Root, NewRoot, BlockTrace);
-  unsigned RootLatency = 0;
+  // Account for the latency of the inserted and deleted instructions by
+  // adding up their latencies. This assumes that the inserted and deleted
+  // instructions are dependent instruction chains, which might not hold
+  // in all cases.
+  unsigned NewRootLatency = 0;
+  for (unsigned i = 0; i < InsInstrs.size() - 1; i++)
+    NewRootLatency += TSchedModel.computeInstrLatency(InsInstrs[i]);
+  NewRootLatency += getLatency(Root, NewRoot, BlockTrace);
 
+  unsigned RootLatency = 0;
   for (auto I : DelInstrs)
     RootLatency += TSchedModel.computeInstrLatency(I);
 
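The effect of the change can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions: instructions are reduced to plain latency values, the last element of the inserted sequence stands in for the new root, and the new root's latency is taken directly instead of going through getLatency. All function names below are hypothetical and do not exist in LLVM.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Before the patch (simplified): only the new root, assumed to be the last
// inserted instruction, contributes to NewRootLatency.
unsigned newRootLatencyBefore(const std::vector<unsigned> &InsLatencies) {
  assert(!InsLatencies.empty() && "expected at least the new root");
  return InsLatencies.back();
}

// After the patch (simplified): every inserted instruction contributes.
// The non-empty check matters because an unsigned `size() - 1` loop bound,
// as used in the patch, would wrap around for an empty sequence.
unsigned newRootLatencyAfter(const std::vector<unsigned> &InsLatencies) {
  assert(!InsLatencies.empty() && "expected at least the new root");
  return std::accumulate(InsLatencies.begin(), InsLatencies.end(), 0u);
}

// RootLatency: sum over all deleted instructions (unchanged by the patch).
unsigned rootLatency(const std::vector<unsigned> &DelLatencies) {
  return std::accumulate(DelLatencies.begin(), DelLatencies.end(), 0u);
}
```

For example, with made-up latencies {2, 2, 1} for the inserted instructions and {4} for the deleted one, the old computation yields 1 against a RootLatency of 4, so the pattern looks cheaper than the original code. The new computation yields 5, so it no longer does.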