[MachineCombiner] Add up latencies of all instructions in new pattern.

Summary:
When calculating RootLatency, we add up the latencies of all the
deleted instructions. But for NewRootLatency we only add the latency of
the new root instruction, ignoring the latencies of the other
instructions inserted. This leads the combiner to underestimate the cost
of patterns which add multiple instructions. This patch fixes that by
summing up the latencies of all new instructions. For the new root
(NewRoot), the more complex getLatency function is still used.
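
A minimal sketch of the difference between the two estimates, not the
actual MachineCombiner code. It assumes latencies are plain cycle counts
in a vector and that the new root is the last inserted instruction, as
the loop bound in the patch suggests. The names Latencies,
oldNewRootLatency, summedNewRootLatency and rootLatency are hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical stand-in for per-instruction latencies in cycles. The real
// code queries TSchedModel.computeInstrLatency() and getLatency() instead.
using Latencies = std::vector<unsigned>;

// Old estimate: only the new root (assumed to be the last inserted
// instruction) contributes to the cost of the new pattern.
unsigned oldNewRootLatency(const Latencies &Inserted) {
  assert(!Inserted.empty() && "a pattern inserts at least one instruction");
  return Inserted.back();
}

// New estimate: every inserted instruction contributes, which treats the
// inserted instructions as a single dependent chain.
unsigned summedNewRootLatency(const Latencies &Inserted) {
  assert(!Inserted.empty() && "a pattern inserts at least one instruction");
  return std::accumulate(Inserted.begin(), Inserted.end(), 0u);
}

// Cost of the deleted sequence, summed the same way.
unsigned rootLatency(const Latencies &Deleted) {
  return std::accumulate(Deleted.begin(), Deleted.end(), 0u);
}
```

With inserted latencies {3, 3, 4} and deleted latencies {5, 4}, the old
estimate compares 4 against 9 and the pattern looks profitable, while the
summed estimate compares 10 against 9 and it does not.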

Note that we could be slightly more precise than just summing up
all latencies. For example, consider a pattern like

    r1 = INS1 ..
    r2 = INS2 ..
    r3 = INS3 r1, r2

I think in some other places, the total latency of the pattern would be
estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider
that worth changing, I think it would be best to do in a follow-up
patch.
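
For the three-instruction pattern above, the two estimates can be
sketched as follows. This is only an illustration with hypothetical
function names and made-up latencies, not code from the combiner.

```cpp
#include <algorithm>

// Chain estimate, as used by this patch: every instruction is assumed to
// depend on the previous one, so all latencies add up.
unsigned chainLatency(unsigned LatIns1, unsigned LatIns2, unsigned LatIns3) {
  return LatIns1 + LatIns2 + LatIns3;
}

// Critical-path estimate: INS1 and INS2 are independent of each other, so
// only the slower of the two delays INS3.
unsigned criticalPathLatency(unsigned LatIns1, unsigned LatIns2,
                             unsigned LatIns3) {
  return LatIns3 + std::max(LatIns1, LatIns2);
}
```

With latencies of 2, 4 and 3 cycles for INS1, INS2 and INS3, the chain
estimate is 9 cycles and the critical-path estimate is 7 cycles, so the
summed value is the more pessimistic of the two.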

Reviewers: Gerolf, sebpop, spop, fhahn

Reviewed By: fhahn

Subscribers: evandro, llvm-commits

Differential Revision: https://reviews.llvm.org/D40307

llvm-svn: 319951
Florian Hahn 2017-12-06 20:27:33 +00:00
parent 9e776fb0dc
commit 001c3dd202
1 changed files with 9 additions and 2 deletions

@@ -282,9 +282,16 @@ bool MachineCombiner::improvesCriticalPathLen(
   // of the original code sequence. This may allow the transform to proceed
   // even if the instruction depths (data dependency cycles) become worse.
-  unsigned NewRootLatency = getLatency(Root, NewRoot, BlockTrace);
-  unsigned RootLatency = 0;
+  // Account for the latency of the inserted and deleted instructions by
+  // adding up their latencies. This assumes that the inserted and deleted
+  // instructions are dependent instruction chains, which might not hold
+  // in all cases.
+  unsigned NewRootLatency = 0;
+  for (unsigned i = 0; i < InsInstrs.size() - 1; i++)
+    NewRootLatency += TSchedModel.computeInstrLatency(InsInstrs[i]);
+  NewRootLatency += getLatency(Root, NewRoot, BlockTrace);
+  unsigned RootLatency = 0;
   for (auto I : DelInstrs)
     RootLatency += TSchedModel.computeInstrLatency(I);