AddPseudoTwoAddrDeps. This lets the scheduling infrastructure
avoid recalculating node heights. In very large testcases this
was a major bottleneck. Thanks to Roman Levenstein for finding
this!
As a side effect, fold-pcmpeqd-0.ll is now scheduled better
and it no longer requires spilling on x86-32.
llvm-svn: 61778
instructions to avoid copies, because TwoAddressInstructionPass
also does this optimization. The scheduler's version didn't
account for live-out values, which resulted in spurious commutes
and missed opportunities.
Now, TwoAddressInstructionPass handles all the opportunities,
instead of just those that the scheduler missed. The result is
usually the same, though there are occasional trivial differences
resulting from the avoidance of spurious commutes.
llvm-svn: 61611
computation code. Also, avoid adding output-depenency edges when both
defs are dead, which frequently happens with EFLAGS defs.
Compute Depth and Height lazily, and always in terms of edge latency
values. For the schedulers that don't care about latency, edge latencies
are set to 1.
Eliminate Cycle and CycleBound, and LatencyPriorityQueue's Latencies array.
These are all subsumed by the Depth and Height fields.
llvm-svn: 61073
The Cost field is removed. It was only being used in a very limited way,
to indicate when the scheduler should attempt to protect a live register,
and it isn't really needed to do that. If we ever want the scheduler to
start inserting copies in non-prohibitive situations, we'll have to
rethink some things anyway.
A Latency field is added. Instead of giving each node a single
fixed latency, each edge can have its own latency. This will eventually
be used to model various micro-architecture properties more accurately.
The PointerIntPair class and an internal union are now used, which
reduce the overall size.
llvm-svn: 60806
introduce any new spilling; it just uses unused registers.
Refactor the SUnit topological sort code out of the RRList scheduler and
make use of it to help with the post-pass scheduler.
llvm-svn: 59999
schedulers. This doesn't have much immediate impact because
targets that use these schedulers by default don't yet provide
pipeline information.
This code also didn't have the benefit of register pressure
information. Also, removing it will avoid problems with list-burr
suddenly starting to do latency-oriented scheduling on x86 when we
start providing pipeline data, which would increase spilling.
llvm-svn: 59775
the RR scheduler actually does look at latency values, but it
doesn't use a hazard recognizer so it has no way to know when
a no-op is needed, as opposed to just stalling and incrementing
the cycle count.
llvm-svn: 59759
dedicated "fast" scheduler in -fast mode instead, which is
faster. This speeds up llc -fast by a few percent on some
testcases -- the speedup only happens for code not handled by
fast-isel.
llvm-svn: 59700
is currently off by default, and can be enabled with
-disable-post-RA-scheduler=false.
This doesn't have a significant impact on most code yet because it doesn't
yet do anything to address anti-dependencies and it doesn't attempt to
disambiguate memory references. Also, several popular targets
don't have pipeline descriptions yet.
The majority of the changes here are splitting the SelectionDAG-specific
code out of ScheduleDAG, so that ScheduleDAG can be moved to
libLLVMCodeGen.a. The interface between ScheduleDAG-using code and
the rest of the scheduling code is somewhat rough and will evolve.
llvm-svn: 59676
new CycleBound value. Instead, just update CycleBound on each call.
Also, make ReleasePred and ReleaseSucc methods more consistent accross
the various schedulers.
This also happens to make ScheduleDAGRRList's CycleBound computation
somewhat more interesting, though it still doesn't have any noticeable
effect, because no current targets that use the register-pressure
reduction scheduler provide pipeline models.
llvm-svn: 59475
to carry a SmallVector of flagged nodes, just calculate the flagged nodes
dynamically when they are needed.
The local-liveness change is due to a trivial scheduling change where
the scheduler arbitrary decision differently.
llvm-svn: 59273
before creating the SUnit for the operation that it was unfolded from. This
allows each SUnit to have all of its predecessor SUnits available at the time
it is created. I don't know yet if this will be absolutely required, but it
is a little tidier to do it this way.
llvm-svn: 59083
instead of requiring all "short description" strings to begin with
two spaces. This makes these strings less mysterious, and it fixes
some cases where short description strings mistakenly did not
begin with two spaces.
llvm-svn: 57521
of two, and to not need a scratch std::vector. Also, compute the ordering
immediately in the result array, instead of in another scratch std::vector
that is copied to the result array.
llvm-svn: 55421
replacement of multiple values. This is slightly more efficient
than doing multiple ReplaceAllUsesOfValueWith calls, and theoretically
could be optimized even further. However, an important property of this
new function is that it handles the case where the source value set and
destination value set overlap. This makes it feasible for isel to use
SelectNodeTo in many very common cases, which is advantageous because
SelectNodeTo avoids a temporary node and it doesn't require CSEMap
updates for users of values that don't change position.
Revamp MorphNodeTo, which is what does all the work of SelectNodeTo, to
handle operand lists more efficiently, and to correctly handle a number
of corner cases to which its new wider use exposes it.
This commit also includes a change to the encoding of post-isel opcodes
in SDNodes; now instead of being sandwiched between the target-independent
pre-isel opcodes and the target-dependent pre-isel opcodes, post-isel
opcodes are now represented as negative values. This makes it possible
to test if an opcode is pre-isel or post-isel without having to know
the size of the current target's post-isel instruction set.
These changes speed up llc overall by 3% and reduce memory usage by 10%
on the InstructionCombining.cpp testcase with -fast and -regalloc=local.
llvm-svn: 53728