1. LiveIntervals now implement a 4 slot per instruction model. Load,
Use, Def and a Store slot. This is required in order to correctly
represent caller saved register clobbering on function calls,
register reuse in the same instruction (def resues last use) and
also spill code added later by the allocator. The previous
representation (2 slots per instruction) was insufficient and as a
result was causing subtle bugs.
2. Fixes in spill code generation. This was the major cause of
failures in the test suite.
3. Linear scan now has core support for folding memory operands. This
is untested and not enabled (the live interval update function does
not attempt to fold loads/stores in instructions).
4. Lots of improvements in the debugging output of both live intervals
and linear scan. Give it a try... it is beautiful :-)
In summary the above fixes all the issues with the recent reserved
register elimination changes and get the allocator very close to the
next big step: folding memory operands.
llvm-svn: 11654
that need them. This is very useful on CISCy targets like the X86 because it
reduces the total spill pressure, and makes better use of it's (large)
instruction set. Though the X86 backend doesn't know how to rewrite many
instructions yet, this already makes a substantial difference on 176.gcc for
example:
Before:
Time:
8.0099 ( 31.2%) 0.0100 ( 12.5%) 8.0199 ( 31.2%) 7.7186 ( 30.0%) Local Register Allocator
Code quality:
734559 asm-printer - Number of machine instrs printed
111395 ra-local - Number of registers reloaded
79902 ra-local - Number of registers spilled
231554 x86-peephole - Number of peephole optimization performed
After:
Time:
7.8700 ( 30.6%) 0.0099 ( 19.9%) 7.8800 ( 30.6%) 7.7892 ( 30.2%) Local Register Allocator
Code quality:
733083 asm-printer - Number of machine instrs printed
2379 ra-local - Number of reloads fused into instructions
109046 ra-local - Number of registers reloaded
79881 ra-local - Number of registers spilled
230658 x86-peephole - Number of peephole optimization performed
So by fusing 2300 instructions, we reduced the static number of instructions
by 1500, and reduces the number of peepholes (and thus the work) by about 900.
This also clearly reduces the number of reload/spill instructions that are
emitted.
llvm-svn: 11542
MRegisterInfo::getNumRegs() instead of
MRegisterInfo::FirstVirtualRegister.
Also use MRegisterInfo::is{Physical,Virtual}Register where
appropriate.
llvm-svn: 11477
Rename SetMachineOperandConst's formal parameters to match other methods here.
Mark some methods as being used only by the SPARC back-end.
Fix a missing-paren bug in OutputValue().
llvm-svn: 11363
MachineBasicBlock. Also change opcode to a short and numImplicitRefs
to an unsigned char so that overall MachineInstr's size stays the
same.
llvm-svn: 11357
ilist of MachineInstr objects. This allows constant time removal and
insertion of MachineInstr instances from anywhere in each
MachineBasicBlock. It also allows for constant time splicing of
MachineInstrs into or out of MachineBasicBlocks.
llvm-svn: 11340
instead of randomly groping about inside its outEdges array.
Make SchedGraph::addDummyEdges() use getNumOutEdges() instead of
outEdges.size().
Get rid of ifdefed-out code in SchedGraph::buildGraph().
llvm-svn: 11238
the Virt2PhysRegMap std::map with an std::vector. This speeds up the
register allocator another (almost) 40%, from .72->.45s in a release build
of LLC on 253.perlbmk.
llvm-svn: 11219
from physical registers, and they are always dense, it makes sense to not have
a ton of RBtree overhead. This change speeds up regalloclocal about ~30% on
253.perlbmk, from .35s -> .27s in the JIT (in LLC, it goes from .74 -> .55).
Now live variable analysis is the slowest codegen pass. Of course it doesn't
help that we have to run it twice, because regalloclocal doesn't update it,
but even if it did it would be the slowest pass (now it's just the 2x slowest
pass :(
llvm-svn: 11215
slots each. As a concequence they get numbered as 0, 2, 4 and so
on. The first slot is used for operand uses and the second for
defs. Here's an example:
0: A = ...
2: B = ...
4: C = A + B ;; last use of A
The live intervals should look like:
A = [1, 5)
B = [3, x)
C = [5, y)
llvm-svn: 11141
registers (not as the max number of registers).
Change toSpill from a std::set into a std::vector<bool>.
Use the reverse iterator adapter to do a reverse scan of allocatable
registers.
llvm-svn: 11061
of a linear search to find the first range for comparisons. This cuts
down the linear scan register allocator running time by a factor of 3
in 254.perlbmk and by a factor of 2.2 in 176.gcc.
llvm-svn: 11030
is a move between two registers, at least one of the registers is
virtual and the two live intervals do not overlap.
This results in about 40% reduction in intervals, 30% decrease in the
register allocators running time and a 20% increase in peephole
optimizations (mainly move eliminations).
The option can be enabled by passing -join-liveintervals where
appropriate.
llvm-svn: 10965
virtReg lives on the stack. Now a virtual register has an entry in the
virtual->physical map or the virtual->stack slot map but never in
both.
llvm-svn: 10958
when an implicitely defined register is later used by an alias. For example:
call foo
%reg1024 = mov %AL
The call implicitely defines EAX but only AL is used. Before this fix
no information was available on AL. Now EAX and all its aliases except
AL get defined and die at the call instruction whereas AL lives to be
killed by the assignment.
llvm-svn: 10813
LiveVariables::HandlePhysRegDef private they use information that is
not in memory when LiveVariables finishes the analysis.
Also update the TwoAddressInstructionPass to not use this interface.
llvm-svn: 10755
This should get hunked over to the Sparc backend, along with
MachineCodeForInstruction and a bunch of files in include/llvm/Codegen,
but those battles will have to wait for a later time.
llvm-svn: 10731
of the register allocator as follows:
before after
mesa 2.3790 1.5994
vpr 2.6008 1.2078
gcc 1.9840 0.5273
mcf 0.2569 0.0470
eon 1.8468 1.4359
twolf 0.9475 0.2004
burg 1.6807 1.3300
lambda 1.2191 0.3764
Speedups range anyware from 30% to over 400% :-)
llvm-svn: 10712
saved register it has a longer free range than ECX (which is defined
every time there is a fnuction call) which makes ECX a better register
to reserve.
llvm-svn: 10635
which denotes the register we would like to be assigned to (virtual or
physical). In register allocation, if this hint exists and we can map
it to a physical register (it is either a physical register or it is a
virtual register that already got assigned to a physical one) we use
that register if it is available instead of a random one in the free
pool.
llvm-svn: 10634
with live intervals was missing registers that were used before they
were defined (in the arbitrary order live intervals numbers
instructions).
llvm-svn: 10603
* Move sparc specific code out of generic code
* Eliminate the getOffset() method which made INVALID_FRAME_OFFSET
necessary, which made pulling in MAX_INT as a sentinal necessary.
llvm-svn: 10553
more operands and the two first operands are constrained to be the
same. The pass takes an instruction of the form:
a = b op c
and transforms it into:
a = b
a = a op c
and also preserves live variables.
llvm-svn: 10512
killing instruction is tracked. This causes the LiveIntervals to
create bogus intervals. The workaound is to add a range to the
interval from the redefinition to the end of the basic block.
llvm-svn: 10510
Move some of the longer LiveIntervals::Interval method out of the
header and add debug information to them. Fix bug and simplify range
merging code.
llvm-svn: 10509