In preparation for using the register scavenger's new capability to provide
more than one register simultaneously, explicitly note functions that have
spilled VRSAVE (currently, this can happen only in functions that use the
setjmp intrinsic). As with CR spilling, such functions will need to provide two
emergency spill slots to the scavenger.
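A minimal sketch of the bookkeeping this implies (the VRSAVE setter and the
slot-count variable are assumptions modeled on the existing CR handling):

    // When the spill is emitted, mark it on the per-function info object:
    if (SpilledReg == PPC::VRSAVE)
      FuncInfo->setSpillsVRSAVE();   // assumed setter, like setSpillsCR()

    // Later, when sizing the emergency spill area for the scavenger:
    if (FuncInfo->spillsCR() || FuncInfo->spillsVRSAVE())
      NumEmergencySlots = 2;         // assumed bookkeeping variable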
No functionality change intended.
llvm-svn: 177832
The LR register is unconditionally reserved, and its spilling and restoration
is handled by the prologue/epilogue code. As a result, it is never explicitly
spilled by the register allocator.
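For illustration, unconditional reservation amounts to something like this in
the target's getReservedRegs (a sketch, not the verbatim PPC code):

    BitVector PPCRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
      BitVector Reserved(getNumRegs());
      // LR never reaches the allocator, so it is never a spill candidate;
      // the prologue/epilogue code saves and restores it instead.
      Reserved.set(PPC::LR);
      Reserved.set(PPC::LR8);
      return Reserved;
    }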
No functionality change intended.
llvm-svn: 177823
We currently have a duplicated set of call instruction patterns depending
on the ABI to be followed (Darwin vs. Linux). This is a bit odd; while the
different ABIs will result in different instruction sequences, the actual
instructions themselves ought to be independent of the ABI. And in fact it
turns out that the only nontrivial difference between the two sets of
patterns is that in the PPC64 Linux ABI, the instruction used for indirect
calls is marked to take X11 as an extra input register (which is indeed used
only with that ABI to hold an incoming environment pointer for nested
functions). However, this does not need to be hard-coded at the .td
pattern level; instead, the C++ code expanding calls can simply add that
use, just like it adds uses for argument registers anyway.
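Sketched against the usual call-lowering pattern (the guard names here are
assumptions, not the committed code):

    // Argument registers are already added as explicit uses:
    for (unsigned i = 0, e = RegsToPass.size(); i != e; ++i)
      Ops.push_back(DAG.getRegister(RegsToPass[i].first,
                                    RegsToPass[i].second.getValueType()));

    // The environment-pointer register can be appended the same way, only
    // for indirect calls under the PPC64 Linux ABI:
    if (IsPPC64LinuxABI && IsIndirectCall)
      Ops.push_back(DAG.getRegister(PPC::X11, PtrVT));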
No change in generated code expected.
llvm-svn: 177735
Thanks to Jakob for isolating the underlying problem from the
test case in r177423. The original commit had introduced
asymmetric copy operations, but these turned out to be a work-around
to the real problem (the use of == instead of hasSubClassEq in PPCCTRLoops).
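In reduced form, the distinction was (a sketch; the surrounding PPCCTRLoops
logic is elided):

    const TargetRegisterClass *RC = MRI->getRegClass(Reg);

    // Buggy: pointer equality rejects any strict subclass of GPRC.
    if (RC == &PPC::GPRCRegClass) { /* ... */ }

    // Fixed: hasSubClassEq also accepts classes derived from GPRC.
    if (PPC::GPRCRegClass.hasSubClassEq(RC)) { /* ... */ }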
llvm-svn: 177679
Although there is only one Altivec VRSAVE register, it is a member of
a register class, and we need the ability to spill it. Because this
register is normally callee-preserved and handled by special code, this
has never before been necessary. However, this capability will be required by
a forthcoming commit adding SjLj support.
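Spilling it goes through a GPR; roughly (the opcode names are PPC's
move-from-VRSAVE and store-word instructions, but the real plumbing in
storeRegToStackSlot is elided):

    // Copy VRSAVE into a scratch GPR, then store the GPR to the stack slot.
    BuildMI(MBB, MI, DL, get(PPC::MFVRSAVE), ScratchReg);
    BuildMI(MBB, MI, DL, get(PPC::STW))
        .addReg(ScratchReg, RegState::Kill)
        .addImm(0)
        .addFrameIndex(FrameIdx);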
llvm-svn: 177654
Currently the PPC r0 register is unconditionally reserved. There are two reasons
for this:
1. r0 is treated specially (as the constant 0) by certain instructions, and so
cannot be used with those instructions as a regular register.
2. r0 is used as a temporary register in the CR-register spilling process
(where, under some circumstances, we require two GPRs).
This change addresses the first reason by introducing a restricted register
class (without r0) for use by those instructions that treat r0 specially. These
register classes have a new pseudo-register, ZERO, which represents the r0-as-0
use. This has the side benefit of making the existing target code simpler (and
easier to understand), and will make it clear to the register allocator that
uses of r0 as 0 don't conflict with real uses of the r0 register.
Once the CR spilling code is improved, we'll be able to allocate r0.
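By way of illustration (the class and pseudo-register names are the ones
introduced here; the surrounding code is only a sketch):

    const bool Is64 = Subtarget.isPPC64();
    // Instructions that treat r0 specially now draw genuine register
    // operands from the restricted classes that exclude r0/x0...
    const TargetRegisterClass *RC =
        Is64 ? &PPC::G8RC_NOX0RegClass : &PPC::GPRC_NOR0RegClass;
    // ...while an r0-as-0 operand is represented by the explicit pseudo:
    SDValue Zero =
        CurDAG->getRegister(Is64 ? PPC::ZERO8 : PPC::ZERO, PtrVT);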
Adding these extra register classes, for some reason unclear to me, causes
requests to the target to copy 32-bit registers to 64-bit registers. The
resulting code seems correct (and causes no test-suite failures), and the new
test case covers this new kind of asymmetric copy.
As r0 is still reserved, no functionality change intended.
llvm-svn: 177423
This change cleans up two issues with Altivec register spilling:
1. The spilling code was inefficient (using two instructions, an add and a
load, when just one would do)
2. The code assumed that r0 would always be available (true for now, but this
will change)
The new code handles VR spilling just like GPR spills but forced into r+r mode.
As a result, when any VR spills are present, we must now always allocate the
register-scavenger spill slot.
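In rough terms, the spill now takes the same shape as a GPR spill, just with a
register+register address (a sketch; the operand order and scratch-register
handling are simplified):

    // Materialize the slot address into a scratch GPR (scavenged if needed)...
    BuildMI(MBB, II, DL, TII.get(PPC::ADDI), ScratchReg)
        .addFrameIndex(FrameIndex)
        .addImm(0);
    // ...then store the vector register with the r+r form.
    BuildMI(MBB, II, DL, TII.get(PPC::STVX))
        .addReg(VecReg, getKillRegState(IsKilled))
        .addReg(PPC::ZERO)                  // the (0|rA) half of the address
        .addReg(ScratchReg, RegState::Kill);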
llvm-svn: 177231
For spills into a large stack frame, the FI-elimination code uses the register
scavenger to obtain a free GPR for use with an r+r-addressed load or store.
When there are no available GPRs, the scavenger gets one by using its spill
slot. Previously, we were not always allocating that spill slot and the RS
would assert when the spill slot was needed.
I don't currently have a small test that triggered the assert, but I've
created a small regression test that verifies that the spill slot is now
added when the stack frame is sufficiently large.
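The fix, sketched with the single-slot scavenger API of the time (the
large-frame predicate is a stand-in for the real heuristic):

    if (RS && frameMayBeTooLargeForRPlusImm(MF)) {  // hypothetical predicate
      const TargetRegisterClass *GPRC = &PPC::GPRCRegClass;
      int FI = MFI->CreateStackObject(GPRC->getSize(),
                                      GPRC->getAlignment(), false);
      RS->setScavengingFrameIndex(FI);
    }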
llvm-svn: 177140
This removes the -disable-ppc[32|64]-regscavenger options; the code
that uses the register scavenger has been working well (and has been the default)
for some time, and we don't need options to enable the old (broken) CR spilling code.
llvm-svn: 176865
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to header files
to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
When generating spill code for large stack frames, the compiler makes use of
GPR0. However, there are two flavors of
GPR0 defined by the target: the 32-bit GPR0 (R0) and the 64-bit GPR0
(X0). The spill/reload code makes use of R0 regardless of whether we
are generating 32- or 64-bit code.
This patch corrects the problem in the obvious manner, using X0 and
ADDI8 for 64-bit and R0 and ADDI for 32-bit.
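Sketched in frame-index-elimination terms (the surrounding variables are
assumed; only the opcode/register selection is the point):

    bool Is64 = Subtarget.isPPC64();
    unsigned Tmp    = Is64 ? PPC::X0 : PPC::R0;       // the right GPR0 flavor
    unsigned AddOpc = Is64 ? PPC::ADDI8 : PPC::ADDI;  // matching opcode width
    // Materialize the remaining offset into the temporary (the high part is
    // handled separately in the real code):
    BuildMI(MBB, II, DL, TII.get(AddOpc), Tmp)
        .addReg(Tmp)
        .addImm(Offset & 0xFFFF);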
llvm-svn: 165658
The PPC::EXTSW instruction preserves the low 32 bits of its input, just
like some of the x86 instructions. Use it to reduce register pressure
when the low 32 bits have multiple uses.
This requires a small change to PeepholeOptimizer since EXTSW takes a
64-bit input register.
This is related to PR5997.
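The hook involved is TargetInstrInfo::isCoalescableExtInstr; roughly (a
sketch, the exact PPC override may differ):

    bool PPCInstrInfo::isCoalescableExtInstr(const MachineInstr &MI,
                                             unsigned &SrcReg, unsigned &DstReg,
                                             unsigned &SubIdx) const {
      switch (MI.getOpcode()) {
      default:
        return false;
      case PPC::EXTSW:
        // The low 32 bits of the (64-bit) input survive, so users of the low
        // half of the result can read the sub_32 subregister instead.
        SrcReg = MI.getOperand(1).getReg();
        DstReg = MI.getOperand(0).getReg();
        SubIdx = PPC::sub_32;
        return true;
      }
    }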
llvm-svn: 158743
Thanks to Jakob's help, this now causes no new test suite failures!
Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
SingleSource/Benchmarks/Misc/pi - 108%
SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
SingleSource/Benchmarks/Shootout/ary3 - 32%
SingleSource/Benchmarks/Shootout-C++/matrix - 30%
The largest slowdowns are:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
MultiSource/Applications/d/make_dparser - -14%
SingleSource/Benchmarks/Shootout-C++/ary - -13%
In light of these slowdowns, additional profiling work is obviously needed!
llvm-svn: 158223
The pass itself works well, but something in the Machine* infrastructure
does not understand terminators which define registers. Without the ability
to use the block-placement pass, etc. this causes performance regressions (and
so is turned off by default). Turning off the analysis turns off the problems
with the Machine* infrastructure.
llvm-svn: 158206
This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, an invalid preheader DebugLoc is not used.
llvm-svn: 158204
This adds a full itinerary for IBM's PPC64 A2 embedded core. These
cores form the basis for the CPUs in the new IBM BG/Q supercomputer.
llvm-svn: 153842
Dynamic linking on PPC64 has had problems since we had to move the top-down
hazard-detection logic post-ra. For dynamic linking to work there needs to be
a nop placed after every call. It turns out that it is really hard to guarantee
that nothing will be placed in between the call (bl) and the nop during post-ra
scheduling. Previous attempts at fixing this by placing logic inside the
hazard detector only partially worked.
This is now fixed in a different way: call+nop codegen-only instructions. As far
as CodeGen is concerned, the pair is now a single instruction and cannot be split.
This solution works much better than previous attempts.
The scoreboard hazard detector is also renamed to be more generic; there is
currently no CPU-specific logic in it.
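Conceptually, call selection now just picks the fused opcode (a sketch; the
predicate name is an assumption):

    // On targets that may be dynamically linked, use the codegen-only
    // call+nop instruction so nothing can be scheduled between bl and nop.
    unsigned CallOpc = NeedsNopAfterCall          // assumed predicate
                           ? PPC::BL8_NOP_ELF     // emits "bl ...; nop"
                           : PPC::BL8_ELF;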
llvm-svn: 153816
and MCSubtargetInfo.
- Added methods to update subtarget features (used when targets automatically
detect subtarget features or switch modes).
- Teach X86Subtarget to update MCSubtargetInfo feature bits since the
MCSubtargetInfo layer can be shared with other modules.
- This fixes .code 16 / .code 32 support, since the mode switch is updated in
MCSubtargetInfo so the MC code emitter can do the right thing.
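For the .code 16 / .code 32 part, the flow is roughly (shown with ARM's
Thumb-mode feature as the example; a sketch, not the exact diff):

    // The directive toggles the shared MCSubtargetInfo feature bit, so the
    // MC code emitter observes the same mode as the parser.
    unsigned FB = ComputeAvailableFeatures(STI.ToggleFeature(ARM::ModeThumb));
    setAvailableFeatures(FB);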
llvm-svn: 134884
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen-generated static data. Changed
TargetInstrInfo so it's based on MCInstrInfo.
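Each target then exposes the tablegen data through a small factory, along
these lines (sketched for PPC):

    static MCInstrInfo *createPPCMCInstrInfo() {
      MCInstrInfo *X = new MCInstrInfo();
      InitPPCMCInstrInfo(X);  // tablegen-generated initializer
      return X;
    }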
llvm-svn: 134021
DAG scheduling during isel. Most new functionality is currently
guarded by -enable-sched-cycles and -enable-sched-hazard.
Added InstrItineraryData::IssueWidth field, currently derived from
ARM itineraries, but could be initialized differently on other targets.
Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is
active, and if so how many cycles of state it holds.
Added SchedulingPriorityQueue::HasReadyFilter to allow gating entry
into the scheduler's available queue.
ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to
get information about its SUnits, provides RecedeCycle for bottom-up
scheduling, correctly computes scoreboard depth, tracks IssueCount, and
considers potential stall cycles when checking for hazards.
ScheduleDAGRRList now models machine cycles and hazards (under
flags). It tracks MinAvailableCycle, drives the hazard recognizer and
priority queue's ready filter, manages a new PendingQueue, properly
accounts for stall cycles, etc.
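Pulled together, the issue path looks something like this sketch (simplified;
the names are the ones listed above):

    // MaxLookAhead == 0 means the recognizer is inactive.
    if (HazardRec->isEnabled() &&
        HazardRec->getHazardType(SU) != ScheduleHazardRecognizer::NoHazard) {
      // Stall: advance the machine cycle instead of issuing SU; bottom-up
      // scheduling uses the new RecedeCycle, top-down uses AdvanceCycle.
      if (IsBottomUp)
        HazardRec->RecedeCycle();
      else
        HazardRec->AdvanceCycle();
    } else {
      HazardRec->EmitInstruction(SU);
      ++IssueCount;  // checked against InstrItineraryData::IssueWidth
    }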
llvm-svn: 122541
operands.
Hopefully this fixes the llvm-gcc-powerpc-darwin9 buildbot. It really shouldn't,
since missing memoperands should not affect correctness.
llvm-svn: 108540
The only folding these load/store architectures can do is converting COPY into a
load or store, and the target independent part of foldMemoryOperand already
knows how to do that.
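For reference, the generic behavior being relied on looks roughly like this
(simplified; the variable names are assumed):

    // Folding a stack slot into a COPY turns it into a spill store or a
    // reload, via the standard TargetInstrInfo hooks.
    if (MI->isCopy()) {
      if (FoldingDef)   // the COPY's result is being spilled
        TII.storeRegToStackSlot(*MBB, InsertPt, MI->getOperand(1).getReg(),
                                /*isKill=*/true, FrameIndex, RC, &TRI);
      else              // the COPY's source is being reloaded
        TII.loadRegFromStackSlot(*MBB, InsertPt, MI->getOperand(0).getReg(),
                                 FrameIndex, RC, &TRI);
    }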
llvm-svn: 108099