This adds in-principle support for if-converting the bctr[l] instructions.
These instructions are used for indirect branching. It seems, however, that the
current if converter will never actually predicate these. To do so, it would
need the ability to hoist a few setup insts. out of the conditionally-executed
block. For example, code like this:
void foo(int a, int (*bar)()) { if (a != 0) bar(); }
becomes:
...
beq 0, .LBB0_2
std 2, 40(1)
mr 12, 4
ld 3, 0(4)
ld 11, 16(4)
ld 2, 8(4)
mtctr 3
bctrl
ld 2, 40(1)
.LBB0_2:
...
and it would be safe to do all of this unconditionally with a predicated
beqctrl instruction.
llvm-svn: 179156
This enables us to form predicated branches (which are the same conditional
branches we had before) and also a larger set of predicated returns (including
instructions like bdnzlr which is a conditional return and loop-counter
decrement all in one).
At the moment, if conversion does not capture all possible opportunities. A
simple example is provided in early-ret2.ll, where if conversion forms one
predicated return, and then the PPCEarlyReturn pass picks up the other one. So,
at least for now, we'll keep both mechanisms.
llvm-svn: 179134
Some general cleanup and only scan the end of a BB for branches (once we're
done with the terminators and debug values, then there should not be any other
branches). These address post-commit review suggestions by Bill Schmidt.
No functionality change intended.
llvm-svn: 179112
PowerPC has a conditional branch to the link register (return) instruction: BCLR.
This should be used any time when we'd otherwise have a conditional branch to a
return. This adds a small pass, PPCEarlyReturn, which runs just prior to the
branch selection pass (and, importantly, after block placement) to generate
these conditional returns when possible. It will also eliminate unconditional
branches to returns (these happen rarely; most of the time these have already
been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice
optimization for small functions that do not maintain a stack frame.
llvm-svn: 179026
There are certain PPC instructions into which we can fold a zero immediate
operand. We can detect such cases by looking at the register class required
by the using operand (so long as it is not otherwise constrained).
llvm-svn: 178961
On cores for which we know the misprediction penalty, and we have
the isel instruction, we can profitably perform early if conversion.
This enables us to replace some small branch sequences with selects
and avoid the potential stalls from mispredicting the branches.
Enabling this feature required implementing canInsertSelect and
insertSelect in PPCInstrInfo; isel code in PPCISelLowering was
refactored to use these functions as well.
llvm-svn: 178926
These functions should have the same list of load/store instructions. Now that
all load/store forms have been normalized (to single instructions or pseudos)
they can be resynchronized.
Found by inspection, although hopefully this will improve optimization. I've
also added some comments.
llvm-svn: 178180
As Bill Schmidt pointed out to me, only on Darwin do we need to spill/restore
VRSAVE in the SjLj code. For non-Darwin, don't spill/restore VRSAVE (and I've
added some asserts to make sure that we're not).
As it turns out, we're not currently handling the Darwin case correctly (I've
added a FIXME in the test case). I've tried adding various implied register
definitions/uses to force the spill without success, so I'll need to address
this later.
llvm-svn: 178096
In preparation for using the new register scavenger capability for providing
more than one register simultaneously, specifically note functions that have
spilled VRSAVE (currently, this can happen only in functions that use the
setjmp intrinsic). As with CR spilling, such functions will need to provide two
emergency spill slots to the scavenger.
No functionality change intended.
llvm-svn: 177832
The LR register is unconditionally reserved, and its spilling and restoration
is handled by the prologue/epilogue code. As a result, it is never explicitly
spilled by the register allocator.
No functionality change intended.
llvm-svn: 177823
We currently have a duplicated set of call instruction patterns depending
on the ABI to be followed (Darwin vs. Linux). This is a bit odd; while the
different ABIs will result in different instruction sequences, the actual
instructions themselves ought to be independent of the ABI. And in fact it
turns out that the only nontrivial difference between the two sets of
patterns is that in the PPC64 Linux ABI, the instruction used for indirect
calls is marked to take X11 as extra input register (which is indeed used
only with that ABI to hold an incoming environment pointer for nested
functions). However, this does not need to be hard-coded at the .td
pattern level; instead, the C++ code expanding calls can simply add that
use, just like it adds uses for argument registers anyway.
No change in generated code expected.
llvm-svn: 177735
Thanks to Jakob for isolating the underlying problem from the
test case in r177423. The original commit had introduced
asymmetric copy operations, but these turned out to be a work-around
to the real problem (the use of == instead of hasSubClassEq in PPCCTRLoops).
llvm-svn: 177679
Although there is only one Altivec VRSAVE register, it is a member of
a register class, and we need the ability to spill it. Because this
register is normally callee-preserved and handled by special code this
has never before been necessary. However, this capability will be required by
a forthcoming commit adding SjLj support.
llvm-svn: 177654
Currently the PPC r0 register is unconditionally reserved. There are two reasons
for this:
1. r0 is treated specially (as the constant 0) by certain instructions, and so
cannot be used with those instructions as a regular register.
2. r0 is used as a temporary register in the CR-register spilling process
(where, under some circumstances, we require two GPRs).
This change addresses the first reason by introducing a restricted register
class (without r0) for use by those instructions that treat r0 specially. These
register classes have a new pseudo-register, ZERO, which represents the r0-as-0
use. This has the side benefit of making the existing target code simpler (and
easier to understand), and will make it clear to the register allocator that
uses of r0 as 0 don't conflict will real uses of the r0 register.
Once the CR spilling code is improved, we'll be able to allocate r0.
Adding these extra register classes, for some reason unclear to me, causes
requests to the target to copy 32-bit registers to 64-bit registers. The
resulting code seems correct (and causes no test-suite failures), and the new
test case covers this new kind of asymmetric copy.
As r0 is still reserved, no functionality change intended.
llvm-svn: 177423
This change cleans up two issues with Altivec register spilling:
1. The spilling code was inefficient (using two instructions, and add and a
load, when just one would do)
2. The code assumed that r0 would always be available (true for now, but this
will change)
The new code handles VR spilling just like GPR spills but forced into r+r mode.
As a result, when any VR spills are present, we must now always allocate the
register-scavenger spill slot.
llvm-svn: 177231
For spills into a large stack frame, the FI-elimination code uses the register
scavenger to obtain a free GPR for use with an r+r-addressed load or store.
When there are no available GPRs, the scavenger gets one by using its spill
slot. Previously, we were not always allocating that spill slot and the RS
would assert when the spill slot was needed.
I don't currently have a small test that triggered the assert, but I've
created a small regression test that verifies that the spill slot is now
added when the stack frame is sufficiently large.
llvm-svn: 177140
This removes the -disable-ppc[32|64]-regscavenger options; the code
that uses the register scavenger has been working well (and has been the default)
for some time, and we don't need options to enable the old (broken) CR spilling code.
llvm-svn: 176865
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
the compiler makes use of GPR0. However, there are two flavors of
GPR0 defined by the target: the 32-bit GPR0 (R0) and the 64-bit GPR0
(X0). The spill/reload code makes use of R0 regardless of whether we
are generating 32- or 64-bit code.
This patch corrects the problem in the obvious manner, using X0 and
ADDI8 for 64-bit and R0 and ADDI for 32-bit.
llvm-svn: 165658
The PPC::EXTSW instruction preserves the low 32 bits of its input, just
like some of the x86 instructions. Use it to reduce register pressure
when the low 32 bits have multiple uses.
This requires a small change to PeepholeOptimizer since EXTSW takes a
64-bit input register.
This is related to PR5997.
llvm-svn: 158743
Thanks to Jakob's help, this now causes no new test suite failures!
Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
SingleSource/Benchmarks/Misc/pi - 108%
SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
SingleSource/Benchmarks/Shootout/ary3 - 32%
SingleSource/Benchmarks/Shootout-C++/matrix - 30%
The largest slowdowns are:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
MultiSource/Applications/d/make_dparser - -14%
SingleSource/Benchmarks/Shootout-C++/ary - -13%
In light of these slowdowns, additional profiling work is obviously needed!
llvm-svn: 158223
The pass itself works well, but the something in the Machine* infrastructure
does not understand terminators which define registers. Without the ability
to use the block-placement pass, etc. this causes performance regressions (and
so is turned off by default). Turning off the analysis turns off the problems
with the Machine* infrastructure.
llvm-svn: 158206
This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, invalid preheader DebugLoc is not used.
llvm-svn: 158204
This adds a full itinerary for IBM's PPC64 A2 embedded core. These
cores form the basis for the CPUs in the new IBM BG/Q supercomputer.
llvm-svn: 153842
Dynamic linking on PPC64 has had problems since we had to move the top-down
hazard-detection logic post-ra. For dynamic linking to work there needs to be
a nop placed after every call. It turns out that it is really hard to guarantee
that nothing will be placed in between the call (bl) and the nop during post-ra
scheduling. Previous attempts at fixing this by placing logic inside the
hazard detector only partially worked.
This is now fixed in a different way: call+nop codegen-only instructions. As far
as CodeGen is concerned the pair is now a single instruction and cannot be split.
This solution works much better than previous attempts.
The scoreboard hazard detector is also renamed to be more generic, there is currently
no cpu-specific logic in it.
llvm-svn: 153816
and MCSubtargetInfo.
- Added methods to update subtarget features (used when targets automatically
detect subtarget features or switch modes).
- Teach X86Subtarget to update MCSubtargetInfo features bits since the
MCSubtargetInfo layer can be shared with other modules.
- These fixes .code 16 / .code 32 support since mode switch is updated in
MCSubtargetInfo so MC code emitter can do the right thing.
llvm-svn: 134884
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Chang
TargetInstrInfo so it's based off MCInstrInfo.
llvm-svn: 134021
DAG scheduling during isel. Most new functionality is currently
guarded by -enable-sched-cycles and -enable-sched-hazard.
Added InstrItineraryData::IssueWidth field, currently derived from
ARM itineraries, but could be initialized differently on other targets.
Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is
active, and if so how many cycles of state it holds.
Added SchedulingPriorityQueue::HasReadyFilter to allowing gating entry
into the scheduler's available queue.
ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to
get information about it's SUnits, provides RecedeCycle for bottom-up
scheduling, correctly computes scoreboard depth, tracks IssueCount, and
considers potential stall cycles when checking for hazards.
ScheduleDAGRRList now models machine cycles and hazards (under
flags). It tracks MinAvailableCycle, drives the hazard recognizer and
priority queue's ready filter, manages a new PendingQueue, properly
accounts for stall cycles, etc.
llvm-svn: 122541
operands.
Hopefully this fixes the llvm-gcc-powerpc-darwin9 buildbot. It really shouldn't
since missing memoperands should not affect correctness.
llvm-svn: 108540
The only folding these load/store architectures can do is converting COPY into a
load or store, and the target independent part of foldMemoryOperand already
knows how to do that.
llvm-svn: 108099
addresses a longstanding deficiency noted in many FIXMEs scattered
across all the targets.
This effectively moves the problem up one level, replacing eleven
FIXMEs in the targets with eight FIXMEs in CodeGen, plus one path
through FastISel where we actually supply a DebugLoc, fixing Radar
7421831.
llvm-svn: 106243
registers. Currently it is not so marked, which leads to
VCMPEQ instructions that feed into it getting deleted.
If it is so marked, local RA complains about this sequence:
vreg = MCRF CR0
MFCR <kill of whatever preg got assigned to vreg>
All current uses of this instruction are only interested in
one of the 8 CR registers, so redefine MFCR to be a normal
unary instruction with a CR input (which is emitted only as
a comment). That avoids all problems. 7739628.
llvm-svn: 104238
This is possible because F8RC is a subclass of F4RC. We keep FMRSD around so
fextend has a pattern.
Also allow folding of memory operands on FMRSD.
llvm-svn: 97275
The PowerPC floating point registers can represent both f32 and f64 via the
two register classes F4RC and F8RC. F8RC is considered a subclass of F4RC to
allow cross-class coalescing. This coalescing only affects whether registers
are spilled as f32 or f64.
Spill slots must be accessed with load/store instructions corresponding to the
class of the spilled register. PPCInstrInfo::foldMemoryOperandImpl was looking
at the instruction opcode which is wrong.
X86 has similar floating point register classes, but doesn't try to fold
memory operands, so there is no problem there.
llvm-svn: 97262
stack frame, the prolog/epilog code was using the same
register for the copy of CR and the address of the save slot. Oops.
This is fixed here for Darwin, sort of, by reserving R2 for this case.
A better way would be to do the store before the decrement of SP,
which is safe on Darwin due to the red zone.
SVR4 probably has the same problem, but I don't know how to fix it;
there is no red zone and R2 is already used for something else.
I'm going to leave it to someone interested in that target.
Better still would be to rewrite the CR-saving code completely;
spilling each CR subregister individually is horrible code.
llvm-svn: 96015
the only real caller (GetFunctionSizeInBytes) uses it.
The custom ARM implementation of this is basically reimplementing
an assembler poorly for negligible gain. It should be removed
IMNSHO, but I'll leave that to ARMish folks to decide.
llvm-svn: 77877
This adds location info for all llvm_unreachable calls (which is a macro now) in
!NDEBUG builds.
In NDEBUG builds location info and the message is off (it only prints
"UREACHABLE executed").
llvm-svn: 75640
Don't spill to the CR save area when using the SVR4 ABI for now.
Don't rely on constants assigned for registers to be in order (they aren't assigned in order).
Make sure CR bits are mapped to the corresponding CR field.
llvm-svn: 74767
booleans. This gives a better indication of what the "addReg()" is
doing. Remembering what all of those booleans mean isn't easy, especially if you
aren't spending all of your time in that code.
I took Jakob's suggestion and made it illegal to pass in "true" for the
flag. This should hopefully prevent any unintended misuse of this (by reverting
to the old way of using addReg()).
llvm-svn: 71722
suprise to some callers, e.g. register coalescer. For now, add an parameter
that tells AnalyzeBranch whether it's safe to modify the mbb. A better
solution is out there, but I don't have time to deal with it right now.
llvm-svn: 64124
isImmediate(), isRegister(), and friends, to avoid confusion
about having two different names with the same meaning. I'm
not attached to the longer names, and would be ok with
changing to the shorter names if others prefer it.
llvm-svn: 56189
was inserted or not. This allows bitcast in fast isel to properly handle the case
where an appropriate reg-to-reg copy is not available.
llvm-svn: 55375
MachineMemOperands. The pools are owned by MachineFunctions.
This drastically reduces the number of calls to malloc/free made
during the "Emit" phase of scheduling, as well as later phases
in CodeGen. Combined with other changes, this speeds up the
"instruction selection" phase of CodeGen by 10% in some cases.
llvm-svn: 53212
the need for a flavor operand, and add a new SDNode subclass,
LabelSDNode, for use with them to eliminate the need for a label id
operand.
Change instruction selection to let these label nodes through
unmodified instead of creating copies of them. Teach the MachineInstr
emitter how to emit a MachineInstr directly from an ISD label node.
This avoids the need for allocating SDNodes for the label id and
flavor value, as well as SDNodes for each of the post-isel label,
label id, and label flavor.
llvm-svn: 52943
PPC-64 doesn't work.) This also lowers the spilling of the CR registers so that
it uses a register other than the default R0 register (the scavenger scrounges
for one). A significant part of this patch fixes how kill information is
handled.
llvm-svn: 47863
a header file from libcodegen. This violates a layering order: codegen
depends on target, not the other way around. The fix to this is to
split TII into two classes, TII and TargetInstrInfoImpl, which defines
stuff that depends on libcodegen. It is defined in libcodegen, where
the base is not.
llvm-svn: 45475
e.g. MO.isMBB() instead of MO.isMachineBasicBlock(). I don't plan on
switching everything over, so new clients should just start using the
shorter names.
Remove old long accessors, switching everything over to use the short
accessor: getMachineBasicBlock() -> getMBB(),
getConstantPoolIndex() -> getIndex(), setMachineBasicBlock -> setMBB(), etc.
llvm-svn: 45464
- Eliminate the static "print" method for operands, moving it
into MachineOperand::print.
- Change various set* methods for register flags to take a bool
for the value to set it to. Remove unset* methods.
- Group methods more logically by operand flavor in MachineOperand.h
llvm-svn: 45461
value and CR reg #. This requires swapping the order of these everywhere
that touches BCC and requires us to write custom matching logic for
PPCcondbranch :(
llvm-svn: 31835