Move platform-independent code (lowering of possibly overwritten
arguments, check for tail call optimization eligibility) from the
target-specific X86ISelLowering.cpp to TargetLowering.h and
SelectionDAGISel.cpp.
Initial PowerPC tail call implementation:
ppc32 support is implemented and tested (passes my tests and
test-suite llvm-test).
ppc64 support is implemented and half tested (passes my tests).
On ppc, tail call optimization is performed if:
* the caller and callee are fastcc
* the call is a tail call (in tail call position: a call immediately followed by ret)
* there are no variable argument lists or byval arguments
* the option -tailcallopt is enabled
Supported:
* non-PIC tail calls on linux/darwin
* module-local tail calls on linux(PIC/GOT)/darwin(PIC)
* inter-module tail calls on darwin(PIC)
If constraints are not met a normal call will be emitted.
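For illustration, a call with the required shape looks like this
(hypothetical C; the fastcc and -tailcallopt requirements are LLVM-level
and not expressible in the C source itself):

/* Sketch only: assumes both functions end up with fastcc and that
   -tailcallopt is enabled. */
int callee(int a, int b);

int caller(int x) {
  /* Tail call position: the call is immediately followed by the
     return, with no varargs or byval arguments involved. */
  return callee(x, x + 1);
}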
A test checking the argument lowering behaviour on x86-64 was added.
llvm-svn: 50477
PPC-64 doesn't work.) This also lowers the spilling of the CR registers so that
it uses a register other than the default R0 register (the scavenger scrounges
for one). A significant part of this patch fixes how kill information is
handled.
llvm-svn: 47863
instead of "ISD::STORE". This allows us to mark target-specific dag
nodes as storing (such as ppc byteswap stores). This allows us to remove
more explicit isStore flags from the .td files.
Finally, add a warning for when a .td file contains an explicit
isStore and tblgen is able to infer it.
llvm-svn: 45654
adjustment fields, and an optional flag. If there is a "dynamic_stackalloc" in
the code, make sure that it's bracketed by CALLSEQ_START and CALLSEQ_END. If
not, then there is the potential for the stack to be changed while it is
being used by another instruction (such as a call).
This can only result in tears...
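In C terms, the situation looks like this (a minimal hypothetical
sketch; the alloca becomes a dynamic_stackalloc node):

#include <alloca.h>
void use(char *p);

void f(unsigned n) {
  char *buf = alloca(n);  /* dynamic_stackalloc: moves the stack pointer */
  use(buf);               /* CALLSEQ_START ... CALLSEQ_END around the call */
}

Without its own CALLSEQ bracket, the dynamic adjustment could be
scheduled inside the call sequence, moving the stack while the
outgoing-argument area is live.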
llvm-svn: 44037
InOperandList. This gives one piece of important information: the number
of results produced by an instruction.
An example of the change:
def ADD32rr : I<0x01, MRMDestReg, (ops GR32:$dst, GR32:$src1, GR32:$src2),
"add{l} {$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;
=>
def ADD32rr : I<0x01, MRMDestReg, (outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
"add{l} {$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;
llvm-svn: 40033
external symbols and global addresses. Add the missing ones.
One important workaround: PPCISD::CALL is matched by both PPCcall_ELF
and PPCcall_Macho; disable the _ELF patterns for now.
llvm-svn: 34601
The algorithm it used before wasn't 100% correct; we now use an iterative
expansion model. This fixes assembler errors when compiling 403.gcc with
tail merging enabled.
Change the way the branch selector works overall: the isel now generates
PPC::BCC instructions directly (as it used to), and these BCC instructions
are emitted to the output or JITed directly when branches don't need
expansion. Only when branches need expansion are instructions rewritten
and created. This should make branch selection faster, and it eliminates the
Bxx instructions from the .td file.
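The iterative expansion model, as a sketch (C with hypothetical data
structures, not the actual implementation):

#include <stdbool.h>
#include <stddef.h>

struct Branch {
  long offset;  /* byte offset of the branch in the function */
  long target;  /* byte offset of its destination */
  int  size;    /* 4 = short BCC, 8 = inverted BCC over a long B */
};

/* PPC BCC has a signed 16-bit displacement. */
static bool in_range(const struct Branch *b) {
  long d = b->target - b->offset;
  return d >= -32768 && d < 32768;
}

/* Expanding one branch grows the function and can push others out of
   range, so iterate until no branch needs expansion. */
static void select_branches(struct Branch *br, size_t n) {
  bool changed;
  do {
    changed = false;
    for (size_t i = 0; i < n; i++) {
      if (br[i].size == 4 && !in_range(&br[i])) {
        br[i].size = 8;  /* rewrite as inverted BCC over unconditional B */
        for (size_t j = 0; j < n; j++) {  /* shift everything below it */
          if (br[j].offset > br[i].offset) br[j].offset += 4;
          if (br[j].target > br[i].offset) br[j].target += 4;
        }
        changed = true;
      }
    }
  } while (changed);
}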
llvm-svn: 31837
value and CR reg #. This requires swapping the order of these everywhere
that touches BCC and requires us to write custom matching logic for
PPCcondbranch :(
llvm-svn: 31835
bugs, including: making sure that the TOS links back to the previous frame;
not including the maximum call frame size twice when using frame
pointers; no longer growing the frame on calls; no longer double-storing
SP; and a cleaner/faster dynamic alloca.
llvm-svn: 31792
Tell the codegen emitter that specific operands are not to be encoded, fixing
JIT regressions w.r.t. pre-inc loads and stores (e.g. lwzu, which we generate
even when general preinc loads are not enabled).
llvm-svn: 31770
pair for cleanliness. Add instructions for PPC32 preinc-stores with
commented-out patterns. More improvement is needed to enable the patterns, but we're
getting close.
llvm-svn: 31749
clobber. This allows LR8 to be saved/restored correctly as a 64-bit quantity,
instead of handling it as a 32-bit quantity. This unbreaks ppc64 codegen when
the code is actually located above the 4G boundary.
llvm-svn: 31734
(because the 64-bit reg target versions aren't implemented yet), doesn't
support r+r addr modes, and doesn't handle stores, but it works otherwise. :)
This is disabled unless -enable-ppc-preinc is passed to llc for now.
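For illustration, this is the sort of loop (hypothetical C) where the
update-form loads pay off once enabled:

long sum(const int *p, long n) {
  long s = 0;
  for (long i = 0; i < n; i++)
    s += p[i];  /* address advances by 4 each trip: an lwzu candidate */
  return s;
}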
llvm-svn: 31621
that takes a register and condition code. Print these pieces of BLR the
right way, even though it is currently set to 'always'.
Next up: get the JIT encoding right, then enhance branch folding to produce
predicated blr for simple examples.
llvm-svn: 31449
As such, use xoaddr (indexed only), not xaddr for address selection.
This fixes CodeGen/PowerPC/2006-07-19-stwbrx-crash.ll, a crash compiling lencod.
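For context, the byteswap-store pattern that selects stwbrx can be
written as below (hypothetical C, using a builtin available in modern
GCC/Clang); since stwbrx has only a reg+reg form, its address must be
matched with xoaddr:

#include <stdint.h>

void store_swapped(uint32_t *p, uint32_t v) {
  *p = __builtin_bswap32(v);  /* selects stwbrx on big-endian PPC */
}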
llvm-svn: 29208
Split imm16Shifted into a sext/zext form for 64-bit support.
Add some patterns for immediate formation. For example, we now compile this:
static unsigned long long Y;
void test3() {
Y = 0xF0F00F00;
}
into:
_test3:
li r2, 3840
lis r3, ha16(_Y)
xoris r2, r2, 61680
std r2, lo16(_Y)(r3)
blr
GCC produces:
_test3:
li r0,0
lis r2,ha16(_Y)
ori r0,r0,61680
sldi r0,r0,16
ori r0,r0,3840
std r0,lo16(_Y)(r2)
blr
(li materializes 0x0F00 and xoris flips the 0xF0F0 bits of the upper
halfword, forming 0xF0F00F00 in two ALU instructions where GCC needs four.)
llvm-svn: 28883
as using incoming argument registers, so the local allocator would clobber them
between their set and use. To fix this, we give the call instructions a variable
number of uses in the CALL MachineInstr itself, so that the live variables
analysis understands the live ranges of these register arguments.
llvm-svn: 28744
the copyto/fromregs instead of making the PPCISD::CALL selection code create
them. This vastly simplifies the selection code, and moves the ABI handling
parts into one place.
llvm-svn: 28346
x86 and ppc for 100% dense switch statements when relocations are non-PIC.
This support will be extended and enhanced in the coming days to support
PIC, and less dense forms of jump tables.
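"Dense" here means the case values cover a contiguous range with no
holes, e.g. (hypothetical C):

int classify(int x) {
  switch (x) {  /* cases 0..3, 100% dense: becomes an indexed jump */
  case 0: return 10;
  case 1: return 20;
  case 2: return 30;
  case 3: return 40;
  default: return -1;
  }
}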
llvm-svn: 27947
<int -1, int -1, int -1, int -1>
and
<int 65537, int 65537, int 65537, int 65537>
Using things like:
vspltisb v0, -1
and:
vspltish v0, 1
instead of using constant pool loads. (vspltisb splats the signed 5-bit
immediate -1 into all 16 bytes, giving all-ones words; vspltish splats 1
into all 8 halfwords, so each 32-bit element is 0x00010001 = 65537.)
This implements CodeGen/PowerPC/vec_splat.ll:splat_imm_i{32|16}.
llvm-svn: 27106
constant pool load. This generates significantly nicer code for splats.
When tblgen gets bugfixed, we can remove the custom selection code.
llvm-svn: 26898
Make the PPC backend not dependent on BRTWOWAY_CC and make the branch
selector smarter about the code it generates, fixing a case in the
readme.
llvm-svn: 26814
1. Use flags on the instructions in the .td file to indicate the PPC970 unit
type instead of a table in the .cpp file. Much cleaner.
2. Change the hazard recognizer to build d-groups according to the actual
algorithm used, not my flawed understanding of it.
3. Model "must be in the first slot" and "must be the only instr in a group"
accurately.
llvm-svn: 26719
void foo(float a, int *b) { *b = a; }
to this:
_foo:
fctiwz f0, f1
stfiwx f0, 0, r4
blr
instead of this:
_foo:
fctiwz f0, f1
stfd f0, -8(r1)
lwz r2, -4(r1)
stw r2, 0(r4)
blr
This implements CodeGen/PowerPC/stfiwx.ll, and also incidentally does the
right thing for GCC bugzilla 26505.
llvm-svn: 26447
Currently tblgen cannot tell which operands in the operand list are results, so
it assumes the first one is a result. This is bad. Ideally we would fix this
by separating results from inputs, e.g. (res R32:$dst),
(ops R32:$src1, R32:$src2). But that's a more disruptive change. Adding
'let noResults = 1' is the workaround to tell tblgen that the instruction does
not produce a result. It works for now since tblgen does not support
instructions which produce multiple results.
llvm-svn: 25017
* Added a pseudo instruction (for each target) that represents "return void".
This is a workaround for the lack of an optional flag operand (return void is
not lowered, so it does not have a flag operand).
llvm-svn: 24997
from the DAGToDAG cpp file. This adds pattern support for vector and
scalar fma, which passes test/Regression/CodeGen/PowerPC/fma.ll, and
does the right thing in the presence of -disable-excess-fp-precision.
Allows us to match:
void %foo(<4 x float> * %a) {
entry:
%tmp1 = load <4 x float> * %a;
%tmp2 = mul <4 x float> %tmp1, %tmp1
%tmp3 = add <4 x float> %tmp2, %tmp1
store <4 x float> %tmp3, <4 x float> *%a
ret void
}
As:
_foo:
li r2, 0
lvx v0, r2, r3
vmaddfp v0, v0, v0, v0
stvx v0, r2, r3
blr
Or, with llc -disable-excess-fp-precision,
_foo:
li r2, 0
lvx v0, r2, r3
vxor v1, v1, v1
vmaddfp v1, v0, v0, v1
vaddfp v0, v1, v0
stvx v0, r2, r3
blr
(Altivec has no standalone vector fmul, so even the unfused form uses
vmaddfp with a zeroed addend, followed by a separate vaddfp.)
llvm-svn: 24719
amount handling that PPC provides. These are generated by the lowering code
and prevent the dag combiner from assuming (rightfully, for the generic
nodes) that shifts only look at 5 bits of their amount: PPC's shifts look at
6. This fixes a miscompilation of crafty with the new front-end.
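A sketch of why the extra shift-amount bit matters: lowering a 64-bit
shift on 32-bit PPC relies on 32-bit shifts by amounts in [32, 63]
producing well-defined results. In hypothetical reference C, with the
range checks the PPC lowering gets to drop:

#include <stdint.h>

/* Shift a 64-bit value held as two 32-bit halves left by amt (0..63).
   With PPC's 6-bit shifts, out-of-range amounts simply yield 0, so the
   three arms below collapse into one branch-free sequence; generic
   ISD::SHL leaves such shifts undefined, hence the custom nodes. */
uint64_t shl64(uint32_t hi, uint32_t lo, unsigned amt) {
  uint32_t nhi, nlo;
  if (amt == 0) {
    nhi = hi; nlo = lo;
  } else if (amt < 32) {
    nhi = (hi << amt) | (lo >> (32 - amt));
    nlo = lo << amt;
  } else {
    nhi = lo << (amt - 32);
    nlo = 0;
  }
  return ((uint64_t)nhi << 32) | nlo;
}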
llvm-svn: 24615
Registers. Apologies to Jim if the scheduling info so far isn't accurate.
There's a few more things like VRsave support that need to be finished up
in my local tree before I can commit code that Does The Right Thing for
turning 4 x float into the various altivec packed float instructions.
llvm-svn: 24489
on Darwin to remove smarts from the isel. This is currently disabled by
default (uncomment setOperationAction(ISD::GlobalAddress to enable it).
tblgen needs to become smarter about tglobaladdr nodes, and bigger patterns
need to be added to the .td file. However, we can currently emit stuff like
this: :)
li r2, lo16(L_x$non_lazy_ptr)
lis r3, ha16(L_x$non_lazy_ptr)
lwzx r2, r3, r2
The obvious improvements will follow.
llvm-svn: 24390
code for long long foo(long long a, long long b) { return a + b; }
_foo:
or r2, r3, r3
or r3, r4, r4
or r4, r5, r5
or r5, r6, r6
rldicr r2, r2, 32, 31
rldicl r3, r3, 0, 32
rldicr r4, r4, 32, 31
rldicl r5, r5, 0, 32
or r2, r3, r2
or r3, r5, r4
add r4, r3, r2
rldicl r2, r4, 32, 32
or r4, r4, r4
or r3, r2, r2
blr
llvm-svn: 23809
will have to tide us over until we get real subreg support, but it prevents
the PrologEpilogInserter from spilling 8-byte GPRs on a G4 processor.
Add some initial support for TRUNCATE and ANY_EXTEND, but they don't
currently work due to issues with ScheduleDAG. Something will have to be
figured out.
llvm-svn: 23803