other things, this allows the scheduler to unfold a load operand
in the 2008-01-08-SchedulerCrash.ll testcase, so it now successfully
clones the comparison to avoid a pushf+popf.
llvm-svn: 48777
Change insert/extract subreg instructions to be able to be used in TableGen patterns.
Use the above features to reimplement an x86-64 pseudo instruction as a pattern.
llvm-svn: 48130
an RFP register class.
Teach ScheduleDAG how to handle CopyToReg with different src/dst
reg classes.
This allows us to compile trivial inline asms that expect stuff
on the top of x87-fp stack.
llvm-svn: 48107
both work right according to the new flags.
This removes the TII::isReallySideEffectFree predicate, and adds
TII::isInvariantLoad.
It removes NeverHasSideEffects+MayHaveSideEffects and adds
UnmodeledSideEffects as machine instr flags. Now the clients
can decide everything they need.
I think isRematerializable can be implemented in terms of the
flags we have now, though I will let others tackle that.
llvm-svn: 45843
that it is cheap and efficient to get.
Move a variety of predicates from TargetInstrInfo into
TargetInstrDescriptor, which makes it much easier to query a predicate
when you don't have TII around. Now you can use MI->getDesc()->isBranch()
instead of going through TII, and this is much more efficient anyway. Not
all of the predicates have been moved over yet.
Update old code that used MI->getInstrDescriptor()->Flags to use the
new predicates in many places.
llvm-svn: 45674
checking that there was a from a global instead of a load from the stub
for a global, which is the one that's safe to hoist.
Consider this program:
volatile char G[100];
int B(char *F, int N) {
for (; N > 0; --N)
F[N] = G[N];
}
In static mode, we shouldn't be hoisting the load from G:
$ llc -relocation-model=static -o - a.bc -march=x86 -machine-licm
LBB1_1: # bb.preheader
leal -1(%eax), %edx
testl %edx, %edx
movl $1, %edx
cmovns %eax, %edx
xorl %esi, %esi
LBB1_2: # bb
movb _G(%eax), %bl
movb %bl, (%ecx,%eax)
llvm-svn: 45626
isReallySideEffectFree and isReallyTriviallyReMaterializable. Why is a load from
a global considered side-effect-free but not rematable?
llvm-svn: 45620
a header file from libcodegen. This violates a layering order: codegen
depends on target, not the other way around. The fix to this is to
split TII into two classes, TII and TargetInstrInfoImpl, which defines
stuff that depends on libcodegen. It is defined in libcodegen, where
the base is not.
llvm-svn: 45475
that "machine" classes are used to represent the current state of
the code being compiled. Given this expanded name, we can start
moving other stuff into it. For now, move the UsedPhysRegs and
LiveIn/LoveOuts vectors from MachineFunction into it.
Update all the clients to match.
This also reduces some needless #includes, such as MachineModuleInfo
from MachineFunction.
llvm-svn: 45467
e.g. MO.isMBB() instead of MO.isMachineBasicBlock(). I don't plan on
switching everything over, so new clients should just start using the
shorter names.
Remove old long accessors, switching everything over to use the short
accessor: getMachineBasicBlock() -> getMBB(),
getConstantPoolIndex() -> getIndex(), setMachineBasicBlock -> setMBB(), etc.
llvm-svn: 45464
function, then go ahead and hoist it out of the loop. This is the result:
$ cat a.c
volatile int G;
int A(int N) {
for (; N > 0; --N)
G++;
}
$ llc -o - -relocation-model=pic
_A:
...
LBB1_2: # bb
movl L_G$non_lazy_ptr-"L1$pb"(%eax), %esi
incl (%esi)
incl %edx
cmpl %ecx, %edx
jne LBB1_2 # bb
...
$ llc -o - -relocation-model=pic -machine-licm
_A:
...
movl L_G$non_lazy_ptr-"L1$pb"(%eax), %eax
LBB1_2: # bb
incl (%eax)
incl %edx
cmpl %ecx, %edx
jne LBB1_2 # bb
...
I'm limiting this to the MOV32rm x86 instruction for now.
llvm-svn: 45444
based what flag to set on whether it was already marked as
"isRematerializable". If there was a further check to determine if it's "really"
rematerializable, then I marked it as "mayHaveSideEffects" and created a check
in the X86 back-end similar to the remat one.
llvm-svn: 45132
enabled by passing -tailcallopt to llc. The optimization is
performed if the following conditions are satisfied:
* caller/callee are fastcc
* elf/pic is disabled OR
elf/pic enabled + callee is in module + callee has
visibility protected or hidden
llvm-svn: 42870
instruction flag, and use the flag along with a virtual member function
hook for targets to override if there are instructions that are only
trivially rematerializable with specific operands (i.e. constant pool
loads).
llvm-svn: 37728
with a general target hook to identify rematerializable instructions. Some
instructions are only rematerializable with specific operands, such as loads
from constant pools, while others are always rematerializable. This hook
allows both to be identified as being rematerializable with the same
mechanism.
llvm-svn: 37644
1) codegen a shift of a register as a shift, not an LEA.
2) teach the RA to convert a shift to an LEA instruction if it wants something
in three-address form.
This gives us asm diffs like:
- leal (,%eax,4), %eax
+ shll $2, %eax
which is faster on some processors and smaller on all of them.
and, more interestingly:
- movl 24(%esi), %eax
- leal (,%eax,4), %edi
+ movl 24(%esi), %edi
+ shll $2, %edi
Without #2, #1 was a significant pessimization in some cases.
This implements CodeGen/X86/shift-codegen.ll
llvm-svn: 35204
actually *removes* one of the operands, instead of just assigning both operands
the same register. This make reasoning about instructions unnecessarily complex,
because you need to know if you are before or after register allocation to match
up operand #'s with the target description file.
Changing this also gets rid of a bunch of hacky code in various places.
This patch also includes changes to fold loads into cmp/test instructions in
the X86 backend, along with a significant simplification to the X86 spill
folding code.
llvm-svn: 30108
movw. That is we promote the destination operand to r16. So
%CH = TRUNC_R16_R8 %BP
is emitted as
movw %bp, %cx.
This is incorrect. If %cl is live, it would be clobbered.
Ideally we want to do the opposite, that is emitted it as
movb ??, %ch
But this is not possible since %bp does not have a r8 sub-register.
We are now defining a new register class R16_ which is a subclass of R16
containing only those 16-bit registers that have r8 sub-registers (i.e.
AX - DX). We isel the truncate to two instructions, a MOV16to16_ to copy the
value to the R16_ class, followed by a TRUNC_R16_R8.
Due to bug 770, the register colaescer is not going to coalesce between R16 and
R16_. That will be fixed later so we can eliminate the MOV16to16_. Right now, it
can only be eliminated if we are lucky that source and destination registers are
the same.
llvm-svn: 28164
- Handle FR32 to VR128:v4f32 and FR64 to VR128:v2f64 with aliases of MOVAPS
and MOVAPD. Mark them as move instructions and *hope* they will be deleted.
llvm-svn: 26919
proves to be worth 20% on Ptrdist/ks. Might be related to dependency
breaking support.
2. Added FsMOVAPSrr and FsMOVAPDrr as aliases to MOVAPSrr and MOVAPDrr. These
are used for FR32 / FR64 reg-to-reg copies.
3. Tell reg-allocator to generate MOVSSrm / MOVSDrm and MOVSSmr / MOVSDmr to
spill / restore FsMOVAPSrr and FsMOVAPDrr.
llvm-svn: 26241
XMM registers. There are many known deficiencies and fixmes, which will be
addressed ASAP. The major benefit of this work is that it will allow the
LLVM register allocator to allocate FP registers across basic blocks.
The x86 backend will still default to x87 style FP. To enable this work,
you must pass -enable-sse-scalar-fp and either -sse2 or -sse3 to llc.
An example before and after would be for:
double foo(double *P) { double Sum = 0; int i; for (i = 0; i < 1000; ++i)
Sum += P[i]; return Sum; }
The inner loop looks like the following:
x87:
.LBB_foo_1: # no_exit
fldl (%esp)
faddl (%eax,%ecx,8)
fstpl (%esp)
incl %ecx
cmpl $1000, %ecx
#FP_REG_KILL
jne .LBB_foo_1 # no_exit
SSE2:
addsd (%eax,%ecx,8), %xmm0
incl %ecx
cmpl $1000, %ecx
#FP_REG_KILL
jne .LBB_foo_1 # no_exit
llvm-svn: 22340
their names more decriptive. A name consists of the base name, a
default operand size followed by a character per operand with an
optional special size. For example:
ADD8rr -> add, 8-bit register, 8-bit register
IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate
IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate
MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
llvm-svn: 11995
switch statements in the constructors and simplifies the
implementation of the getUseType() member function. You will have to
specify defs using MachineOperand::Def instead of MOTy::Def though
(similarly for Use and UseAndDef).
llvm-svn: 11715
* Fix bug in the createNOP method, which was not marking the operands of the
generated XCHG as useanddef. I don't think this method is actually used,
so it wasn't breaking anything, but it should be fixed anyway...
llvm-svn: 7539