otherwise labels get incorrectly merged. We handled this by emitting a
".byte 0", but this isn't correct on thumb/arm targets where the text segment
needs to be a multiple of 2/4 bytes. Handle this by emitting a noop. This
is more gross than it should be because arm/ppc are not fully mc'ized yet.
This fixes rdar://7908505
llvm-svn: 102400
SSEDomainFix will collapse to the domain with the lower number when it has a
choice. The SSEPackedSingle domain often has smaller instructions, so prefer
that.
llvm-svn: 99952
On Nehalem and newer CPUs there is a 2 cycle latency penalty on using a register
in a different domain than where it was defined. Some instructions have
equvivalents for different domains, like por/orps/orpd.
The SSEDomainFix pass tries to minimize the number of domain crossings by
changing between equvivalent opcodes where possible.
This is a work in progress, in particular the pass doesn't do anything yet. SSE
instructions are tagged with their execution domain in TableGen using the last
two bits of TSFlags. Note that not all instructions are tagged correctly. Life
just isn't that simple.
The SSE execution domain issue is very similar to the ARM NEON/VFP pipeline
issue handled by NEONMoveFixPass. This pass may become target independent to
handle both.
llvm-svn: 99524
This is work in progress. So far, SSE execution domain tables are added to
X86InstrInfo, and a skeleton pass is enabled with -sse-domain-fix.
llvm-svn: 99345
For now, this pass is fairly conservative. It only perform the replacement when both the pre- and post- extension values are used in the block. It will miss cases where the post-extension values are live, but not used.
llvm-svn: 93278
instruction is copy like where the source and destination registers can
overlap. This is to be used by the coalescable to coalesce the source and
destination registers of instructions like X86::MOVSX64rr32. Apparently
some crazy people believe the coalescer is too simple.
llvm-svn: 93210
for all the processors where I have tried it, and even when it might not help
performance, the cost is quite low. The opportunities for duplicating
indirect branches are limited by other factors so code size does not change
much due to tail duplicating indirect branches aggressively.
llvm-svn: 90144
it is definitely profitable to tail duplicate indirect branches for x86.
This is likely to be true to various degrees for all modern x86 processors.
llvm-svn: 89865
Provide special isLoadFromStackSlotPostFE and isStoreToStackSlotPostFE
interfaces to explicitly request checking for post-frame ptr elimination
operands. This uses a heuristic so it isn't reliable for correctness.
llvm-svn: 87047
machine instruction loads or stores from/to a stack slot. Unlike
isLoadFromStackSlot and isStoreFromStackSlot, the instruction may be
something other than a pure load/store (e.g. it may be an arithmetic
operation with a memory operand). This helps AsmPrinter determine when
to print a spill/reload comment.
This is only a hint since we may not be able to figure this out in all
cases. As such, it should not be relied upon for correctness.
Implement for X86. Return false by default for other architectures.
llvm-svn: 87026
unfolding loads for hoisting. getOpcodeAfterMemoryUnfold returns the
opcode of the original operation without the load, not the load
itself, MachineLICM needs to know the operand index in order to get
the correct register class. Extend getOpcodeAfterMemoryUnfold to
return this information.
llvm-svn: 85622
implementations with a new MachineInstr::isInvariantLoad, which uses
MachineMemOperands and is target-independent. This brings MachineLICM
and other functionality to targets which previously lacked an
isInvariantLoad implementation.
llvm-svn: 83475
safe. This can happen we a subreg_to_reg 0 has been coalesced. One
exception is when the instruction that folds the load is a move, then we
can simply turn it into a 32-bit load from the stack slot.
rdar://7170444
llvm-svn: 81494
builds.
--- Reverse-merging (from foreign repository) r68552 into '.':
U test/CodeGen/X86/tls8.ll
U test/CodeGen/X86/tls10.ll
U test/CodeGen/X86/tls2.ll
U test/CodeGen/X86/tls6.ll
U lib/Target/X86/X86Instr64bit.td
U lib/Target/X86/X86InstrSSE.td
U lib/Target/X86/X86InstrInfo.td
U lib/Target/X86/X86RegisterInfo.cpp
U lib/Target/X86/X86ISelLowering.cpp
U lib/Target/X86/X86CodeEmitter.cpp
U lib/Target/X86/X86FastISel.cpp
U lib/Target/X86/X86InstrInfo.h
U lib/Target/X86/X86ISelDAGToDAG.cpp
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
U lib/Target/X86/X86ISelLowering.h
U lib/Target/X86/X86InstrInfo.cpp
U lib/Target/X86/X86InstrBuilder.h
U lib/Target/X86/X86RegisterInfo.td
llvm-svn: 68560
This introduces a small regression on the generated code
quality in the case we are just computing addresses, not
loading values.
Will work on it and on X86-64 support.
llvm-svn: 68552
suprise to some callers, e.g. register coalescer. For now, add an parameter
that tells AnalyzeBranch whether it's safe to modify the mbb. A better
solution is out there, but I don't have time to deal with it right now.
llvm-svn: 64124
the conditional for the BRCOND statement. For instance, it will generate:
addl %eax, %ecx
jo LOF
instead of
addl %eax, %ecx
; About 10 instructions to compare the signs of LHS, RHS, and sum.
jl LOF
llvm-svn: 60123
Where previously LLVM might emit code like this:
ucomisd %xmm1, %xmm0
setne %al
setp %cl
orb %al, %cl
jne .LBB4_2
it now emits this:
ucomisd %xmm1, %xmm0
jne .LBB4_2
jp .LBB4_2
It has fewer instructions and uses fewer registers, but it does
have more branches. And in the case that this code is followed by
a non-fallthrough edge, it may be followed by a jmp instruction,
resulting in three branch instructions in sequence. Some effort
is made to avoid this situation.
To achieve this, X86ISelLowering.cpp now recognizes FCMP_OEQ and
FCMP_UNE in lowered form, and replace them with code that emits
two branches, except in the case where it would require converting
a fall-through edge to an explicit branch.
Also, X86InstrInfo.cpp's branch analysis and transform code now
knows now to handle blocks with multiple conditional branches. It
uses loops instead of having fixed checks for up to two
instructions. It can now analyze and transform code generated
from FCMP_OEQ and FCMP_UNE.
llvm-svn: 57873
was inserted or not. This allows bitcast in fast isel to properly handle the case
where an appropriate reg-to-reg copy is not available.
llvm-svn: 55375
MachineMemOperands. The pools are owned by MachineFunctions.
This drastically reduces the number of calls to malloc/free made
during the "Emit" phase of scheduling, as well as later phases
in CodeGen. Combined with other changes, this speeds up the
"instruction selection" phase of CodeGen by 10% in some cases.
llvm-svn: 53212
Change insert/extract subreg instructions to be able to be used in TableGen patterns.
Use the above features to reimplement an x86-64 pseudo instruction as a pattern.
llvm-svn: 48130
both work right according to the new flags.
This removes the TII::isReallySideEffectFree predicate, and adds
TII::isInvariantLoad.
It removes NeverHasSideEffects+MayHaveSideEffects and adds
UnmodeledSideEffects as machine instr flags. Now the clients
can decide everything they need.
I think isRematerializable can be implemented in terms of the
flags we have now, though I will let others tackle that.
llvm-svn: 45843
a header file from libcodegen. This violates a layering order: codegen
depends on target, not the other way around. The fix to this is to
split TII into two classes, TII and TargetInstrInfoImpl, which defines
stuff that depends on libcodegen. It is defined in libcodegen, where
the base is not.
llvm-svn: 45475
function, then go ahead and hoist it out of the loop. This is the result:
$ cat a.c
volatile int G;
int A(int N) {
for (; N > 0; --N)
G++;
}
$ llc -o - -relocation-model=pic
_A:
...
LBB1_2: # bb
movl L_G$non_lazy_ptr-"L1$pb"(%eax), %esi
incl (%esi)
incl %edx
cmpl %ecx, %edx
jne LBB1_2 # bb
...
$ llc -o - -relocation-model=pic -machine-licm
_A:
...
movl L_G$non_lazy_ptr-"L1$pb"(%eax), %eax
LBB1_2: # bb
incl (%eax)
incl %edx
cmpl %ecx, %edx
jne LBB1_2 # bb
...
I'm limiting this to the MOV32rm x86 instruction for now.
llvm-svn: 45444
based what flag to set on whether it was already marked as
"isRematerializable". If there was a further check to determine if it's "really"
rematerializable, then I marked it as "mayHaveSideEffects" and created a check
in the X86 back-end similar to the remat one.
llvm-svn: 45132
instruction flag, and use the flag along with a virtual member function
hook for targets to override if there are instructions that are only
trivially rematerializable with specific operands (i.e. constant pool
loads).
llvm-svn: 37728
with a general target hook to identify rematerializable instructions. Some
instructions are only rematerializable with specific operands, such as loads
from constant pools, while others are always rematerializable. This hook
allows both to be identified as being rematerializable with the same
mechanism.
llvm-svn: 37644
- Added a new format for instructions where the source register is implied
and it is same as the destination register. Used for pseudo instructions
that clear the destination register.
llvm-svn: 25872
XMM registers. There are many known deficiencies and fixmes, which will be
addressed ASAP. The major benefit of this work is that it will allow the
LLVM register allocator to allocate FP registers across basic blocks.
The x86 backend will still default to x87 style FP. To enable this work,
you must pass -enable-sse-scalar-fp and either -sse2 or -sse3 to llc.
An example before and after would be for:
double foo(double *P) { double Sum = 0; int i; for (i = 0; i < 1000; ++i)
Sum += P[i]; return Sum; }
The inner loop looks like the following:
x87:
.LBB_foo_1: # no_exit
fldl (%esp)
faddl (%eax,%ecx,8)
fstpl (%esp)
incl %ecx
cmpl $1000, %ecx
#FP_REG_KILL
jne .LBB_foo_1 # no_exit
SSE2:
addsd (%eax,%ecx,8), %xmm0
incl %ecx
cmpl $1000, %ecx
#FP_REG_KILL
jne .LBB_foo_1 # no_exit
llvm-svn: 22340
I/O port instructions on x86. The specific code sequence is tailored to
the parameters and return value of the intrinsic call.
Added the ability for implicit defintions to be printed in the Instruction
Printer.
Added the ability for RawFrm instruction to print implict uses and
defintions with correct comma output. This required adjustment to some
methods so that a leading comma would or would not be printed.
llvm-svn: 12782
the size of the immediate and the memory operand on instructions that
use them. This resolves problems with instructions that take both a
memory and an immediate operand but their sizes differ (i.e. ADDmi32b).
llvm-svn: 11967