implementation. The idea is that we want asmprinting to
work by converting MachineInstrs into a new MCInst class,
then the per-instruction asmprinter works on MCInst. MCInst
and the new asmprinters will not depend on most of the
llvm code generators. This allows building diassemblers
that don't link in the whole llvm code generator. This is
step #1 of many.
llvm-svn: 73743
into DarwinTargetAsmInfo.cpp. The remaining differences should
be evaluated. It seems strange that x86/arm has .zerofill but ppc
doesn't, etc.
llvm-svn: 73742
initialization of all targets (InitializeAllTargets.h) or assembler
printers (InitializeAllAsmPrinters.h). This is a step toward the
elimination of relinked object files, so that we can build normal
archives.
llvm-svn: 73543
comes after the DW_CFA_def_cfa_register, because the CFA is really ESP from the
start of the function and only gets an offset when the "subl $xxx,%esp"
instruction happens, not the other way around.
And reapply r72898:
The DWARF unwind info was incorrect. While compiling with
`-fomit-frame-pointer', we would lack the DW_CFA_advance_loc information for a
lot of function, and then they would be `0'. The linker (at least on Darwin)
needs to encode the stack size. In some cases, the stack size is too large to
directly encode. So the linker checks to see if there is a "subl $xxx,%esp"
instruction at the point where the `DW_CFA_def_cfa_offset' says the pc was. If
so, the compact encoding records the offset in the function to where the stack
size is embedded. But because the `DW_CFA_advance_loc' instructions are missing,
it looks before the function and dies.
So, instead of emitting the EH debug label before the stack adjustment
operations, emit it afterwards, right before the frame move stuff.
llvm-svn: 73465
that push immediate operands of 1, 2, and 4 bytes (extended to the native
register size in each case). The assembly mnemonics are "pushl" and "pushq."
One such instruction appears at the beginning of the "start" function , so this
is essential for accurate disassembly when unwinding."
Patch by Sean Callanan!
llvm-svn: 73407
out of sync with regular cc.
The only difference between the tail call cc and the normal
cc was that one parameter register - R9 - was reserved for
calling functions through a function pointer. After time the
tail call cc has gotten out of sync with the regular cc.
We can use R11 which is also caller saved but not used as
parameter register for potential function pointers and
remove the special tail call cc on x86-64.
llvm-svn: 73233
Emission for globals, using the correct data sections
Function alignment can be computed for each target using TargetELFWriterInfo
Some small fixes
llvm-svn: 73201
on x86 to handle more cases. Fix a bug in said code that would cause it
to read past the end of an object. Rewrite the code in
SelectionDAGLegalize::ExpandBUILD_VECTOR to be a bit more general.
Remove PerformBuildVectorCombine, which is no longer necessary with
these changes. In addition to simplifying the code, with this change,
we can now catch a few more cases of consecutive loads.
llvm-svn: 73012
nodes for vectors with an i16 element type. Add an optimization for
building a vector which is all zeros/undef except for the bottom
element, where the bottom element is an i8 or i16.
llvm-svn: 72988
Update code generator to use this attribute and remove NoImplicitFloat target option.
Update llc to set this attribute when -no-implicit-float command line option is used.
llvm-svn: 72959
build vectors with i64 elements will only appear on 32b x86 before legalize.
Since vector widening occurs during legalize, and produces i64 build_vector
elements, the dag combiner is never run on these before legalize splits them
into 32b elements.
Teach the build_vector dag combine in x86 back end to recognize consecutive
loads producing the low part of the vector.
Convert the two uses of TLI's consecutive load recognizer to pass LoadSDNodes
since that was required implicitly.
Add a testcase for the transform.
Old:
subl $28, %esp
movl 32(%esp), %eax
movl 4(%eax), %ecx
movl %ecx, 4(%esp)
movl (%eax), %eax
movl %eax, (%esp)
movaps (%esp), %xmm0
pmovzxwd %xmm0, %xmm0
movl 36(%esp), %eax
movaps %xmm0, (%eax)
addl $28, %esp
ret
New:
movl 4(%esp), %eax
pmovzxwd (%eax), %xmm0
movl 8(%esp), %eax
movaps %xmm0, (%eax)
ret
llvm-svn: 72957
`-fomit-frame-pointer', we would lack the DW_CFA_advance_loc information for a
lot of function, and then they would be `0'. The linker (at least on Darwin)
needs to encode the stack size. In some cases, the stack size is too large to
directly encode. So the linker checks to see if there is a "subl $xxx,%esp"
instruction at the point where the `DW_CFA_def_cfa_offset' says the pc was. If
so, the compact encoding records the offset in the function to where the stack
size is embedded. But because the `DW_CFA_advance_loc' instructions are missing,
it looks before the function and dies.
So, instead of emitting the EH debug label before the stack adjustment
operations, emit it afterwards, right before the frame move stuff.
llvm-svn: 72898
Update code generator to use this attribute and remove DisableRedZone target option.
Update llc to set this attribute when -disable-red-zone command line option is used.
llvm-svn: 72894
ADDC/ADDE use MVT::i1 (later, whatever it gets legalized to)
instead of MVT::Flag. Remove CARRY_FALSE in favor of 0; adjust
all target-independent code to use this format.
Most targets will still produce a Flag-setting target-dependent
version when selection is done. X86 is converted to use i32
instead, which means TableGen needs to produce different code
in xxxGenDAGISel.inc. This keys off the new supportsHasI1 bit
in xxxInstrInfo, currently set only for X86; in principle this
is temporary and should go away when all other targets have
been converted. All relevant X86 instruction patterns are
modified to represent setting and using EFLAGS explicitly. The
same can be done on other targets.
The immediate behavior change is that an ADC/ADD pair are no
longer tightly coupled in the X86 scheduler; they can be
separated by instructions that don't clobber the flags (MOV).
I will soon add some peephole optimizations based on using
other instructions that set the flags to feed into ADC.
llvm-svn: 72707
decoding. Essentially, they both map to the same column in the "opcode
extensions for one- and two-byte opcodes" table in the x86 manual. The RawFrm
complicates decoding this.
Instead, use opcode 0x01, prefix 0x01, and form MRM1r. Then have the code
emitter special case these, a la [SML]FENCE.
llvm-svn: 72556
the Intel manual (screenshot) says it should be 0b11110110 (f6). The existing
encoding causes a disassembly conflict with MMX_PAVGBrm, which really should be
0f e0."
Patch by Sean Callanan!
llvm-svn: 72508
e.g.
orl $65536, 8(%rax)
=>
orb $1, 10(%rax)
Since narrowing is not always a win, e.g. i32 -> i16 is a loss on x86, dag combiner consults with the target before performing the optimization.
llvm-svn: 72507
FP_TO_XINT. Necessary for some cleanups I'm working on. Updated
from the previous version (r72431) to fix a bug and make some things a
bit clearer.
llvm-svn: 72445
- added processors k8-sse3, opteron-sse3, athlon64-sse3, amdfam10, and
barcelona with appropriate sse3/4a levels
- added FeatureSSE4A for amdfam10 processors
in X86Subtarget:
- added hasSSE4A
- updated AutoDetectSubtargetFeatures to detect SSE4A
- updated GetCurrentX86CPU to detect family 15 with sse3 as k8-sse3 and
family 10h as amdfam10
New processor names match those used by gcc.
Patch by Paul Redmond!
llvm-svn: 72434
systems instead of attempting to promote them to a 64-bit SINT_TO_FP or
FP_TO_SINT. This is in preparation for removing the type legalization
code from LegalizeDAG: once type legalization is gone from LegalizeDAG,
it won't be able to handle the i64 operand/result correctly.
This isn't quite ideal, but I don't think any other operation for any
target ends up in this situation, so treating this case specially seems
reasonable.
llvm-svn: 72324
booleans. This gives a better indication of what the "addReg()" is
doing. Remembering what all of those booleans mean isn't easy, especially if you
aren't spending all of your time in that code.
I took Jakob's suggestion and made it illegal to pass in "true" for the
flag. This should hopefully prevent any unintended misuse of this (by reverting
to the old way of using addReg()).
llvm-svn: 71722
more place. This fixes a bunch of x86-64 JIT regressions.
(Introduced when the value of the magic constant changed
in 68645. At the time apparently nobody noticed; failures
were hidden in 70343-70439 by an unrelated bug, so showed
up again as "new" failures in 70440.)
llvm-svn: 71106
-Replace DebugLocTuple's Source ID with CompileUnit's GlobalVariable*
-Remove DwarfWriter::getOrCreateSourceID
-Make necessary changes for the above (fix callsites, etc.)
llvm-svn: 70520
Massive check in. This changes the "-fast" flag to "-O#" in llc. If you want to
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.
Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'll change the JIT with a follow-up patch.
llvm-svn: 70343
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.
Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'm not 100% sure if it's necessary to change it there...
llvm-svn: 70270
PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
llvm-svn: 70225
to precisely describe the h-register subreg register classes.
Thanks to Jakob Stoklund Olesen for spotting this and for the
initial patch!
Also, make getStoreRegOpcode and getLoadRegOpcode aware of the
needs of h registers.
llvm-svn: 70211
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.
llvm-svn: 69952
in the MachineFunction class, renaming it to addLiveIn for consistency with
the same method in MachineBasicBlock. Thanks for Anton for suggesting this.
llvm-svn: 69615
leaq foo@TLSGD(%rip), %rdi
as part of the instruction sequence. Using a register other than %rdi and then
copying it to %rdi is not valid.
llvm-svn: 69350
- Add patterns for h-register extract, which avoids a shift and mask,
and in some cases a temporary register.
- Add address-mode matching for turning (X>>(8-n))&(255<<n), where
n is a valid address-mode scale value, into an h-register extract
and a scaled-offset address.
- Replace X86's MOV32to32_ and related instructions with the new
target-independent COPY_TO_SUBREG instruction.
On x86-64 there are complicated constraints on h registers, and
CodeGen doesn't currently provide a high-level way to express all of them,
so they are handled with a bunch of special code. This code currently only
supports extracts where the result is used by a zero-extend or a store,
though these are fairly common.
These transformations are not always beneficial; since there are only
4 h registers, they sometimes require extra move instructions, and
this sometimes increases register pressure because it can force out
values that would otherwise be in one of those registers. However,
this appears to be relatively uncommon.
llvm-svn: 68962
ISD::SIGN_EXTEND_INREG. Tablegen-generated code can handle
these cases, and the scheduling issues observed earlier
appear to be resolved now.
llvm-svn: 68959
Create debug_inlined dwarf section using these information. This info is used by gdb, at least on Darwin, to enable better experience debugging inlined functions. See DwarfWriter.cpp for more information on structure of debug_inlined section.
llvm-svn: 68847
with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
instructions), and teach the DAGCombiner to take advantage of this on
targets which support it. This eliminates many redundant
zero-extension operations on x86-64.
This adds a new TargetLowering hook, isZExtFree. It's similar to
isTruncateFree, except it only applies to actual definitions, and not
no-op truncates which may not zero the high bits.
Also, this adds a new optimization to SimplifyDemandedBits: transform
operations like x+y into (zext (add (trunc x), (trunc y))) on targets
where all the casts are no-ops. In contexts where the high part of the
add is explicitly masked off, this allows the mask operation to be
eliminated. Fix the DAGCombiner to avoid undoing these transformations
to eliminate casts on targets where the casts are no-ops.
Also, this adds a new two-address lowering heuristic. Since
two-address lowering runs before coalescing, it helps to be able to
look through copies when deciding whether commuting and/or
three-address conversion are profitable.
Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
the case that a clobber range extended both before and beyond an
existing live range. In that case, multiple live ranges need to be
added. This was exposed by the new subreg coalescing code.
Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
spiller behavior it was looking for no longer occurrs with the new
instruction selection.
llvm-svn: 68576
builds.
--- Reverse-merging (from foreign repository) r68552 into '.':
U test/CodeGen/X86/tls8.ll
U test/CodeGen/X86/tls10.ll
U test/CodeGen/X86/tls2.ll
U test/CodeGen/X86/tls6.ll
U lib/Target/X86/X86Instr64bit.td
U lib/Target/X86/X86InstrSSE.td
U lib/Target/X86/X86InstrInfo.td
U lib/Target/X86/X86RegisterInfo.cpp
U lib/Target/X86/X86ISelLowering.cpp
U lib/Target/X86/X86CodeEmitter.cpp
U lib/Target/X86/X86FastISel.cpp
U lib/Target/X86/X86InstrInfo.h
U lib/Target/X86/X86ISelDAGToDAG.cpp
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
U lib/Target/X86/X86ISelLowering.h
U lib/Target/X86/X86InstrInfo.cpp
U lib/Target/X86/X86InstrBuilder.h
U lib/Target/X86/X86RegisterInfo.td
llvm-svn: 68560
This introduces a small regression on the generated code
quality in the case we are just computing addresses, not
loading values.
Will work on it and on X86-64 support.
llvm-svn: 68552
entered via fall-through. Don't miss fallthroughs from blocks
terminated by conditional branches. Also, move
isOnlyReachableByFallthrough out of line.
llvm-svn: 68129
only reachable via fall-through edges. This dramatically reduces the
number of labels printed, and thus also the number of labels the
assembler must parse and remember.
llvm-svn: 68073
x * 40
=>
shlq $3, %rdi
leaq (%rdi,%rdi,4), %rax
This has the added benefit of allowing more multiply to be folded into addressing mode. e.g.
a * 24 + b
=>
leaq (%rdi,%rdi,2), %rax
leaq (%rsi,%rax,8), %rax
llvm-svn: 67917
%a = ...
%b = and i32 %a, 2
%c = srl i32 %b, 1
%d = br i32 %c,
into
%a = ...
%b = and %a, 2
%c = X86ISD::CMP %b, 0
%d = X86ISD::BRCOND %c ...
This applies only when the AND constant value has one bit set and the SRL
constant is equal to the log2 of the AND constant. The back-end is smart enough
to convert the result into a TEST/JMP sequence.
llvm-svn: 67728
to be returned in DL. LLVM's multiple-return-value support is
not ABI-conforming; front-ends that wish to have code emitted
that conforms to an ABI are currently expected to make
arrangements for this on their own rather than assuming that
multiple-return-values will automatically do the right thing.
This commit doesn't fundamentally change this situation.
llvm-svn: 67588
not safe in general because the immediate could be an arbitrary
value that does not fit in a 32-bit pcrel displacement.
Conservatively fall back to loading the value into a register
and calling through it.
We still do the optzn on X86-32.
llvm-svn: 67142
operand is a signed 32-bit immediate. Unlike with the 8-bit
signed immediate case, it isn't actually smaller to fold a
32-bit signed immediate instead of a load. In fact, it's
larger in the case of 32-bit unsigned immediates, because
they can be materialized with movl instead of movq.
llvm-svn: 67001
ptrtoint and inttoptr in X86FastISel. These casts aren't always
handled in the generic FastISel code because X86 sometimes needs
custom code to do truncation and zero-extension.
llvm-svn: 66988
by inserting explicit zero extensions where necessary. Included
is a testcase where SelectionDAG produces a virtual register
holding an i1 value which FastISel previously mistakenly assumed
to be zero-extended.
llvm-svn: 66941
1. ConstantPoolSDNode alignment field is log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. MachineConstantPool alignment field is also a log2 value.
3. However, some places are creating ConstantPoolSDNode with alignment value rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when they are created. However, asm printer group them by sections. That means the offsets are no longer valid. However, asm printer uses them to determine size of padding between entries.
5. Asm printer uses expensive data structure multimap to track constant pool entries by sections.
6. Asm printer iterate over SmallPtrSet when it's emitting constant pool entries. This is non-deterministic.
Solutions:
1. ConstantPoolSDNode alignment field is changed to keep non-log2 value.
2. MachineConstantPool alignment field is also changed to keep non-log2 value.
3. Functions that create ConstantPool nodes are passing in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in asm printer and JIT.
5. Asm printer uses cheaper data structure to group constant pool entries.
6. Asm printer compute entry offsets after grouping is done.
7. Change JIT code to compute entry offsets on the fly.
llvm-svn: 66875
for i32/i64 expressions (we could also do i16 on cpus where
i16 lea is fast, but I didn't add this). On the example, we now
generate:
_test:
movl 4(%esp), %eax
cmpl $42, (%eax)
setl %al
movzbl %al, %eax
leal 4(%eax,%eax,8), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
cmpl $41, (%eax)
movl $4, %ecx
movl $13, %eax
cmovg %ecx, %eax
ret
llvm-svn: 66869
operands can't both be fully folded at the same time. For example,
in the included testcase, a global variable is being added with
an add of two values. The global variable wants RIP-relative
addressing, so it can't share the address with another base
register, but it's still possible to fold the initial add.
llvm-svn: 66865
related transformations out of target-specific dag combine into the
ARM backend. These were added by Evan in r37685 with no testcases
and only seems to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).
Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0 -> (zext(cond) << 3). This happens frequently
with the recently added cp constant select optimization, but is a
very general xform. For example, we now compile the second example
in const-select.ll to:
_test:
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
seta %al
movzbl %al, %eax
movl 4(%esp), %ecx
movsbl (%ecx,%eax,4), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
leal 4(%eax), %ecx
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
cmovbe %eax, %ecx
movsbl (%ecx), %eax
ret
This passes multisource and dejagnu.
llvm-svn: 66779
and extern_weak_odr. These are the same as the non-odr versions,
except that they indicate that the global will only be overridden
by an *equivalent* global. In C, a function with weak linkage can
be overridden by a function which behaves completely differently.
This means that IP passes have to skip weak functions, since any
deductions made from the function definition might be wrong, since
the definition could be replaced by something completely different
at link time. This is not allowed in C++, thanks to the ODR
(One-Definition-Rule): if a function is replaced by another at
link-time, then the new function must be the same as the original
function. If a language knows that a function or other global can
only be overridden by an equivalent global, it can give it the
weak_odr linkage type, and the optimizers will understand that it
is alright to make deductions based on the function body. The
code generators on the other hand map weak and weak_odr linkage
to the same thing.
llvm-svn: 66339
the same say the "test" instruction does in overflow cases,
so eliminating the test is only safe when those bits aren't
needed, as is the case for COND_E and COND_NE, or if it
can be proven that no overflow will occur. For now, just
restrict the optimization to COND_E and COND_NE and don't
do any overflow analysis.
llvm-svn: 66318
INC64_32r and INC64_16r, because these instructions are encoded
differently on x86-64. This fixes JIT regressions on x86-64 in
kimwitu++ and others.
llvm-svn: 66207
of MachineInstr def operands must be subtracted out. This bug
was uncovered by the recent x86 EFLAGS optimization. Before
that, the only instructions that ever needed unfolding were
things like CMP32rm, where NumDefs is zero.
llvm-svn: 66056
them are generic changes.
- Use the "fast" flag that's already being passed into the asm printers instead
of shoving it into the DwarfWriter.
- Instead of calling "MI->getParent()->getParent()" for every MI, set the
machine function when calling "runOnMachineFunction" in the asm printers.
llvm-svn: 65379