step is to make tblgen generate something more appropriate for MCInst,
and generate calls to operand translation routines where needed.
This includes a bunch of #if 0 code which will slowly be refactored into
something sensible.
llvm-svn: 73810
implementation. The idea is that we want asmprinting to
work by converting MachineInstrs into a new MCInst class,
then the per-instruction asmprinter works on MCInst. MCInst
and the new asmprinters will not depend on most of the
llvm code generators. This allows building diassemblers
that don't link in the whole llvm code generator. This is
step #1 of many.
llvm-svn: 73743
into DarwinTargetAsmInfo.cpp. The remaining differences should
be evaluated. It seems strange that x86/arm has .zerofill but ppc
doesn't, etc.
llvm-svn: 73742
initialization of all targets (InitializeAllTargets.h) or assembler
printers (InitializeAllAsmPrinters.h). This is a step toward the
elimination of relinked object files, so that we can build normal
archives.
llvm-svn: 73543
comes after the DW_CFA_def_cfa_register, because the CFA is really ESP from the
start of the function and only gets an offset when the "subl $xxx,%esp"
instruction happens, not the other way around.
And reapply r72898:
The DWARF unwind info was incorrect. While compiling with
`-fomit-frame-pointer', we would lack the DW_CFA_advance_loc information for a
lot of function, and then they would be `0'. The linker (at least on Darwin)
needs to encode the stack size. In some cases, the stack size is too large to
directly encode. So the linker checks to see if there is a "subl $xxx,%esp"
instruction at the point where the `DW_CFA_def_cfa_offset' says the pc was. If
so, the compact encoding records the offset in the function to where the stack
size is embedded. But because the `DW_CFA_advance_loc' instructions are missing,
it looks before the function and dies.
So, instead of emitting the EH debug label before the stack adjustment
operations, emit it afterwards, right before the frame move stuff.
llvm-svn: 73465
that push immediate operands of 1, 2, and 4 bytes (extended to the native
register size in each case). The assembly mnemonics are "pushl" and "pushq."
One such instruction appears at the beginning of the "start" function , so this
is essential for accurate disassembly when unwinding."
Patch by Sean Callanan!
llvm-svn: 73407
out of sync with regular cc.
The only difference between the tail call cc and the normal
cc was that one parameter register - R9 - was reserved for
calling functions through a function pointer. After time the
tail call cc has gotten out of sync with the regular cc.
We can use R11 which is also caller saved but not used as
parameter register for potential function pointers and
remove the special tail call cc on x86-64.
llvm-svn: 73233
Emission for globals, using the correct data sections
Function alignment can be computed for each target using TargetELFWriterInfo
Some small fixes
llvm-svn: 73201
on x86 to handle more cases. Fix a bug in said code that would cause it
to read past the end of an object. Rewrite the code in
SelectionDAGLegalize::ExpandBUILD_VECTOR to be a bit more general.
Remove PerformBuildVectorCombine, which is no longer necessary with
these changes. In addition to simplifying the code, with this change,
we can now catch a few more cases of consecutive loads.
llvm-svn: 73012
nodes for vectors with an i16 element type. Add an optimization for
building a vector which is all zeros/undef except for the bottom
element, where the bottom element is an i8 or i16.
llvm-svn: 72988
Update code generator to use this attribute and remove NoImplicitFloat target option.
Update llc to set this attribute when -no-implicit-float command line option is used.
llvm-svn: 72959
build vectors with i64 elements will only appear on 32b x86 before legalize.
Since vector widening occurs during legalize, and produces i64 build_vector
elements, the dag combiner is never run on these before legalize splits them
into 32b elements.
Teach the build_vector dag combine in x86 back end to recognize consecutive
loads producing the low part of the vector.
Convert the two uses of TLI's consecutive load recognizer to pass LoadSDNodes
since that was required implicitly.
Add a testcase for the transform.
Old:
subl $28, %esp
movl 32(%esp), %eax
movl 4(%eax), %ecx
movl %ecx, 4(%esp)
movl (%eax), %eax
movl %eax, (%esp)
movaps (%esp), %xmm0
pmovzxwd %xmm0, %xmm0
movl 36(%esp), %eax
movaps %xmm0, (%eax)
addl $28, %esp
ret
New:
movl 4(%esp), %eax
pmovzxwd (%eax), %xmm0
movl 8(%esp), %eax
movaps %xmm0, (%eax)
ret
llvm-svn: 72957
`-fomit-frame-pointer', we would lack the DW_CFA_advance_loc information for a
lot of function, and then they would be `0'. The linker (at least on Darwin)
needs to encode the stack size. In some cases, the stack size is too large to
directly encode. So the linker checks to see if there is a "subl $xxx,%esp"
instruction at the point where the `DW_CFA_def_cfa_offset' says the pc was. If
so, the compact encoding records the offset in the function to where the stack
size is embedded. But because the `DW_CFA_advance_loc' instructions are missing,
it looks before the function and dies.
So, instead of emitting the EH debug label before the stack adjustment
operations, emit it afterwards, right before the frame move stuff.
llvm-svn: 72898
Update code generator to use this attribute and remove DisableRedZone target option.
Update llc to set this attribute when -disable-red-zone command line option is used.
llvm-svn: 72894
ADDC/ADDE use MVT::i1 (later, whatever it gets legalized to)
instead of MVT::Flag. Remove CARRY_FALSE in favor of 0; adjust
all target-independent code to use this format.
Most targets will still produce a Flag-setting target-dependent
version when selection is done. X86 is converted to use i32
instead, which means TableGen needs to produce different code
in xxxGenDAGISel.inc. This keys off the new supportsHasI1 bit
in xxxInstrInfo, currently set only for X86; in principle this
is temporary and should go away when all other targets have
been converted. All relevant X86 instruction patterns are
modified to represent setting and using EFLAGS explicitly. The
same can be done on other targets.
The immediate behavior change is that an ADC/ADD pair are no
longer tightly coupled in the X86 scheduler; they can be
separated by instructions that don't clobber the flags (MOV).
I will soon add some peephole optimizations based on using
other instructions that set the flags to feed into ADC.
llvm-svn: 72707
decoding. Essentially, they both map to the same column in the "opcode
extensions for one- and two-byte opcodes" table in the x86 manual. The RawFrm
complicates decoding this.
Instead, use opcode 0x01, prefix 0x01, and form MRM1r. Then have the code
emitter special case these, a la [SML]FENCE.
llvm-svn: 72556
the Intel manual (screenshot) says it should be 0b11110110 (f6). The existing
encoding causes a disassembly conflict with MMX_PAVGBrm, which really should be
0f e0."
Patch by Sean Callanan!
llvm-svn: 72508
e.g.
orl $65536, 8(%rax)
=>
orb $1, 10(%rax)
Since narrowing is not always a win, e.g. i32 -> i16 is a loss on x86, dag combiner consults with the target before performing the optimization.
llvm-svn: 72507
FP_TO_XINT. Necessary for some cleanups I'm working on. Updated
from the previous version (r72431) to fix a bug and make some things a
bit clearer.
llvm-svn: 72445
- added processors k8-sse3, opteron-sse3, athlon64-sse3, amdfam10, and
barcelona with appropriate sse3/4a levels
- added FeatureSSE4A for amdfam10 processors
in X86Subtarget:
- added hasSSE4A
- updated AutoDetectSubtargetFeatures to detect SSE4A
- updated GetCurrentX86CPU to detect family 15 with sse3 as k8-sse3 and
family 10h as amdfam10
New processor names match those used by gcc.
Patch by Paul Redmond!
llvm-svn: 72434
systems instead of attempting to promote them to a 64-bit SINT_TO_FP or
FP_TO_SINT. This is in preparation for removing the type legalization
code from LegalizeDAG: once type legalization is gone from LegalizeDAG,
it won't be able to handle the i64 operand/result correctly.
This isn't quite ideal, but I don't think any other operation for any
target ends up in this situation, so treating this case specially seems
reasonable.
llvm-svn: 72324
booleans. This gives a better indication of what the "addReg()" is
doing. Remembering what all of those booleans mean isn't easy, especially if you
aren't spending all of your time in that code.
I took Jakob's suggestion and made it illegal to pass in "true" for the
flag. This should hopefully prevent any unintended misuse of this (by reverting
to the old way of using addReg()).
llvm-svn: 71722
more place. This fixes a bunch of x86-64 JIT regressions.
(Introduced when the value of the magic constant changed
in 68645. At the time apparently nobody noticed; failures
were hidden in 70343-70439 by an unrelated bug, so showed
up again as "new" failures in 70440.)
llvm-svn: 71106
-Replace DebugLocTuple's Source ID with CompileUnit's GlobalVariable*
-Remove DwarfWriter::getOrCreateSourceID
-Make necessary changes for the above (fix callsites, etc.)
llvm-svn: 70520
Massive check in. This changes the "-fast" flag to "-O#" in llc. If you want to
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.
Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'll change the JIT with a follow-up patch.
llvm-svn: 70343
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.
Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'm not 100% sure if it's necessary to change it there...
llvm-svn: 70270
PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
llvm-svn: 70225
to precisely describe the h-register subreg register classes.
Thanks to Jakob Stoklund Olesen for spotting this and for the
initial patch!
Also, make getStoreRegOpcode and getLoadRegOpcode aware of the
needs of h registers.
llvm-svn: 70211
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.
llvm-svn: 69952
in the MachineFunction class, renaming it to addLiveIn for consistency with
the same method in MachineBasicBlock. Thanks for Anton for suggesting this.
llvm-svn: 69615
leaq foo@TLSGD(%rip), %rdi
as part of the instruction sequence. Using a register other than %rdi and then
copying it to %rdi is not valid.
llvm-svn: 69350
- Add patterns for h-register extract, which avoids a shift and mask,
and in some cases a temporary register.
- Add address-mode matching for turning (X>>(8-n))&(255<<n), where
n is a valid address-mode scale value, into an h-register extract
and a scaled-offset address.
- Replace X86's MOV32to32_ and related instructions with the new
target-independent COPY_TO_SUBREG instruction.
On x86-64 there are complicated constraints on h registers, and
CodeGen doesn't currently provide a high-level way to express all of them,
so they are handled with a bunch of special code. This code currently only
supports extracts where the result is used by a zero-extend or a store,
though these are fairly common.
These transformations are not always beneficial; since there are only
4 h registers, they sometimes require extra move instructions, and
this sometimes increases register pressure because it can force out
values that would otherwise be in one of those registers. However,
this appears to be relatively uncommon.
llvm-svn: 68962
ISD::SIGN_EXTEND_INREG. Tablegen-generated code can handle
these cases, and the scheduling issues observed earlier
appear to be resolved now.
llvm-svn: 68959