Update code generator to use this attribute and remove NoImplicitFloat target option.
Update llc to set this attribute when -no-implicit-float command line option is used.
llvm-svn: 72959
build vectors with i64 elements will only appear on 32b x86 before legalize.
Since vector widening occurs during legalize, and produces i64 build_vector
elements, the dag combiner is never run on these before legalize splits them
into 32b elements.
Teach the build_vector dag combine in x86 back end to recognize consecutive
loads producing the low part of the vector.
Convert the two uses of TLI's consecutive load recognizer to pass LoadSDNodes
since that was required implicitly.
Add a testcase for the transform.
Old:
subl $28, %esp
movl 32(%esp), %eax
movl 4(%eax), %ecx
movl %ecx, 4(%esp)
movl (%eax), %eax
movl %eax, (%esp)
movaps (%esp), %xmm0
pmovzxwd %xmm0, %xmm0
movl 36(%esp), %eax
movaps %xmm0, (%eax)
addl $28, %esp
ret
New:
movl 4(%esp), %eax
pmovzxwd (%eax), %xmm0
movl 8(%esp), %eax
movaps %xmm0, (%eax)
ret
llvm-svn: 72957
`-fomit-frame-pointer', we would lack the DW_CFA_advance_loc information for a
lot of function, and then they would be `0'. The linker (at least on Darwin)
needs to encode the stack size. In some cases, the stack size is too large to
directly encode. So the linker checks to see if there is a "subl $xxx,%esp"
instruction at the point where the `DW_CFA_def_cfa_offset' says the pc was. If
so, the compact encoding records the offset in the function to where the stack
size is embedded. But because the `DW_CFA_advance_loc' instructions are missing,
it looks before the function and dies.
So, instead of emitting the EH debug label before the stack adjustment
operations, emit it afterwards, right before the frame move stuff.
llvm-svn: 72898
integer and floating-point opcodes, introducing
FAdd, FSub, and FMul.
For now, the AsmParser, BitcodeReader, and IRBuilder all preserve
backwards compatability, and the Core LLVM APIs preserve backwards
compatibility for IR producers. Most front-ends won't need to change
immediately.
This implements the first step of the plan outlined here:
http://nondot.org/sabre/LLVMNotes/IntegerOverflow.txt
llvm-svn: 72897
Update code generator to use this attribute and remove DisableRedZone target option.
Update llc to set this attribute when -disable-red-zone command line option is used.
llvm-svn: 72894
using Promote which won't work because i64 isn't
a legal type. It's easy enough to use Custom, but
then we have the problem that when the type
legalizer is promoting FP_TO_UINT->i16, it has no
way of telling it should prefer FP_TO_SINT->i32
to FP_TO_UINT->i32. I have uncomfortably hacked
this by making the type legalizer choose FP_TO_SINT
when both are Custom.
This fixes several regressions in the testsuite.
llvm-svn: 72891
carry GlobalBaseReg, and GlobalRetAddr too in Alpha's case. This
eliminates the need for them to search through the
MachineRegisterInfo livein list in order to identify these
virtual registers. EmitLiveInCopies is now the only user of the
virtual register portion of MachineRegisterInfo's livein data.
llvm-svn: 72802
ADDC/ADDE use MVT::i1 (later, whatever it gets legalized to)
instead of MVT::Flag. Remove CARRY_FALSE in favor of 0; adjust
all target-independent code to use this format.
Most targets will still produce a Flag-setting target-dependent
version when selection is done. X86 is converted to use i32
instead, which means TableGen needs to produce different code
in xxxGenDAGISel.inc. This keys off the new supportsHasI1 bit
in xxxInstrInfo, currently set only for X86; in principle this
is temporary and should go away when all other targets have
been converted. All relevant X86 instruction patterns are
modified to represent setting and using EFLAGS explicitly. The
same can be done on other targets.
The immediate behavior change is that an ADC/ADD pair are no
longer tightly coupled in the X86 scheduler; they can be
separated by instructions that don't clobber the flags (MOV).
I will soon add some peephole optimizations based on using
other instructions that set the flags to feed into ADC.
llvm-svn: 72707
decoding. Essentially, they both map to the same column in the "opcode
extensions for one- and two-byte opcodes" table in the x86 manual. The RawFrm
complicates decoding this.
Instead, use opcode 0x01, prefix 0x01, and form MRM1r. Then have the code
emitter special case these, a la [SML]FENCE.
llvm-svn: 72556
the Intel manual (screenshot) says it should be 0b11110110 (f6). The existing
encoding causes a disassembly conflict with MMX_PAVGBrm, which really should be
0f e0."
Patch by Sean Callanan!
llvm-svn: 72508
e.g.
orl $65536, 8(%rax)
=>
orb $1, 10(%rax)
Since narrowing is not always a win, e.g. i32 -> i16 is a loss on x86, dag combiner consults with the target before performing the optimization.
llvm-svn: 72507
FP_TO_XINT. Necessary for some cleanups I'm working on. Updated
from the previous version (r72431) to fix a bug and make some things a
bit clearer.
llvm-svn: 72445
- added processors k8-sse3, opteron-sse3, athlon64-sse3, amdfam10, and
barcelona with appropriate sse3/4a levels
- added FeatureSSE4A for amdfam10 processors
in X86Subtarget:
- added hasSSE4A
- updated AutoDetectSubtargetFeatures to detect SSE4A
- updated GetCurrentX86CPU to detect family 15 with sse3 as k8-sse3 and
family 10h as amdfam10
New processor names match those used by gcc.
Patch by Paul Redmond!
llvm-svn: 72434
systems instead of attempting to promote them to a 64-bit SINT_TO_FP or
FP_TO_SINT. This is in preparation for removing the type legalization
code from LegalizeDAG: once type legalization is gone from LegalizeDAG,
it won't be able to handle the i64 operand/result correctly.
This isn't quite ideal, but I don't think any other operation for any
target ends up in this situation, so treating this case specially seems
reasonable.
llvm-svn: 72324
to run last because it needs to know the exact size and position of every
basic block. Currently CodePlacementOpt is set up to run last. It might be
worthwhile to investigate reordering these passes, but for now, let's just
make it work.
llvm-svn: 72037
llvm.eh.sjlj.* for better clarity as to their purpose and scope. Add
a description of llvm.eh.sjlj.setjmp to ExceptionHandling.html.
(llvm.eh.sjlj.longjmp documentation coming when that implementation is
added).
llvm-svn: 71758
booleans. This gives a better indication of what the "addReg()" is
doing. Remembering what all of those booleans mean isn't easy, especially if you
aren't spending all of your time in that code.
I took Jakob's suggestion and made it illegal to pass in "true" for the
flag. This should hopefully prevent any unintended misuse of this (by reverting
to the old way of using addReg()).
llvm-svn: 71722
without one. Use it where we were using abs on
int64_t objects.
(I strongly suspect the casts to unsigned in the
fragments in LoopStrengthReduce are not doing whatever
the original intent was, but the obvious change to
uint64_t doesn't work. Maybe later.)
llvm-svn: 71612
a supporting preliminary patch for GCC-compatible SjLJ exception handling. Note that these intrinsics are not designed to be invoked directly by the user, but
rather used by the front-end as target hooks for exception handling.
llvm-svn: 71610
and generalize it so that it can be used by IndVarSimplify. Implement the
base IndVarSimplify transformation code using IVUsers. This removes
TestOrigIVForWrap and associated code, as ScalarEvolution now has enough
builtin overflow detection and folding logic to handle all the same cases,
and more. Run "opt -iv-users -analyze -disable-output" on your favorite
loop for an example of what IVUsers does.
This lets IndVarSimplify eliminate IV casts and compute trip counts in
more cases. Also, this happens to finally fix the remaining testcases
in PR1301.
Now that IndVarSimplify is being more aggressive, it occasionally runs
into the problem where ScalarEvolutionExpander's code for avoiding
duplicate expansions makes it difficult to ensure that all expanded
instructions dominate all the instructions that will use them. As a
temporary measure, IndVarSimplify now uses a FixUsesBeforeDefs function
to fix up instructions inserted by SCEVExpander. Fortunately, this code
is contained, and can be easily removed once a more comprehensive
solution is available.
llvm-svn: 71535
only for those. These extern declarations to intrinsics are currently
being emitted at the bottom of generated .s file, which works fine with
gpasm(not sure about MPSAM though).
PIC16 linker generates errors for few cases (function-args/struct_args_5) if you do not include any
extern declarations (even if no intrinsics are being used), but that
needs to be fixed in the linker itself.
llvm-svn: 71423
more place. This fixes a bunch of x86-64 JIT regressions.
(Introduced when the value of the magic constant changed
in 68645. At the time apparently nobody noticed; failures
were hidden in 70343-70439 by an unrelated bug, so showed
up again as "new" failures in 70440.)
llvm-svn: 71106
Split large global data (both initialized and un-initialized) into multiple sections of <= 80 bytes.
Provide routines to manage PIC16 ABI naming conventions.
llvm-svn: 71073
-Replace DebugLocTuple's Source ID with CompileUnit's GlobalVariable*
-Remove DwarfWriter::getOrCreateSourceID
-Make necessary changes for the above (fix callsites, etc.)
llvm-svn: 70520
Massive check in. This changes the "-fast" flag to "-O#" in llc. If you want to
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.
Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'll change the JIT with a follow-up patch.
llvm-svn: 70343
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.
Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'm not 100% sure if it's necessary to change it there...
llvm-svn: 70270
PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
llvm-svn: 70225
to precisely describe the h-register subreg register classes.
Thanks to Jakob Stoklund Olesen for spotting this and for the
initial patch!
Also, make getStoreRegOpcode and getLoadRegOpcode aware of the
needs of h registers.
llvm-svn: 70211
between registers and the stack may be required with the APCS ABI, but it
isn't tied to using a particular version of the ARM architecture.
llvm-svn: 69978
chained and "flagged" together. I also made a few changes to handle the
chain and flag values more consistently. I found these problems by
inspection so I'm not aware of anything that breaks because of them
(thus no testcase).
llvm-svn: 69977
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.
llvm-svn: 69952