MERGE_VALUES nodes. Replacing the result values with the
operands in one MERGE_VALUES node may cause another
MERGE_VALUES node to be CSE'd with the first one, bringing
its uses along, so that the first one isn't dead as this
code expects. Fix this by iterating until the node is
really dead. This fixes PR4699.
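As a sketch of the iterate-until-dead shape (DAGCombiner-style names, not the
verbatim patch):

do {
  // Replacing a result may CSE another MERGE_VALUES into N, adding new uses.
  for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i)
    DAG.ReplaceAllUsesOfValueWith(SDValue(N, i), N->getOperand(i));
} while (!N->use_empty());
DAG.DeleteNode(N);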
llvm-svn: 78619
instead of syntactically as a string. This means that it keeps track of the
segment, section, flags, etc. directly and asmprints them in the right format.
This also includes parsing and validation support for llvm-mc and
"attribute(section)", so we should now start getting errors about invalid
section attributes from the compiler instead of the assembler on darwin.
Still todo:
1) Uniquing of darwin mcsections
2) Move all the Darwin stuff out to MCSectionMachO.[cpp|h]
3) there are a few FIXMEs, for example what is the syntax to get the
S_GB_ZEROFILL segment type?
llvm-svn: 78547
by aggressive chain operand optimization. UpdateNodeOperands
does not modify the node in place if it would result in
a node identical to an existing node.
llvm-svn: 78297
and high-bits values in ways that weren't correct for integer
types wider than 64 bits. This fixes a miscompile in
PPMacroExpansion.cpp in clang on x86-64.
llvm-svn: 78295
Instead of awkwardly encoding calling-convention information with ISD::CALL,
ISD::FORMAL_ARGUMENTS, ISD::RET, and ISD::ARG_FLAGS nodes, TargetLowering
provides three virtual functions for targets to override:
LowerFormalArguments, LowerCall, and LowerReturn, which replace the custom
lowering done on the special nodes. They provide the same information, but
in a more immediately usable format.
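The rough shape of one of the hooks, for illustration (the exact parameter
list is approximate and changed over time):

virtual SDValue
LowerFormalArguments(SDValue Chain, unsigned CallConv, bool isVarArg,
                     const SmallVectorImpl<ISD::InputArg> &Ins,
                     DebugLoc dl, SelectionDAG &DAG,
                     SmallVectorImpl<SDValue> &InVals);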
This also reworks much of the target-independent tail call logic. The
decision of whether or not to perform a tail call is now cleanly split
between the target-independent portion and the target-dependent portion
in IsEligibleForTailCallOptimization.
This also synchronizes all in-tree targets, to help enable future
refactoring and feature work.
llvm-svn: 78142
When LowerExtract eliminates an EXTRACT_SUBREG with a kill flag, it moves the
kill flag to the place where the sub-register is killed. This can accidentally
overlap with the use of a sibling sub-register, and we have trouble.
In the test case we have this code:
Live Ins: %R0 %R1 %R2
%R2L<def> = EXTRACT_SUBREG %R2<kill>, 1
%R2H<def> = LOAD16fi <fi#-1>, 0, Mem:LD(2,4) [FixedStack-1 + 0]
%R1L<def> = EXTRACT_SUBREG %R1<kill>, 1
%R0L<def> = EXTRACT_SUBREG %R0<kill>, 1
%R0H<def> = ADD16 %R2H<kill>, %R2L<kill>, %AZ<imp-def>, %AN<imp-def>, %AC0<imp-def>, %V<imp-def>, %VS<imp-def>
subreg: CONVERTING: %R2L<def> = EXTRACT_SUBREG %R2<kill>, 1
subreg: eliminated!
subreg: killed here: %R0H<def> = ADD16 %R2H, %R2L, %R2<imp-use,kill>, %AZ<imp-def>, %AN<imp-def>, %AC0<imp-def>, %V<imp-def>, %VS<imp-def>
The kill flag on %R2 is moved to the last instruction, and the live range overlaps with the definition of %R2H:
*** Bad machine code: Redefining a live physical register ***
- function: f
- basic block: 0x18358c0 (#0)
- instruction: %R2H<def> = LOAD16fi <fi#-1>, 0, Mem:LD(2,4) [FixedStack-1 + 0]
Register R2H was defined but already live.
The fix is to replace EXTRACT_SUBREG with IMPLICIT_DEF instead of eliminating
it completely:
subreg: CONVERTING: %R2L<def> = EXTRACT_SUBREG %R2<kill>, 1
subreg: replace by: %R2L<def> = IMPLICIT_DEF %R2<kill>
Note that these IMPLICIT_DEF instructions survive to the asm output. It is
necessary to fix the stack-color-with-reg test case because of that.
llvm-svn: 78093
This is not just a matter of passing in the target triple from the module;
currently backends are making decisions based on the build and host
architecture. The goal is to migrate to making these decisions based off of the
triple (in conjunction with the feature string). Thus most clients pass in the
target triple, or the host triple if that is empty.
This results in one important change in the behavior of the JIT and llc.
For the JIT, it was previously selecting the Target based on the host
(naturally), but it was setting the target machine features based on the triple
from the module. Now it is setting the target machine features based on the
triple of the host.
For llc, -march was previously only used to select the target; the target
machine features were initialized from the module's triple (which may have been
empty). Now the target triple is taken from the module, or the host's triple is
used if that is empty. Then the triple is adjusted to match -march.
The takeaway is that -march for llc is now used in conjunction with the host
triple to initialize the subtarget. If users want more deterministic behavior
from llc, they should use -mtriple, or set the triple in the input module.
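A sketch of the policy (helper names approximate for this era):

std::string TripleStr = M.getTargetTriple(); // prefer the module's triple
if (TripleStr.empty())
  TripleStr = sys::getHostTriple();          // otherwise fall back to the host
// llc then adjusts the architecture component to match -march before
// selecting the Target and initializing subtarget features.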
llvm-svn: 77946
to:
.quad X
even on a 32-bit system, where X is not 64 bits wide. There isn't much that
we can do here, so we just print:
.quad ((X) & 4294967295)
instead.
llvm-svn: 77818
padding is disabled, tabs get replaced by spaces except in the case of
the first operand, where the tab is output to line up the operands after
the mnemonics.
Add some better comments and eliminate redundant code.
Fix some testcases to not assume tabs.
llvm-svn: 77740
into the mergable section if it is one of our special cases. This could
obviously be improved, but this is the minimal fix and restores us to the
previous behavior.
llvm-svn: 77679
When the return value is not used (i.e. we only care about the value in memory), x86 does not have to use xadd or a cmpxchg loop to implement these. Instead, it can use add, sub, inc, dec instructions with the "lock" prefix.
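For illustration, with the GCC-style __sync builtins (the generated
instructions shown are the intended lowering, not a guarantee):

#include <cstdint>

void bump(volatile int32_t *p) {
  __sync_fetch_and_add(p, 1);        // result unused: "lock incl (%rdi)" suffices
}

int32_t bump_and_get(volatile int32_t *p) {
  return __sync_fetch_and_add(p, 1); // result used: needs "lock xaddl"
}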
This is currently implemented using a bit of an instruction selection trick. The issue is that the target-independent pattern produces one output and a chain, and we want to map it into one that just outputs a chain. The current trick is to select it into a merge_values with the first definition being an implicit_def. The proper solution is to add new ISD opcodes for the no-output variant. DAG combiner can then transform the node before it gets to target node selection.
Problem #2 is that we are adding a whole bunch of x86 atomic instructions when in fact these instructions are identical to the non-lock versions. We need a way to add target specific information to target nodes and have this information carried over to machine instructions. The asm printer (or JIT) can then use this information to add the "lock" prefix.
llvm-svn: 77582
and make it more aggressive, we now put:
const int G2 __attribute__((weak)) = 42;
into the text (readonly) segment like gcc, previously we put
it into the data (readwrite) segment.
llvm-svn: 77104
dumping ground of various SSE4.1 tests, since filecheck can reasonably
handle them all in one file. Generalize it to check x86-64 stuff as
well since it has a different ABI (a convenient way to test both the
reg and mem forms of these instructions).
llvm-svn: 76848
Getelementptrs that are defined to wrap are virtually useless to
optimization, and getelementptrs that are undefined on any kind
of overflow are too restrictive -- it's difficult to ensure that
all intermediate addresses are within bounds. I'm going to take
a different approach.
Remove a few optimizations that depended on this flag.
llvm-svn: 76437
Inline asm instructions may have additional <imp-def,kill> register operands.
These operands are not marked with a flag like the normal asm operands, so we
must not assert that there is a flag.
llvm-svn: 76373
stack alignment right when it is. This is not
ideal but conservatively correct. Adjust a test
to compensate for changed stack offset value.
gcc.apple/asm-block-57.c
llvm-svn: 76120
additional bug fixes:
1. The bug that everyone hit was a problem in the asmprinter where it
would remove $stub but keep the L prefix on a name when emitting the
indirect symbol. This is easy to fix by keeping the name of the stub
and the name of the symbol in a StringMap instead of just keeping a
StringSet and trying to reconstruct it late.
2. There was a problem printing the personality function. The current
logic to print out the personality function from the DWARF information
is a bit of a cesspool right now that duplicates a bunch of other
logic in the asm printer. The short version of it is that it depends
on emitting both the L and _ prefix for symbols (at least on darwin)
and until I can untangle it, it is best to switch the mangler back to
emitting both prefixes.
llvm-svn: 75646
unbreaking llvm-gcc (on Darwin).
--- Reverse-merging r75620 into '.':
U include/llvm/Support/Mangler.h
--- Reverse-merging r75610 into '.':
U test/CodeGen/X86/loop-hoist.ll
G include/llvm/Support/Mangler.h
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U lib/VMCore/Mangler.cpp
llvm-svn: 75636
to symbols instead of doing it with "printSuffixedName". This gets us to the point
where there is a real separation between computing a symbol name and printing it,
something I need for MC printer stuff.
This patch also fixes a corner case bug where unnamed private globals wouldn't get
the private label prefix.
Next up, rename all uses of getValueName -> getMangledName for better greppability,
and then tackle the ppc/arm backends to eliminate "printSuffixedName".
llvm-svn: 75610
indicates whether the label is private or not, instead of taking
prefix stuff. One effect of this is that symbols will be generated
with *just* the private prefix, instead of both the private prefix
*and* the user-label-prefix, but this doesn't matter as long as it
is consistent. For example we'll now get "Lfoo" instead of "L_foo".
These are just assembler temporary labels anyway, so they never even
make it into the .o file.
llvm-svn: 75607
1) unique globals with the existing "Count" local in Mangler, not with
atomic nonsense. Using atomics will give us nondeterministic output
from the compiler when using multiple threads, which is bad.
2) Do not mangle an unknown global name with a type suffix. We don't
need this anymore now that llvm ir doesn't have type planes.
llvm-svn: 75541
of lea. It is better for code size (and presumably efficiency) to use:
movl $foo, %eax
rather than:
leal foo, %eax
Both give a nice zero extending "move immediate" instruction, the former is just
smaller. Note that global addresses should be handled differently by the x86
backend, but I chose to follow the style already in place and add more fixme's.
llvm-svn: 75403
Basically, using:
lea symbol(%rip), %rax
is not valid in -static mode, because the current RIP may not be
within 32-bits of "symbol" when an app is built partially pic and
partially static. The fix for this is to compile it to:
lea symbol, %rax
It would be better to codegen this as:
movq $symbol, %rax
but that will come next.
The hard part of fixing this bug was fixing abi-isel, which was actively
testing for the wrong behavior. Also, the RUN lines made it completely
impossible to understand what was being tested. To help with this, convert the -static
x86-64 codegen tests to use filecheck. This is much more stable and makes it
more clear what the codegen is expected to be.
llvm-svn: 75382
value. Adjust other code to deal with that correctly. Make
DAGTypeLegalizer::PromoteIntRes_EXTRACT_VECTOR_ELT take advantage of
this new flexibility to simplify the code and make it deal with unusual
vectors (like <4 x i1>) correctly. Fixes PR3037.
llvm-svn: 75176
registers based on dynamic conditions. For example, X86 EBP/RBP, when used as
frame register has to be spilled in the first fixed object. It should inform
PEI this so it doesn't get allocated another stack object. Also, it should not
be spilled as other callee-saved registers but rather its spilling and restoring
are being handled by emitPrologue and emitEpilogue. Avoid spilling it twice.
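A sketch of how the target can express this, modeled on the X86 frame-pointer
case (details approximate):

bool X86RegisterInfo::hasReservedSpillSlot(const MachineFunction &MF,
                                           unsigned Reg, int &FrameIdx) const {
  if (Reg == FramePtr && hasFP(MF)) {
    // EBP/RBP already lives in the first fixed object; PEI should neither
    // allocate another slot nor spill it with the other callee-saved regs.
    FrameIdx = MF.getFrameInfo()->getObjectIndexBegin();
    return true;
  }
  return false;
}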
llvm-svn: 75116
* remove some old code that was needed when we'd put ESP in the scale instead of
the base of some instructions.
* Fix a bug with the P modifier in inline asm that caused us to drop it.
llvm-svn: 75077
VSETCC must define all bits, which is different from what was
documented before. Since all targets that implement VSETCC already have this
behavior, and we don't optimize based on this, just change the
documentation. We now get nice code for vec_compare.ll
llvm-svn: 74978
Avoid unnecessary duplication of operand 0 of X86::FpSET_ST0_80. This duplication would
cause one register to remain on the stack at the function return.
llvm-svn: 74534
implementation primarily differs from the former in that the asmprinter
doesn't make a zillion decisions about whether or not something will be
RIP relative or not. Instead, those decisions are made by isel lowering
and propagated through to the asm printer. To achieve this, we:
1. Represent RIP relative addresses by setting the base of the X86 addr
mode to X86::RIP.
2. When ISel Lowering decides that it is safe to use RIP, it lowers to
X86ISD::WrapperRIP. When it is unsafe to use RIP, it lowers to
X86ISD::Wrapper as before.
3. This removes isRIPRel from X86ISelAddressMode, representing it with
a basereg of RIP instead.
4. The addressing mode matching logic in isel is greatly simplified.
5. The asmprinter is greatly simplified, notably the "NotRIPRel" predicate
passed through various printoperand routines is gone now.
6. The various symbol printing routines in asmprinter now no longer infer
when to emit (%rip), they just print the symbol.
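A sketch of the lowering-side decision (X86ISD::Wrapper and X86ISD::WrapperRIP
are the real opcodes; the surrounding code is approximate):

SDValue Result = DAG.getTargetGlobalAddress(GV, dl, PtrVT, Offset);
unsigned WrapperKind = X86ISD::Wrapper;
if (Subtarget->isPICStyleRIPRel())  // safe to use RIP-relative addressing
  WrapperKind = X86ISD::WrapperRIP; // base register becomes X86::RIP
Result = DAG.getNode(WrapperKind, dl, PtrVT, Result);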
I think this is a big improvement over the previous situation. It does have
two small caveats though: 1. I implemented a horrible "no-rip" modifier for
the inline asm "P" constraint modifier. This is a short term hack, there is
a much better, but more involved, solution. 2. I had to xfail an
-aggressive-remat testcase because it isn't handling the use of RIP in the
constant-pool reading instruction. This specific test is easy to fix without
-aggressive-remat, which I intend to do next.
llvm-svn: 74372
trip counts in more cases.
Generalize ScalarEvolution's isLoopGuardedByCond code to recognize
And and Or conditions, splitting the code out into an
isNecessaryCond helper function so that it can evaluate Ands and Ors
recursively, and make SCEVExpander be much more aggressive about
hoisting instructions out of loops.
test/CodeGen/X86/pr3495.ll has an additional instruction now, but
it appears to be due to an arbitrary register allocation difference.
llvm-svn: 74048
a global that gets printed with the :mem modifier. All operands to lea's
should be handled with the lea32mem operand kind, and this allows the TLS stuff
to do this. There are several better ways to do this, but I went for the minimal
change since I can't really test this (beyond make check).
This also makes the use of EBX explicit in the operand list in the 32-bit case,
instead of implicit in the instruction.
llvm-svn: 73834
casted induction variables in cases where the cast
isn't foldable. It ended up being a pessimization in
many cases. This could be fixed, but it would require
a bunch of complicated code in IVUsers' clients. The
advantages of this approach aren't visible enough to
justify it at this time.
llvm-svn: 73706
support for x86, and UMULO/SMULO for many architectures, including PPC
(PR4201), ARM, and Cell. The resulting expansion isn't perfect, but it's
not bad.
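For illustration, the kind of source that ends up as a single UMULO node
(using __builtin_mul_overflow, which postdates this commit but lowers to the
same node):

#include <cstdint>

bool mul_checked(uint32_t a, uint32_t b, uint32_t &out) {
  return __builtin_mul_overflow(a, b, &out); // true on overflow
}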
llvm-svn: 73477
incoming chain of the RETURN node. The incoming chain must
be the outgoing chain of the CALL node. This caused the
backend to identify tail calls that are not really tail calls. This
patch fixes this.
llvm-svn: 73387
out of sync with regular cc.
The only difference between the tail call cc and the normal
cc was that one parameter register - R9 - was reserved for
calling functions through a function pointer. Over time the
tail call cc has gotten out of sync with the regular cc.
We can use R11 which is also caller saved but not used as
parameter register for potential function pointers and
remove the special tail call cc on x86-64.
llvm-svn: 73233
on x86 to handle more cases. Fix a bug in said code that would cause it
to read past the end of an object. Rewrite the code in
SelectionDAGLegalize::ExpandBUILD_VECTOR to be a bit more general.
Remove PerformBuildVectorCombine, which is no longer necessary with
these changes. In addition to simplifying the code, with this change,
we can now catch a few more cases of consecutive loads.
llvm-svn: 73012
nodes for vectors with an i16 element type. Add an optimization for
building a vector which is all zeros/undef except for the bottom
element, where the bottom element is an i8 or i16.
llvm-svn: 72988
build vectors with i64 elements will only appear on 32b x86 before legalize.
Since vector widening occurs during legalize, and produces i64 build_vector
elements, the dag combiner is never run on these before legalize splits them
into 32b elements.
Teach the build_vector dag combine in x86 back end to recognize consecutive
loads producing the low part of the vector.
Convert the two uses of TLI's consecutive load recognizer to pass LoadSDNodes
since that was required implicitly.
Add a testcase for the transform.
Old:
subl $28, %esp
movl 32(%esp), %eax
movl 4(%eax), %ecx
movl %ecx, 4(%esp)
movl (%eax), %eax
movl %eax, (%esp)
movaps (%esp), %xmm0
pmovzxwd %xmm0, %xmm0
movl 36(%esp), %eax
movaps %xmm0, (%eax)
addl $28, %esp
ret
New:
movl 4(%esp), %eax
pmovzxwd (%eax), %xmm0
movl 8(%esp), %eax
movaps %xmm0, (%eax)
ret
llvm-svn: 72957
integer and floating-point opcodes, introducing
FAdd, FSub, and FMul.
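A small sketch from the IRBuilder side (current spellings; see the
compatibility note below):

Value *FSum = Builder.CreateFAdd(L, R, "fsum"); // %fsum = fadd double %l, %r
Value *ISum = Builder.CreateAdd(A, B, "isum");  // %isum = add i32 %a, %b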
For now, the AsmParser, BitcodeReader, and IRBuilder all preserve
backwards compatibility, and the Core LLVM APIs preserve backwards
compatibility for IR producers. Most front-ends won't need to change
immediately.
This implements the first step of the plan outlined here:
http://nondot.org/sabre/LLVMNotes/IntegerOverflow.txt
llvm-svn: 72897
Update code generator to use this attribute and remove the DisableRedZone target option.
Update llc to set this attribute when -disable-red-zone command line option is used.
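Sketched usage (the textual IR spelling is noredzone; the C++ call reflects
the current API):

F->addFnAttr(Attribute::NoRedZone); // per-function, replacing the old option
// In IR:  define void @f() noredzone { ... }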
llvm-svn: 72894
e.g.
orl $65536, 8(%rax)
=>
orb $1, 10(%rax)
Since narrowing is not always a win (e.g. i32 -> i16 is a loss on x86), the dag combiner consults the target before performing the optimization.
llvm-svn: 72507
The DAGCombiner created a negative shift amount, stored in an
unsigned variable. Later the optimizer eliminated the shift entirely as being
undefined.
Example: (srl (shl X, 56) 48). ShiftAmt is 4294967288.
Fix it by checking that the shift amount is positive, and storing it in a
signed variable.
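A self-contained demo of the wrap being guarded against, using the amounts
from the example above:

#include <cassert>
#include <cstdint>

int main() {
  uint32_t Bad  = 48u - 56u;        // wraps to 4294967288
  int64_t  Good = int64_t(48) - 56; // -8: negative, so reject the fold
  assert(Bad == 4294967288u && Good == -8);
  return 0;
}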
llvm-svn: 72331
and it wasn't generating calls through @PLT for these functions.
hasLocalLinkage() is now false for available_externally.
I attempted to fix the inliner and dce to handle available_externally properly.
It passed make check.
llvm-svn: 72328
When a test fails with more than a pipeful of output on stdout AND stderr, one
of the DejaGnu programs blocks. The problem can be avoided by redirecting
stdout to a file.
llvm-svn: 71919
and generalize it so that it can be used by IndVarSimplify. Implement the
base IndVarSimplify transformation code using IVUsers. This removes
TestOrigIVForWrap and associated code, as ScalarEvolution now has enough
builtin overflow detection and folding logic to handle all the same cases,
and more. Run "opt -iv-users -analyze -disable-output" on your favorite
loop for an example of what IVUsers does.
This lets IndVarSimplify eliminate IV casts and compute trip counts in
more cases. Also, this happens to finally fix the remaining testcases
in PR1301.
Now that IndVarSimplify is being more aggressive, it occasionally runs
into the problem where ScalarEvolutionExpander's code for avoiding
duplicate expansions makes it difficult to ensure that all expanded
instructions dominate all the instructions that will use them. As a
temporary measure, IndVarSimplify now uses a FixUsesBeforeDefs function
to fix up instructions inserted by SCEVExpander. Fortunately, this code
is contained, and can be easily removed once a more comprehensive
solution is available.
llvm-svn: 71535
type, rather than assume that it does. If the operand is not a vector, it
shouldn't be run through ScalarizeVectorOp. This fixes one of the
testcases in PR3886.
llvm-svn: 71453
allow it to have multiple CFG edges to that block. This is needed
to allow MachineBasicBlock::isOnlyReachableByFallthrough to work
correctly. This fixes PR4126.
llvm-svn: 71018
of returning a list of pointers to Values that are deleted. This was
unsafe, because the pointers in the list are, by nature of what
RecursivelyDeleteDeadInstructions does, always dangling. Replace this
with a simple callback mechanism. This may eventually be removed if
all clients can reasonably be expected to use CallbackVH.
Use this to factor out the dead-phi-cycle-elimination code from LSR
into a utility function, and generalize it to use the
RecursivelyDeleteTriviallyDeadInstructions utility function.
This makes LSR more aggressive about eliminating dead PHI cycles;
adjust tests to either be less trivial or to simply expect fewer
instructions.
llvm-svn: 70636
memory operands; otherwise the writebacks get lost when the inline asm
doesn't otherwise have side effects. This fixes rdar://6839427, though
clang really shouldn't generate these anymore.
llvm-svn: 70455
Massive check-in. This changes the "-fast" flag to "-O#" in llc. If you want to
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.
Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'll change the JIT with a follow-up patch.
llvm-svn: 70343
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.
Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'm not 100% sure if it's necessary to change it there...
llvm-svn: 70270
information to simplify [sz]ext({a,+,b}) to {[sz]ext(a),+,[sz]ext(b)},
as appropriate.
These functions and the trip count code each call into the other, so
this requires careful handling to avoid infinite recursion. During
the initial trip count computation, conservative SCEVs are used,
which are subsequently discarded once the trip count is actually
known.
Among other benefits, this change lets LSR automatically eliminate
some unnecessary zext-inreg and sext-inreg operation where the
operand is an induction variable.
llvm-svn: 70241
PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
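A sketch of the new interface (current getVectorShuffle shape; -1 marks an
UNDEF lane):

int Mask[] = { 0, 4, 1, -1 }; // plain ints on the node, not a BUILD_VECTOR
SDValue Shuf = DAG.getVectorShuffle(VT, dl, V1, V2, Mask); // VT: 4 elements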
llvm-svn: 70225
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.
llvm-svn: 69952
This fixes a very subtle bug. A vr defined by an implicit_def is allowed to overlap with any register since it doesn't actually modify anything. However, if it's used as a two-address use, its live range can be extended and it can be spilled. The spiller must take care not to emit a reload for the value number that's defined by the implicit_def. This is both a correctness and a performance issue.
llvm-svn: 69743
%reg1498<def> = MOV32rm %reg1024, 1, %reg0, 12, %reg0, Mem:LD(4,4) [sunkaddr39 + 0]
%reg1506<def> = MOV32rm %reg1024, 1, %reg0, 8, %reg0, Mem:LD(4,4) [sunkaddr42 + 0]
%reg1486<def> = MOV32rr %reg1506
%reg1486<def> = XOR32rr %reg1486, %reg1498, %EFLAGS<imp-def,dead>
%reg1510<def> = MOV32rm %reg1024, 1, %reg0, 4, %reg0, Mem:LD(4,4) [sunkaddr45 + 0]
=>
%reg1498<def> = MOV32rm %reg2036, 1, %reg0, 12, %reg0, Mem:LD(4,4) [sunkaddr39 + 0]
%reg1506<def> = MOV32rm %reg2037, 1, %reg0, 8, %reg0, Mem:LD(4,4) [sunkaddr42 + 0]
%reg1486<def> = MOV32rr %reg1506
%reg1486<def> = XOR32rr %reg1486, %reg1498, %EFLAGS<imp-def,dead>
%reg1510<def> = MOV32rm %reg2038, 1, %reg0, 4, %reg0, Mem:LD(4,4) [sunkaddr45 + 0]
From linearscan's point of view, reg2036, 2037, and 2038 are separate registers, each "killed" after a single use. The reloaded register is available and is often clobbered right away. e.g. In this case reg1498 is allocated EAX while reg2036 is allocated RAX. This means we end up with multiple reloads from the same stack slot in the same basic block.
Now linearscan recognizes when there are other reloads from the same SS in the same BB. So it'll "downgrade" RAX (and its aliases) after reg2036 is allocated until the next reload (reg2037) is done. This greatly increases the likelihood that reloads from the same SS are reused.
This speeds up sha1 from OpenSSL by 5.8%. It is also an across-the-board win for SPEC2000 and 2006.
llvm-svn: 69585
for the optimization it's testing to kick in (although
it improves the code, getting rid of all spills).
I don't understand the optimization well enough to
rescue the test, so XFAILing.
llvm-svn: 69409
leaq foo@TLSGD(%rip), %rdi
as part of the instruction sequence. Using a register other than %rdi and then
copying it to %rdi is not valid.
llvm-svn: 69350
register is available and when it's profitable.
e.g.
xorq %r12<kill>, %r13
addq %rax, -184(%rbp)
addq %r13, -184(%rbp)
==>
xorq %r12<kill>, %r13
movq -184(%rbp), %r12
addq %rax, %r12
addq %r13, %r12
movq %r12, -184(%rbp)
Two more instructions, but fewer memory accesses. It can also open up
opportunities for more optimizations.
llvm-svn: 69341
have pointer types, though in contrast to C pointer types, SCEV
addition is never implicitly scaled. This not only eliminates the
need for special code like IndVars' EliminatePointerRecurrence
and LSR's own GEP expansion code, it also does a better job because
it lets the normal optimizations handle pointer expressions just
like integer expressions.
Also, since LLVM IR GEPs can't directly index into multi-dimensional
VLAs, moving the GEP analysis out of client code and into the SCEV
framework makes it easier for clients to handle multi-dimensional
VLAs the same way as other arrays.
Some existing regression tests show improved optimization.
test/CodeGen/ARM/2007-03-13-InstrSched.ll in particular improved to
the point where if-conversion started kicking in; I turned it off
for this test to preserve the intent of the test.
llvm-svn: 69258
operator is used by a CopyToReg to export the value to a different
block, don't reuse the CopyToReg's register for the subreg operation
result if the register isn't precisely the right class for the
subreg operation.
Also, rename the h-registers.ll test, now that there is more
than one.
llvm-svn: 69087
- Add patterns for h-register extract, which avoids a shift and mask,
and in some cases a temporary register.
- Add address-mode matching for turning (X>>(8-n))&(255<<n), where
n is a valid address-mode scale value, into an h-register extract
and a scaled-offset address.
- Replace X86's MOV32to32_ and related instructions with the new
target-independent COPY_TO_REGCLASS instruction.
On x86-64 there are complicated constraints on h registers, and
CodeGen doesn't currently provide a high-level way to express all of them,
so they are handled with a bunch of special code. This code currently only
supports extracts where the result is used by a zero-extend or a store,
though these are fairly common.
These transformations are not always beneficial; since there are only
4 h registers, they sometimes require extra move instructions, and
this sometimes increases register pressure because it can force out
values that would otherwise be in one of those registers. However,
this appears to be relatively uncommon.
llvm-svn: 68962
in addition to ZERO_EXTEND and SIGN_EXTEND. Fix a bug in the
way it checked for live-out values, and simplify the way it
finds users by using SDNode::use_iterator's (relatively) new
features. Also, make it slightly more permissive on targets
with free truncates.
In SelectionDAGBuild, avoid creating ANY_EXTEND nodes that are
larger than necessary. If the target's ShiftAmountTy has
enough bits, use it. This exposes the truncate to optimization
early, enabling more optimizations.
llvm-svn: 68670
with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
instructions), and teach the DAGCombiner to take advantage of this on
targets which support it. This eliminates many redundant
zero-extension operations on x86-64.
This adds a new TargetLowering hook, isZExtFree. It's similar to
isTruncateFree, except it only applies to actual definitions, and not
no-op truncates which may not zero the high bits.
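A sketch of the hook for a target like x86-64, where 32-bit operations
implicitly zero bits 63:32 (signature approximate for this era):

bool X86TargetLowering::isZExtFree(EVT VT1, EVT VT2) const {
  // A 32->64-bit zext is free: every 32-bit def zeros the upper half.
  return VT1 == MVT::i32 && VT2 == MVT::i64;
}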
Also, this adds a new optimization to SimplifyDemandedBits: transform
operations like x+y into (zext (add (trunc x), (trunc y))) on targets
where all the casts are no-ops. In contexts where the high part of the
add is explicitly masked off, this allows the mask operation to be
eliminated. Fix the DAGCombiner to avoid undoing these transformations
to eliminate casts on targets where the casts are no-ops.
Also, this adds a new two-address lowering heuristic. Since
two-address lowering runs before coalescing, it helps to be able to
look through copies when deciding whether commuting and/or
three-address conversion are profitable.
Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
the case that a clobber range extended both before and beyond an
existing live range. In that case, multiple live ranges need to be
added. This was exposed by the new subreg coalescing code.
Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
spiller behavior it was looking for no longer occurs with the new
instruction selection.
llvm-svn: 68576
builds.
--- Reverse-merging (from foreign repository) r68552 into '.':
U test/CodeGen/X86/tls8.ll
U test/CodeGen/X86/tls10.ll
U test/CodeGen/X86/tls2.ll
U test/CodeGen/X86/tls6.ll
U lib/Target/X86/X86Instr64bit.td
U lib/Target/X86/X86InstrSSE.td
U lib/Target/X86/X86InstrInfo.td
U lib/Target/X86/X86RegisterInfo.cpp
U lib/Target/X86/X86ISelLowering.cpp
U lib/Target/X86/X86CodeEmitter.cpp
U lib/Target/X86/X86FastISel.cpp
U lib/Target/X86/X86InstrInfo.h
U lib/Target/X86/X86ISelDAGToDAG.cpp
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
U lib/Target/X86/X86ISelLowering.h
U lib/Target/X86/X86InstrInfo.cpp
U lib/Target/X86/X86InstrBuilder.h
U lib/Target/X86/X86RegisterInfo.td
llvm-svn: 68560
This introduces a small regression in the generated code
quality in the case where we are just computing addresses, not
loading values.
Will work on it and on X86-64 support.
llvm-svn: 68552
e.g.
%reg1024<def> = MOV r1
%reg1025<def> = ADD %reg1024, %reg1026
r0 = MOV %reg1025
If it's not possible or profitable to commute ADD, then turning ADD into a LEA saves a copy.
llvm-svn: 68065
x * 40
=>
shlq $3, %rdi
leaq (%rdi,%rdi,4), %rax
This has the added benefit of allowing more multiplies to be folded into the addressing mode, e.g.
a * 24 + b
=>
leaq (%rdi,%rdi,2), %rax
leaq (%rsi,%rax,8), %rax
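The decomposition written out (40 = 8 * 5):

#include <cstdint>

uint64_t mul40(uint64_t x) {
  uint64_t t = x << 3; // shlq $3, %rdi        (t = 8*x)
  return t + t * 4;    // leaq (%rdi,%rdi,4)   (5*t = 40*x)
}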
llvm-svn: 67917
e.g. allocating for GR32, bh is not used, updating bl spill weight.
bl should get the same spill weight; otherwise it will be chosen
as a spill candidate, since spilling bh doesn't make ebx available.
This fixes PR2866.
llvm-svn: 67574
same as a normal i80 {low64, high16} rather
than its own {high64, low16}. A depressing number
of places know about this; I think I got them all.
Bitcode readers and writers convert back to the old
form to avoid breaking compatibility.
llvm-svn: 67562
%RAX<def> = ...
%RAX<def> = SUBREG_TO_REG 0, %EAX:3<kill>, 3
The first def is defining RAX, not EAX, so the top bits were not zero-extended.
llvm-svn: 67511
not safe in general because the immediate could be an arbitrary
value that does not fit in a 32-bit pcrel displacement.
Conservatively fall back to loading the value into a register
and calling through it.
We still do the optzn on X86-32.
llvm-svn: 67142
size by the array amount as an i32 value instead of promoting from
i32 to i64 then doing the multiply. Not doing this broke wrap-around
assumptions that the optimizers (validly) made. The ultimate real
fix for this is to introduce an i64 version of alloca and remove mallocinst.
This fixes PR3829.
llvm-svn: 67093
vector shuffle mask. Forced the mask to be built using i32. Note: this will
be irrelevant once vector_shuffle no longer takes a build vector for the
shuffle mask.
llvm-svn: 67076
U test/CodeGen/X86/2009-03-13-PHIElimBug.ll
D test/CodeGen/X86/2009-03-16-PHIElimInLPad.ll
U lib/CodeGen/PHIElimination.cpp
r67049 was causing this failure:
Running /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/test/CodeGen/X86/dg.exp ...
FAIL: /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/test/CodeGen/X86/2009-03-13-PHIElimBug.ll for PR3784
Failed with exit(1) at line 1
while running: llvm-as < /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/test/CodeGen/X86/2009-03-13-PHIElimBug.ll | llc -march=x86 | /usr/bin/grep -A 2 {call f} | /usr/bin/grep movl
child process exited abnormally
llvm-svn: 67051
how invokes are set up. The fix could be disturbed by
register copies coming after the EH_LABEL, and also didn't
behave quite right when it was the invoke result that
was used in a phi node. Also (see new testcase) fix
another phi elimination bug while there: register copies
in the landing pad need to come after the EH_LABEL, because
that's where execution branches to when unwinding. If they
come before the EH_LABEL then they will never be executed...
Also tweak the original testcase so it doesn't use a no-longer
existing counter.
The accumulated phi elimination changes fix two of seven Ada
testsuite failures that turned up after landing pad critical
edge splitting was turned off. So there's probably more to come.
llvm-svn: 67049
ptrtoint and inttoptr in X86FastISel. These casts aren't always
handled in the generic FastISel code because X86 sometimes needs
custom code to do truncation and zero-extension.
llvm-svn: 66988
by inserting explicit zero extensions where necessary. Included
is a testcase where SelectionDAG produces a virtual register
holding an i1 value which FastISel previously mistakenly assumed
to be zero-extended.
llvm-svn: 66941
1. ConstantPoolSDNode's alignment field is the log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. MachineConstantPool's alignment field is also a log2 value.
3. However, some places are creating ConstantPoolSDNodes with plain alignment values rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when they are created. However, the asm printer groups them by section, which means the offsets are no longer valid. Nevertheless, the asm printer uses them to determine the size of padding between entries.
5. The asm printer uses an expensive data structure, a multimap, to track constant pool entries by section.
6. The asm printer iterates over a SmallPtrSet when emitting constant pool entries. This is non-deterministic.
Solutions:
1. ConstantPoolSDNode's alignment field is changed to keep a non-log2 value.
2. MachineConstantPool's alignment field is also changed to keep a non-log2 value.
3. Functions that create ConstantPool nodes now pass in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created; they are computed on the fly in the asm printer and JIT.
5. The asm printer uses a cheaper data structure to group constant pool entries.
6. The asm printer computes entry offsets after grouping is done.
7. Change the JIT code to compute entry offsets on the fly.
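A sketch of the on-the-fly offset computation from solutions 4 and 6,
assuming power-of-two alignments:

unsigned Offset = 0;
for (unsigned i = 0, e = CP.size(); i != e; ++i) {
  unsigned Align = CP[i].getAlignment();        // a real value now, not log2
  Offset = (Offset + Align - 1) & ~(Align - 1); // round up to the alignment
  // ... emit entry i at Offset ...
  Offset += TD->getTypeAllocSize(CP[i].getType());
}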
llvm-svn: 66875
for i32/i64 expressions (we could also do i16 on cpus where
i16 lea is fast, but I didn't add this). On the example, we now
generate:
_test:
movl 4(%esp), %eax
cmpl $42, (%eax)
setl %al
movzbl %al, %eax
leal 4(%eax,%eax,8), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
cmpl $41, (%eax)
movl $4, %ecx
movl $13, %eax
cmovg %ecx, %eax
ret
llvm-svn: 66869
operands can't both be fully folded at the same time. For example,
in the included testcase, a global variable is one operand of
an add of two values. The global variable wants RIP-relative
addressing, so it can't share the address with another base
register, but it's still possible to fold the initial add.
llvm-svn: 66865
in the Ada testcase. Reverting this only covers up
the real problem, which is a nasty conceptual difficulty
in the phi elimination pass: when eliminating phi nodes
in landing pads, the register copies need to come before
the invoke, not at the end of the basic block which is
too late... See PR3784.
llvm-svn: 66826
related transformations out of target-specific dag combine into the
ARM backend. These were added by Evan in r37685 with no testcases
and only seem to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).
Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0 -> (zext(cond) << 3). This happens frequently
with the recently added cp constant select optimization, but is a
very general xform. For example, we now compile the second example
in const-select.ll to:
_test:
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
seta %al
movzbl %al, %eax
movl 4(%esp), %ecx
movsbl (%ecx,%eax,4), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
leal 4(%eax), %ecx
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
cmovbe %eax, %ecx
movsbl (%ecx), %eax
ret
This passes multisource and dejagnu.
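The scalar identity behind the combine, as standalone code:

#include <cstdint>

uint32_t sel(bool c)       { return c ? 8u : 0u; }      // cmov or branch
uint32_t sel_shift(bool c) { return uint32_t(c) << 3; } // setcc + shl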
llvm-svn: 66779
alignment of the generated constant pool entry to the
desired alignment of a type. If we don't do this, we end up
trying to do movsd from 4-byte aligned memory. This fixes
450.soplex and 456.hmmer.
llvm-svn: 66641
1. Use the same value# to represent unknown values being merged into sub-registers.
2. When the coalescer commutes an instruction and the destination is a physical register, update its sub-registers by merging in the extended ranges.
llvm-svn: 66610
the same say the "test" instruction does in overflow cases,
so eliminating the test is only safe when those bits aren't
needed, as is the case for COND_E and COND_NE, or if it
can be proven that no overflow will occur. For now, just
restrict the optimization to COND_E and COND_NE and don't
do any overflow analysis.
llvm-svn: 66318
with multiple chain operands. This can occur when the scheduler
has added chain operands to a node that already has a chain
operand, in order to handle physical register dependencies.
This fixes an llvm-gcc bootstrap failure on x86-64 introduced
in r66058.
llvm-svn: 66240
so it changed it into a 31 via the TLO.ShrinkDemandedConstant() call. Then it
would go through the DAG combiner again. This time it had a value of 31, which
was turned into a -1 by TLI.SimplifyDemandedBits(). This would ping-pong
forever.
Teach the TLO.ShrinkDemandedConstant() call not to lower a value if the demanded
value is an XOR of all ones.
llvm-svn: 65985
extracts + build_vector into a shuffle would fail, because the
type of the new build_vector would not be legal. Try harder to
create a legal build_vector type. Note: this will be totally
irrelevant once vector_shuffle no longer takes a build_vector for the
shuffle mask.
New:
_foo:
xorps %xmm0, %xmm0
xorps %xmm1, %xmm1
subps %xmm1, %xmm1
mulps %xmm0, %xmm1
addps %xmm0, %xmm1
movaps %xmm1, 0
Old:
_foo:
xorps %xmm0, %xmm0
movss %xmm0, %xmm1
xorps %xmm2, %xmm2
unpcklps %xmm1, %xmm2
pshufd $80, %xmm1, %xmm1
unpcklps %xmm1, %xmm2
pslldq $16, %xmm2
pshufd $57, %xmm2, %xmm1
subps %xmm0, %xmm1
mulps %xmm0, %xmm1
addps %xmm0, %xmm1
movaps %xmm1, 0
llvm-svn: 65791