long test(long x) { return (x & 123124) | 3; }
Currently compiles to:
_test:
orl $3, %edi
movq %rdi, %rax
andq $123127, %rax
ret
This is because instruction and DAG combiners canonicalize
(or (and x, C), D) -> (and (or, D), (C | D))
However, this is only profitable if (C & D) != 0. It gets in the way of the
3-addressification because the input bits are known to be zero.
llvm-svn: 97616
CopyToReg/CopyFromReg/INLINEASM. These are annoying because
they have the same opcode before an after isel. Fix this by
setting their NodeID to -1 to indicate that they are selected,
just like what automatically happens when selecting things that
end up being machine nodes.
With that done, give IsLegalToFold a new flag that causes it to
ignore chains. This lets the HandleMergeInputChains routine be
the one place that validates chains after a match is successful,
enabling the new hotness in chain processing. This smarter
chain processing eliminates the need for "PreprocessRMW" in the
X86 and MSP430 backends and enables MSP to start matching it's
multiple mem operand instructions more aggressively.
I currently #if out the dead code in the X86 backend and MSP
backend, I'll remove it for real in a follow-on patch.
The testcase changes are:
test/CodeGen/X86/sse3.ll: we generate better code
test/CodeGen/X86/store_op_load_fold2.ll: PreprocessRMW was
miscompiling this before, we now generate correct code
Convert it to filecheck while I'm at it.
test/CodeGen/MSP430/Inst16mm.ll: Add a testcase for mem/mem
folding to make anton happy. :)
llvm-svn: 97596
was that we weren't properly handling the case when interior
nodes of a matched pattern become dead after updating chain
and flag uses. Now we handle this explicitly in
UpdateChainsAndFlags.
llvm-svn: 97561
DoInstructionSelection. Inline "SelectRoot" into it from DAGISelHeader.
Sink some other stuff out of DAGISelHeader into SDISel.
Eliminate the various 'Indent' stuff from various targets, which dates
to when isel was recursive.
17 files changed, 114 insertions(+), 430 deletions(-)
llvm-svn: 97555
stuff now that we don't care about emulating the old broken
behavior of the old isel. This eliminates the
'CheckChainCompatible' check (along with IsChainCompatible) which
did an incorrect and inefficient scan *up* the chain nodes which
happened as the pattern was being formed and does the validation
at the end in HandleMergeInputChains when it forms a structural
pattern. This scans "down" the graph, which means that it is
quickly bounded by nodes already selected. This also handles
token factors that get "trapped" in the dag.
Removing the CheckChainCompatible nodes also shrinks the
generated tables by about 6K for X86 (down to 83K).
There are two pieces remaining before I can nuke PreprocessRMW:
1. I xfailed a test because we're now producing worse code in a
case that has nothing to do with the change: it turns out that
our use of MorphNodeTo will leave dead nodes in the graph
which (depending on how the graph is walked) end up causing
bogus uses of chains and blocking matches. This is really
bad for other reasons, so I'll fix this in a follow-up patch.
2. CheckFoldableChainNode needs to be improved to handle the TF.
llvm-svn: 97539
ComplexPattern at the root be generated multiple times, once
for each opcode they are part of. This encourages factoring
because the opcode checks get treated just like everything
else in the matcher.
llvm-svn: 97439
to a scope where every child starts with a CheckOpcode, but
executes more efficiently. Enhance DAGISelMatcherOpt to
form it.
This also fixes a bug in CheckOpcode: apparently the SDNodeInfo
objects are not pointer comparable, we have to compare the
enum name.
llvm-svn: 97438
defs or uses. The regular def and use checking below covers them, and
can be more precise. It's safe to hoist an instruction with a dead
implicit def if the register isn't live into the loop header.
llvm-svn: 97352
for alignment into the LSDA. If the TType base offset is emitted, then put the
padding there. Otherwise, put it in the call site table length. There will be no
conflict between the two sites when placing the padding in one place.
llvm-svn: 97277
The PowerPC floating point registers can represent both f32 and f64 via the
two register classes F4RC and F8RC. F8RC is considered a subclass of F4RC to
allow cross-class coalescing. This coalescing only affects whether registers
are spilled as f32 or f64.
Spill slots must be accessed with load/store instructions corresponding to the
class of the spilled register. PPCInstrInfo::foldMemoryOperandImpl was looking
at the instruction opcode which is wrong.
X86 has similar floating point register classes, but doesn't try to fold
memory operands, so there is no problem there.
llvm-svn: 97262
the alignment requirement, if it no longer makes the TType base offset overflow
into extra bytes, then we need to pad to those bytes ourselves.
llvm-svn: 97196
will eliminate the need for padding in the "Call site table length". E.g., if
we have this:
GCC_except_table1:
Lexception1:
.byte 0xff ## @LPStart Encoding = omit
.byte 0x9b ## @TType Encoding = indirect pcrel sdata4
.byte 0x7f ## @TType base offset
.byte 0x03 ## Call site Encoding = udata4
.byte 0x89 ## Call site table length
with padding of 1. We want to emit the padding like this:
GCC_except_table1:
Lexception1:
.byte 0xff ## @LPStart Encoding = omit
.byte 0x9b ## @TType Encoding = indirect pcrel sdata4
.byte 0xff ## @TType base offset
.space 1,0 ## Padding
.byte 0x03 ## Call site Encoding = udata4
.byte 0x89 ## Call site table length
and not with padding on the "Call site table length" entry.
llvm-svn: 97183
terms of store and load, which means bitcasting between scalar
integer and vector has endian-specific results, which undermines
this whole approach.
llvm-svn: 97137
GCC_except_table label but before the Lexception, which the FDE references.
This causes problems as the FDE does not point to the start of an LSDA chunk.
Use an unnormalized uleb128 for the call-site table length that includes the
padding.
llvm-svn: 97078
the number of value bits, not the number of bits of allocation for in-memory
storage.
Make getTypeStoreSize and getTypeAllocSize work consistently for arrays and
vectors.
Fix several places in CodeGen which compute offsets into in-memory vectors
to use TargetData information.
This fixes PR1784.
llvm-svn: 97064
necessary to swap the operands to handle NaN and negative zero properly.
Also, reintroduce logic for checking for NaN conditions when forming
SSE min and max instructions, fixed to take into consideration NaNs and
negative zeros. This allows forming min and max instructions in more
cases.
llvm-svn: 97025
to adding them in a determinstic order (bottom up from
the root) based on the structure of the graph itself.
This updates tests for some random changes, interesting
bits: CodeGen/Blackfin/promote-logic.ll no longer crashes.
I have no idea why, but that's good right?
CodeGen/X86/2009-07-16-LoadFoldingBug.ll also fails, but
now compiles to have one fewer constant pool entry, making
the expected load that was being folded disappear. Since it
is an unreduced mass of gnast, I just removed it.
This fixes PR6370
llvm-svn: 97023
Previously, LiveIntervalAnalysis would infer phi joins by looking for multiply
defined registers. That doesn't work if the phi join is implicitly defined in
all but one of the predecessors.
llvm-svn: 96994
no id's would cause early exit allowing IsLegalToFold to return true
instead of false, producing a cyclic dag.
This was striking the new isel because it isn't using SelectNodeTo yet,
which theoretically is just an optimization.
llvm-svn: 96972
126.gcc nightly tests. These failures uncovered latent bugs that machine DCE
could remove one half of a stack adjust down/up pair, causing PEI to assert.
This update fixes that, and the tests now pass.
llvm-svn: 96822
dragonegg self-host build. I reverted 96640 in order to revert
96556 (96640 goes on top of 96556), but it also looks like with
both of them applied the breakage happens even earlier. The
symptom of the 96556 miscompile is the following crash:
llvm[3]: Compiling AlphaISelLowering.cpp for Release build
cc1plus: /home/duncan/tmp/tmp/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:4982: void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode*, llvm::SDNode*, llvm::SelectionDAG::DAGUpdateListener*): Assertion `(!From->hasAnyUseOfValue(i) || From->getValueType(i) == To->getValueType(i)) && "Cannot use this version of ReplaceAllUsesWith!"' failed.
Stack dump:
0. Running pass 'X86 DAG->DAG Instruction Selection' on function '@_ZN4llvm19AlphaTargetLowering14LowerOperationENS_7SDValueERNS_12SelectionDAGE'
g++: Internal error: Aborted (program cc1plus)
This occurs when building LLVM using LLVM built by LLVM (via
dragonegg). Probably LLVM has miscompiled itself, though it
may have miscompiled GCC and/or dragonegg itself: at this point
of the self-host build, all of GCC, LLVM and dragonegg were built
using LLVM. Unfortunately this kind of thing is extremely hard
to debug, and while I did rummage around a bit I didn't find any
smoking guns, aka obviously miscompiled code.
Found by bisection.
r96556 | evancheng | 2010-02-18 03:13:50 +0100 (Thu, 18 Feb 2010) | 5 lines
Some dag combiner goodness:
Transform br (xor (x, y)) -> br (x != y)
Transform br (xor (xor (x,y), 1)) -> br (x == y)
Also normalize (and (X, 1) == / != 1 -> (and (X, 1)) != / == 0 to match to "test on x86" and "tst on arm"
r96640 | evancheng | 2010-02-19 01:34:39 +0100 (Fri, 19 Feb 2010) | 16 lines
Transform (xor (setcc), (setcc)) == / != 1 to
(xor (setcc), (setcc)) != / == 1.
e.g. On x86_64
%0 = icmp eq i32 %x, 0
%1 = icmp eq i32 %y, 0
%2 = xor i1 %1, %0
br i1 %2, label %bb, label %return
=>
testl %edi, %edi
sete %al
testl %esi, %esi
sete %cl
cmpb %al, %cl
je LBB1_2
llvm-svn: 96672
for ARM to just check if a function has a FP to determine if it's safe
to simplify the stack adjustment pseudo ops prior to eliminating frame
indices. Allow targets to override the default behavior and does so for ARM
and Thumb2.
llvm-svn: 96634
Moderate the weight given to very small intervals.
The spill weight given to new intervals created when spilling was not
normalized in the same way as the original spill weights calculated by
CalcSpillWeights. That meant that restored registers would tend to hang around
because they had a much higher spill weight that unspilled registers.
This improves the runtime of a few tests by up to 10%, and there are no
significant regressions.
llvm-svn: 96613
checking whether AnalyzeBranch disagrees with the CFG
directly, rather than looking for EH_LABEL instructions.
EH_LABEL instructions aren't always at the end of the
block, due to FP_REG_KILL and other things. This fixes
an infinite loop compiling MultiSource/Benchmarks/Bullet.
llvm-svn: 96611
A virtual register can be used before it is defined in the same MBB if the MBB
is part of a loop. Teach the implicit-def pass about this case.
llvm-svn: 96279
IsLegalToFold and IsProfitableToFold. The generic version of the later simply checks whether the folding candidate has a single use.
This allows the target isel routines more flexibility in deciding whether folding makes sense. The specific case we are interested in is folding constant pool loads with multiple uses.
llvm-svn: 96255
When coalescing with a physreg, remember to add imp-def and imp-kill when
dealing with sub-registers.
Also fix a related bug in VirtRegRewriter where substitutePhysReg may
reallocate the operand list on an instruction and invalidate the reg_iterator.
This can happen when a register is mentioned twice on the same instruction.
llvm-svn: 96072
created. This ensures it's updated at all time. It means targets which perform
dynamic stack alignment would know whether it is required and whether frame
pointer register cannot be made available register allocation.
This is a fix for rdar://7625239. Sorry, I can't create a reasonably sized test
case.
llvm-svn: 96069
phi cycles. Adjust a few tests to keep dead instructions from being optimized
away. This (together with my previous change for phi cycles) fixes Apple
radar 7627077.
llvm-svn: 96057
bug fixes, and with improved heuristics for analyzing foreign-loop
addrecs.
This change also flattens IVUsers, eliminating the stride-oriented
groupings, which makes it easier to work with.
llvm-svn: 95975
* Enabled R1/R2 application for nodes with infinite spill costs in the Briggs heuristic (made
safe by the changes to the normalization proceedure).
* Removed a redundant header.
llvm-svn: 95973
reduce down to a single value. InstCombine already does this transformation
but DAG legalization may introduce new opportunities. This has turned out to
be important for ARM where 64-bit values are split up during type legalization:
InstCombine is not able to remove the PHI cycles on the 64-bit values but
the separate 32-bit values can be optimized. I measured the compile time
impact of this (running llc on 176.gcc) and it was not significant.
llvm-svn: 95951
Calling RemoveOperand is very expensive on huge PHI instructions. This makes
early tail duplication run twice as fast on the Firefox JavaScript
interpreter.
llvm-svn: 95832
lowering and requires that certain types exist in ValueTypes.h. Modified widening to
check if an op can trap and if so, the widening algorithm will apply only the op on
the defined elements. It is safer to do this in widening because the optimizer can't
guarantee removing unused ops in some cases.
llvm-svn: 95823
legalization even when the IR-level optimizer has removed dead phis, such
as when the high half of an i64 value is unused on a 32-bit target.
I had to adjust a few test cases that had dead phis.
This is a partial fix for Radar 7627077.
llvm-svn: 95816
register coalescing. This fixes many crashes and
places where debug info affects codegen (when
dbg.value is lowered to machine instructions, which
it isn't yet in TOT).
llvm-svn: 95739
The major win of this is that the code is simpler and they
print on the same line as the instruction again:
movl %eax, 96(%esp) ## 4-byte Spill
movl 96(%esp), %eax ## 4-byte Reload
cmpl 92(%esp), %eax ## 4-byte Folded Reload
jl LBB7_86
llvm-svn: 95738
into TargetOpcodes.h. #include the new TargetOpcodes.h
into MachineInstr. Add new inline accessors (like isPHI())
to MachineInstr, and start using them throughout the
codebase.
llvm-svn: 95687
Previously spill registers, whose def indexes are not defined, would sometimes be improperly marked as coalescable with conflicting registers. The new findCoalesces routine conservatively assumes that any register with at least one undefined def is not coalescable with any register it interferes with.
llvm-svn: 95636
in global initializers. Instead of aborting, attempt to fold them on the
spot. If folding succeeds, emit the folded expression instead.
This fixes PR6255.
llvm-svn: 95583
warns about this base class not having a virtual destructor, but since
this class has no virtual methods and neither it or the types derived
from it has a destructor, a protected trivial destructor will do (and
shuts cppcheck up) the trick without the cost of introducing a vtable.
llvm-svn: 95526
only run for x86 with fastisel. I've found it being very effective in
eliminating some obvious dead code as result of formal parameter lowering
especially when tail call optimization eliminated the need for some of the loads
from fixed frame objects. It also shrinks a number of the tests. A couple of
tests no longer make sense and are now eliminated.
llvm-svn: 95493
following it. However, the EmitGlobalConstant method wasn't emitting a body for
the constant. The assembler doesn't like that. Before, we were generating this:
.zerofill __DATA, __common, __cmd, 1, 3
This fix puts us back to that semantic.
llvm-svn: 95336
the end of the instruction instead of expecting the caller to
do it. This currently causes the asm-verbose instruction
comments to be on the next line.
llvm-svn: 95178
than DEBUG_VALUE :( ) into the target indep AsmPrinter.cpp
file. This allows elimination of the
NO_ASM_WRITER_BOILERPLATE hack among other things.
llvm-svn: 95177
stderr if in filetype=obj mode. This is a hack, and will
live until dwarf emission and other random stuff that is
not yet going through MCStreamer is upgraded. It only
impacts filetype=obj mode.
llvm-svn: 95166
$ cat t.ll
@g = global i32 42
$ llc t.ll -o t.o -filetype=obj
$ nm t.o
00000000 D _g
There is still a ton of work left. Instructions are not being encoded
yet apparently.
llvm-svn: 95162
the one used by the JIT. Remove all forms of
addPassesToEmitFileFinish except the one used by the static
code generator. Inline the remaining version of
addPassesToEmitFileFinish into its only caller.
llvm-svn: 95109
type is the same as the element type of the vector. EXTRACT_VECTOR_ELT can
be used to extended the width of an integer type. This fixes a bug for
Generic/vector-casts.ll on a ppc750.
llvm-svn: 94990
"visit*" method is called, take the newly created nodes, walk them in a DFS
fashion, and if they don't have an ordering set, then give it one.
llvm-svn: 94757
This allows code gen and the exception table writer to cooperate to make sure
landing pads are associated with the correct invoke locations.
llvm-svn: 94726
Move the X86 implementation of function body emission up to
AsmPrinter::EmitFunctionBody, which works by calling the virtual
EmitInstruction method.
llvm-svn: 94716
Target independent isel should always pass along the "tail call" property. Change
target hook LowerCall's parameter "isTailCall" into a refernce. If the target
decides it's impossible to honor the tail call request, it should set isTailCall
to false to make target independent isel happy.
llvm-svn: 94626
change is that we now use ".linkonce discard" for global variables
instead of ".linkonce samesize". These should be the same, just less
strict. If anyone is interested in mcizing MCSection for COFF targets,
this should be easy to fix.
llvm-svn: 94623
Default HasSetDirective to true, since most targets have it.
The targets that claim to not have it probably do, or it is
spelled differently. These include Blackfin, Mips, Alpha, and
PIC16. All of these except pic16 are normal ELF targets, so
they almost certainly have it.
llvm-svn: 94585
which is more convenient, and change getPICJumpTableRelocBaseExpr
to take a MachineFunction to match.
Next, move the X86 code that create a PICBase symbol to
X86TargetLowering::getPICBaseSymbol from
X86MCInstLower::GetPICBaseSymbol, which was an asmprinter specific
library. This eliminates a 'gross hack', and allows us to
implement X86ISelLowering::getPICJumpTableRelocBaseExpr which now
calls it.
This in turn allows us to eliminate the
X86AsmPrinter::printPICJumpTableSetLabel method, which was the
only overload of printPICJumpTableSetLabel.
llvm-svn: 94526
EK_LabelDifference32 kind and the target has .set support. Simplify
X86AsmPrinter::printPICJumpTableSetLabel to make use of recent helpers.
llvm-svn: 94518
* Fixed a reduction bug which occasionally led to infinite-cost (invalid)
register allocation solutions despite the existence finite-cost solutions.
* Significantly reduced memory usage (>50% reduction).
* Simplified a lot of the solver code.
llvm-svn: 94514
MachineFunctionAnalysis dole them out, instead of having
AsmPrinter do both. Have the AsmPrinter::SetupMachineFunction
method set the 'AsmPrinter::MF' variable.
llvm-svn: 94509
1. MachineJumpTableInfo is now created lazily for a function the first time
it actually makes a jump table instead of for every function.
2. The encoding of jump table entries is now described by the
MachineJumpTableInfo::JTEntryKind enum. This enum is determined by the
TLI::getJumpTableEncoding() hook, instead of by lots of code scattered
throughout the compiler that "knows" that jump table entries are always
32-bits in pic mode (for example).
3. The size and alignment of jump table entries is now calculated based on
their kind, instead of at machinefunction creation time.
Future work includes using the EntryKind in more places in the compiler,
eliminating other logic that "knows" the layout of jump tables in various
situations.
llvm-svn: 94470
the '-pre-RA-sched' flag. It actually makes more sense to do it this way. Also,
keep track of the SDNode ordering by default. Eventually, we would like to make
this ordering a way to break a "tie" in the scheduler. However, doing that now
breaks the "CodeGen/X86/abi-isel.ll" test for 32-bit Linux.
llvm-svn: 94308
a .section. Switch to it with SwitchSection.
However, I think that this directive should be safe on any ELF target.
If so, we should hoist it up out of the X86 and SystemZ targets.
llvm-svn: 94298