pipeline stall. It's useful for targets like ARM cortex-a8. NEON has a lot
of long latency instructions so a strict register pressure reduction
scheduler does not work well.
Early experiments show this speeds up some NEON loops by over 30%.
llvm-svn: 104216
the variable actually tracks.
N.B., several back-ends are using "HasCalls" as being synonymous for something
that adjusts the stack. This isn't 100% correct and should be looked into.
llvm-svn: 103802
code, and to eliminate the need for the SelectionDAGBuilder
state to be live during CodeGenAndEmitDAG calls.
Call SDB->clear() before CodeGenAndEmitDAG calls instead of
before it, and move the CurDAG->clear() out of SelectionDAGBuilder,
which doesn't own the DAG, and into CodeGenAndEmitDAG.
llvm-svn: 102814
FunctionLoweringInfo, as it isn't SelectionDAG-specific. This isn't
completely natural, as PHI node state is not per-function but rather
per-basic-block, however there's currently no other convenient
per-basic-block state to group it with.
llvm-svn: 102109
into SelectionDAGBuilder. This avoids a separate pass over the
instructions, and has the side effect of providing debug location
information to the copy.
llvm-svn: 101906
const_casts, and it reinforces the design of the Target classes being
immutable.
SelectionDAGISel::IsLegalToFold is now a static member function, because
PIC16 uses it in an unconventional way. There is more room for API
cleanup here.
And PIC16's AsmPrinter no longer uses TargetLowering.
llvm-svn: 101635
1. Introduce some enums and accessors in the InlineAsm class
that eliminate a ton of magic numbers when handling inline
asm SDNode.
2. Add a new MDNodeSDNode selection dag node type that holds
a MDNode (shocking!)
3. Add a new argument to ISD::INLINEASM nodes that hold !srcloc
metadata, propagating it to the instruction emitter, which
drops it.
No functionality change.
llvm-svn: 100605
representation. This eliminates the 'DILocation' MDNodes for
file/line/col tuples from -O0 -g codegen.
This remove the old DebugLoc class, making it a typedef for DebugLoc,
I'll rename NewDebugLoc next.
I didn't update the JIT to use the new apis, so it will continue to
work, but be as slow as before. Someone should eventually do this
or, better yet, rip out the JIT debug info stuff and build the JIT
on top of MC.
llvm-svn: 100209
and those derived from them. These are obnoxious because
they were written as: PatLeaf<(bitconvert). Not having an
argument was foiling adding better type checking for operand
count matching up with what was required (in this case,
bitconvert always requires an operand!)
llvm-svn: 99759
bytes instead of one byte. This is important because
we're running up to too many opcodes to fit in a byte
and it is aggrevated by FIRST_TARGET_MEMORY_OPCODE
making the numbering sparse. This just bites the
bullet and bloats out the table. In practice, this
increases the size of the x86 isel table from 74.5K
to 76K. I think we'll cope :)
This fixes rdar://7791648
llvm-svn: 99494
an MCSymbol. Make the EH_LABEL MachineInstr hold its label
with an MCSymbol instead of ID. Fix a bug in MMI.cpp which
would return labels named "Label4" instead of "label4".
llvm-svn: 98463
node which has a flag. That flag in turn was used by an
already-selected adde which turned into an ADC32ri8 which
used a selected load which was chained to the load we
folded. This flag use caused us to form a cycle. Fix
this by not ignoring chains in IsLegalToFold even in
cases where the isel thinks it can.
llvm-svn: 97791
as the very last thing before node emission. This should
dramatically reduce the number of times we do 'MatchAddress'
on X86, speeding up compile time. This also improves comments
in the tables and shrinks the table a bit, now down to
80506 bytes for x86.
llvm-svn: 97703
entry we're about to process is obviously going to fail, don't
bother pushing a scope only to have it immediately be popped.
This avoids a lot of scope stack traffic in common cases.
Unfortunately, this requires duplicating some of the predicate
dispatch. To avoid duplicating the actual logic I pulled each
predicate out to its own static function which gets used in
both places.
llvm-svn: 97651
SwitchOpcodeMatcher) and have DAGISelMatcherOpt form it. This
speeds up selection, particularly for X86 which has lots of
variants of instructions with only type differences.
llvm-svn: 97645
CopyToReg/CopyFromReg/INLINEASM. These are annoying because
they have the same opcode before an after isel. Fix this by
setting their NodeID to -1 to indicate that they are selected,
just like what automatically happens when selecting things that
end up being machine nodes.
With that done, give IsLegalToFold a new flag that causes it to
ignore chains. This lets the HandleMergeInputChains routine be
the one place that validates chains after a match is successful,
enabling the new hotness in chain processing. This smarter
chain processing eliminates the need for "PreprocessRMW" in the
X86 and MSP430 backends and enables MSP to start matching it's
multiple mem operand instructions more aggressively.
I currently #if out the dead code in the X86 backend and MSP
backend, I'll remove it for real in a follow-on patch.
The testcase changes are:
test/CodeGen/X86/sse3.ll: we generate better code
test/CodeGen/X86/store_op_load_fold2.ll: PreprocessRMW was
miscompiling this before, we now generate correct code
Convert it to filecheck while I'm at it.
test/CodeGen/MSP430/Inst16mm.ll: Add a testcase for mem/mem
folding to make anton happy. :)
llvm-svn: 97596
was that we weren't properly handling the case when interior
nodes of a matched pattern become dead after updating chain
and flag uses. Now we handle this explicitly in
UpdateChainsAndFlags.
llvm-svn: 97561
DoInstructionSelection. Inline "SelectRoot" into it from DAGISelHeader.
Sink some other stuff out of DAGISelHeader into SDISel.
Eliminate the various 'Indent' stuff from various targets, which dates
to when isel was recursive.
17 files changed, 114 insertions(+), 430 deletions(-)
llvm-svn: 97555
stuff now that we don't care about emulating the old broken
behavior of the old isel. This eliminates the
'CheckChainCompatible' check (along with IsChainCompatible) which
did an incorrect and inefficient scan *up* the chain nodes which
happened as the pattern was being formed and does the validation
at the end in HandleMergeInputChains when it forms a structural
pattern. This scans "down" the graph, which means that it is
quickly bounded by nodes already selected. This also handles
token factors that get "trapped" in the dag.
Removing the CheckChainCompatible nodes also shrinks the
generated tables by about 6K for X86 (down to 83K).
There are two pieces remaining before I can nuke PreprocessRMW:
1. I xfailed a test because we're now producing worse code in a
case that has nothing to do with the change: it turns out that
our use of MorphNodeTo will leave dead nodes in the graph
which (depending on how the graph is walked) end up causing
bogus uses of chains and blocking matches. This is really
bad for other reasons, so I'll fix this in a follow-up patch.
2. CheckFoldableChainNode needs to be improved to handle the TF.
llvm-svn: 97539