This new register allocator is initially identical to RegAllocBasic, but it will
receive all of the tricks that RegAllocBasic won't get.
RegAllocGreedy will eventually replace linear scan.
llvm-svn: 121234
zextOrTrunc(), and APSInt methods extend(), extOrTrunc() and new method
trunc(), to be const and to return a new value instead of modifying the
object in place.
llvm-svn: 121120
The StrongPHIElimination pass did not work, and nobody has worked on it for two
years.
A rewrite is underway, so I am leaving this shell pass instead of deleting it
completely.
llvm-svn: 120830
Scan the MachineFunction for DBG_VALUE instructions, and replace them with a
data structure similar to LiveIntervals. The live range of a DBG_VALUE is
determined by propagating it down the dominator tree until a new DBG_VALUE is
found. When a DBG_VALUE lives in a register, its live range is confined to the
live range of the register's value.
LiveDebugVariables runs before coalescing, so DBG_VALUEs are not artificially
extended when registers are joined.
The missing half will recreate DBG_VALUE instructions from the intervals when
register allocation is complete.
The pass is disabled by default. It can be enabled with the temporary command
line option -live-debug-variables.
llvm-svn: 120636
legalization time. Since at legalization time there is no mapping from
SDNode back to the corresponding LLVM instruction and the return
SDNode is target specific, this requires a target hook to check for
eligibility. Only x86 and ARM support this form of sibcall optimization
right now.
rdar://8707777
llvm-svn: 120501
in favor of the widespread llvm style. Capitalize variables and add
newlines for visual parsing. Rename variables for readability.
And other cleanup.
llvm-svn: 120490
This analysis is going to run immediately after LiveIntervals. It will stay
alive during register allocation and keep track of user variables mentioned in
DBG_VALUE instructions.
When the register allocator is moving values between registers and the stack, it
is very hard to keep track of DBG_VALUE instructions. We usually get it wrong.
This analysis maintains a data structure that makes it easy to update DBG_VALUE
instructions.
llvm-svn: 120385
so don't claim they are. They are allocated using DAG.getNode, so attempts
to access MemSDNode fields results in reading off the end of the allocated
memory. This fixes crashes with "llc -debug" due to debug code trying to
print MemSDNode fields for these barrier nodes (since the crashes are not
deterministic, use valgrind to see this). Add some nasty checking to try
to catch this kind of thing in the future.
llvm-svn: 119901
MCStreamer instead of just MCObjectStreamer. Address changes cannot
be as efficient as we have to use DW_LNE_set_addres, but at least
most of the logic is shared.
This will be used so that, with CodeGen still using EmitDwarfLocDirective,
llvm-gcc is able to produce debug_line sections without needing an
assembler that supports .loc.
llvm-svn: 119777
if the extension types were not the same. The result was that if you
fed a select with sext and zext loads, as in the testcase, then it
would get turned into a zext (or sext) of the select, which is wrong
in the cases when it should have been an sext (resp. zext). Reported
and diagnosed by Sebastien Deldon.
llvm-svn: 119728
and testing is easier. A good example is the unknown-location.ll test that
now can just look for ".loc 1 0 0". We also don't use a DW_LNE_set_address for
every address change anymore.
llvm-svn: 119613
and xor. The 32-bit move immediates can be hoisted out of loops by machine
LICM but the isel hacks were preventing them.
Instead, let peephole optimization pass recognize registers that are defined by
immediates and the ARM target hook will fold the immediates in.
Other changes include 1) do not fold and / xor into cmp to isel TST / TEQ
instructions if there are multiple uses. This happens when the 'and' is live
out, machine sink would have sinked the computation and that ends up pessimizing
code. The peephole pass would recognize situations where the 'and' can be
toggled to define CPSR and eliminate the comparison anyway.
2) Move peephole pass to after machine LICM, sink, and CSE to avoid blocking
important optimizations.
rdar://8663787, rdar://8241368
llvm-svn: 119548
SrcMgrDiagHandler, we can improve clang diagnostics for inline asm:
instead of reporting them on a source line of the original line,
we can report it on the correct line wherever the string literal came
from. For something like this:
void foo() {
asm("push %rax\n"
".code32\n");
}
we used to get this: (note that the line in t.c isn't helpful)
t.c:4:7: error: warning: ignoring directive for now
asm("push %rax\n"
^
<inline asm>:2:1: note: instantiated into assembly here
.code32
^
now we get:
t.c:5:8: error: warning: ignoring directive for now
".code32\n"
^
<inline asm>:2:1: note: instantiated into assembly here
.code32
^
Note that we're pointing to line 5 properly now.
llvm-svn: 119488
cookie argument to the SourceMgr diagnostic stuff. This cleanly separates
LLVMContext's inlineasm handler from the sourcemgr error handling
definition, increasing type safety and cleaning things up.
llvm-svn: 119486
Always spill the full representative register at any point where any subregister
is live.
This fixes PR8620 which caused the old logic to get confused and not spill
anything at all.
The fundamental problem here is that the coalescer is too aggressive about
physical register coalescing. It sometimes makes it impossible to allocate
registers without these emergency spills.
llvm-svn: 119375
The live range of a register defined by an early clobber starts at the use slot,
not the def slot.
Except when it is an early clobber tied to a use operand. Then it starts at the
def slot like a standard def.
llvm-svn: 119305
live ranges for the spill register are also defined at the use slot instead of
the normal def slot.
This fixes PR8612 for the inline spiller. A use was being allocated to the same
register as a spilled early clobber def.
This problem exists in all the spillers. A fix for the standard spiller is
forthcoming.
llvm-svn: 119182
catastrophic compilation time in the event of unreasonable LLVM
IR. Code quality is a separate issue--someone upstream needs to do a
better job of reducing to llvm.memcpy. If the situation can be reproduced with
any supported frontend, then it will be a separate bug.
llvm-svn: 118904
it makes no sense for allocation_order iterators to visit reserved regs.
The inline spiller depends on AliasAnalysis.
Manage the Query state to avoid uninitialized or stale results.
llvm-svn: 118800
This is the first small step towards using closed intervals for liveness instead
of the half-open intervals we're using now.
We want to be able to distinguish between a SlotIndex that represents a variable
being live-out of a basic block, and an index representing a variable live-in to
its successor.
That requires two separate indexes between blocks. One for live-outs and one for
live-ins.
With this change, getMBBEndIdx(MBB).getPrevSlot() becomes stable so it stays
greater than any instructions inserted at the end of MBB.
llvm-svn: 118747
Whenever splitting wants to insert a copy, it checks if the value can be
rematerialized cheaply instead.
Missing features:
- Delete instructions when all uses have been rematerialized.
- Truncate live ranges to the remaining uses after rematerialization.
llvm-svn: 118702
benchmarks hitting an assertion.
Adds LiveIntervalUnion::collectInterferingVRegs.
Fixes "late spilling" by checking for any unspillable live vregs among
all physReg aliases.
llvm-svn: 118701
handle cases in which a register is unavailable for spill code.
Adds LiveIntervalUnion::extract. While processing interferences on a
live virtual register, reuses the same Query object for each
physcial reg.
llvm-svn: 118423
to perform the copy, which may be of lots of memory [*]. It would be good if the
fall-back code generated something reasonable, i.e. did the copy in a loop, rather
than vast numbers of loads and stores. Add a note about this. Currently target
specific code seems to always kick in so this is more of a theoretical issue rather
than a practical one now that X86 has been fixed.
[*] It's amazing how often people pass mega-byte long arrays by copy...
llvm-svn: 118275
This way, InlineSpiller does the same amount of splitting as the standard
spiller. Splitting should really be guided by the register allocator, and
doesn't belong in the spiller at all.
llvm-svn: 118216
value type, so there is no point in passing it around using
an EVT. Use the simpler MVT everywhere. Rather than trying
to propagate this information maximally in all the code that
using the calling convention stuff, I chose to do a mainly
low impact change instead.
llvm-svn: 118167
1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to
"optimize for latency". Call instructions don't have the right latency and
this is more likely to use introduce spills.
2. Fix if-converter cost function. For ARM, it should use instruction latencies,
not # of micro-ops since multi-latency instructions is completely executed
even when the predicate is false. Also, some instruction will be "slower"
when they are predicated due to the register def becoming implicit input.
rdar://8598427
llvm-svn: 118135
BB#1: derived from LLVM BB %bb.nph28
Live Ins: %AL
Predecessors according to CFG: BB#0
TEST8rr %reg16384<kill>, %reg16384, %EFLAGS<imp-def>; GR8:%reg16384
JNE_4 <BB#2>, %EFLAGS<imp-use,kill>
JMP_4 <BB#2>
Successors according to CFG: BB#2 BB#2
These double CFG edges only ever occur in bugpoint-generated code, so there is
no need to attempt something clever.
llvm-svn: 117992
source, and let rewrite() clean it up.
This way, kill flags on the inserted copies are fixed as well during rewrite().
We can't just assume that all the copies we insert are going to be kills since
critical edges into loop headers sometimes require both source and dest to be
live out of a block.
llvm-svn: 117980
At least X86FloatingPoint requires correct kill flags after register allocation,
and targets using register scavenging benefit. Conservative kill flags are not
enough.
llvm-svn: 117960
at more than those which define CPSR. You can have this situation:
(1) subs ...
(2) sub r6, r5, r4
(3) movge ...
(4) cmp r6, 0
(5) movge ...
We cannot convert (2) to "subs" because (3) is using the CPSR set by
(1). There's an analogous situation here:
(1) sub r1, r2, r3
(2) sub r4, r5, r6
(3) cmp r4, ...
(5) movge ...
(6) cmp r1, ...
(7) movge ...
We cannot convert (1) to "subs" because of the intervening use of CPSR.
llvm-svn: 117950
looks like is happening:
Without the peephole optimizer:
(1) sub r6, r6, #32
orr r12, r12, lr, lsl r9
orr r2, r2, r3, lsl r10
(x) cmp r6, #0
ldr r9, LCPI2_10
ldr r10, LCPI2_11
(2) sub r8, r8, #32
(a) movge r12, lr, lsr r6
(y) cmp r8, #0
LPC2_10:
ldr lr, [pc, r10]
(b) movge r2, r3, lsr r8
With the peephole optimizer:
ldr r9, LCPI2_10
ldr r10, LCPI2_11
(1*) subs r6, r6, #32
(2*) subs r8, r8, #32
(a*) movge r12, lr, lsr r6
(b*) movge r2, r3, lsr r8
(1) is used by (x) for the conditional move at (a). (2) is used by (y) for the
conditional move at (b). After the peephole optimizer, these the flags resulting
from (1*) are ignored and only the flags from (2*) are considered for both
conditional moves.
llvm-svn: 117876
operand and one of them has a single use that is a live out copy, favor the
one that is live out. Otherwise it will be difficult to eliminate the copy
if the instruction is a loop induction variable update. e.g.
BB:
sub r1, r3, #1
str r0, [r2, r3]
mov r3, r1
cmp
bne BB
=>
BB:
str r0, [r2, r3]
sub r3, r3, #1
cmp
bne BB
This fixed the recent 256.bzip2 regression.
llvm-svn: 117675
We don't want unused values forming their own equivalence classes, so we lump
them all together in one class, and then merge them with the class of the last
used value.
llvm-svn: 117670
in SSAUpdaterImpl.h
Verifying live intervals revealed that the old method was completely wrong, and
we need an iterative approach to calculating PHI placemant. Fortunately, we have
MachineDominators available, so we don't have to compute that over and over
like SSAUpdaterImpl.h must.
Live-out values are cached between calls to mapValue() and computed in a greedy
way, so most calls will be working with very small block sets.
Thanks to Bob for explaining how this should work.
llvm-svn: 117599
proper SSA updating.
This doesn't cause MachineDominators to be recomputed since we are already
requiring MachineLoopInfo which uses dominators as well.
llvm-svn: 117598
There are currently 100 references to COFF::IMAGE_SCN in 6 files
and 11 different functions. Section to attribute mapping really
needs to happen in one place to avoid problems like this.
llvm-svn: 117473
Critical edges going into a loop are not as bad as critical exits. We can handle
them by splitting the critical edge, or by having both inside and outside
registers live out of the predecessor.
llvm-svn: 117423
the remainder register.
Example:
bb0:
x = 1
bb1:
use(x)
...
x = 2
jump bb1
When x is isolated in bb1, the inner part breaks into two components, x1 and x2:
bb0:
x0 = 1
bb1:
x1 = x0
use(x1)
...
x2 = 2
x0 = x2
jump bb1
llvm-svn: 117408
do not double-count the duplicate instructions by counting once from the
beginning and again from the end. Keep track of where the duplicates from
the beginning ended and don't go past that point when counting duplicates
at the end. Radar 8589805.
This change causes one of the MC/ARM/simple-fp-encoding tests to produce
different (better!) code without the vmovne instruction being tested.
I changed the test to produce vmovne and vmoveq instructions but moving
between register files in the opposite direction. That's not quite the same
but predicated versions of those instructions weren't being tested before,
so at least the test coverage is not any worse, just different.
llvm-svn: 117333
instructions separately from the count of non-predicated instructions. The
instruction count is used in places to determine how many instructions to
copy, predicate, etc. and things get confused if that count includes the
extra cost for microcoded ops.
llvm-svn: 117332
2) live-outs.
Previously the post-RA schedulers completely ignore these dependencies since
returns, branches, etc. are all scheduling barriers. This patch model the
latencies between instructions being scheduled and the barriers. It also
handle calls by marking their register uses.
llvm-svn: 117193