The live range of a register defined by an early clobber starts at the use slot,
not the def slot.
Except when it is an early clobber tied to a use operand. Then it starts at the
def slot like a standard def.
llvm-svn: 119305
live ranges for the spill register are also defined at the use slot instead of
the normal def slot.
This fixes PR8612 for the inline spiller. A use was being allocated to the same
register as a spilled early clobber def.
This problem exists in all the spillers. A fix for the standard spiller is
forthcoming.
llvm-svn: 119182
1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to
"optimize for latency". Call instructions don't have the right latency and
this is more likely to use introduce spills.
2. Fix if-converter cost function. For ARM, it should use instruction latencies,
not # of micro-ops since multi-latency instructions is completely executed
even when the predicate is false. Also, some instruction will be "slower"
when they are predicated due to the register def becoming implicit input.
rdar://8598427
llvm-svn: 118135
at more than those which define CPSR. You can have this situation:
(1) subs ...
(2) sub r6, r5, r4
(3) movge ...
(4) cmp r6, 0
(5) movge ...
We cannot convert (2) to "subs" because (3) is using the CPSR set by
(1). There's an analogous situation here:
(1) sub r1, r2, r3
(2) sub r4, r5, r6
(3) cmp r4, ...
(5) movge ...
(6) cmp r1, ...
(7) movge ...
We cannot convert (1) to "subs" because of the intervening use of CPSR.
llvm-svn: 117950
- For now, loads of [r, r] addressing mode is the same as the
[r, r lsl/lsr/asr #] variants. ARMBaseInstrInfo::getOperandLatency() should
identify the former case and reduce the output latency by 1.
- Also identify [r, r << 2] case. This special form of shifter addressing mode
is "free".
llvm-svn: 117519
elements than the result vector type. So, when an instruction like:
%8 = shufflevector <2 x float> %4, <2 x float> %7, <4 x i32> <i32 1, i32 0, i32 3, i32 2>
is translated to a DAG, each operand is changed to a concat_vectors node that appends 2 undef elements. That is:
shuffle [a,b], [c,d] is changed to:
shuffle [a,b,u,u], [c,d,u,u]
That's probably the right thing for x86 but for NEON, we'd much rather have:
shuffle [a,b,c,d], undef
Teach the DAG combiner how to do that transformation for ARM. Radar 8597007.
llvm-svn: 117482
do not double-count the duplicate instructions by counting once from the
beginning and again from the end. Keep track of where the duplicates from
the beginning ended and don't go past that point when counting duplicates
at the end. Radar 8589805.
This change causes one of the MC/ARM/simple-fp-encoding tests to produce
different (better!) code without the vmovne instruction being tested.
I changed the test to produce vmovne and vmoveq instructions but moving
between register files in the opposite direction. That's not quite the same
but predicated versions of those instructions weren't being tested before,
so at least the test coverage is not any worse, just different.
llvm-svn: 117333