When widening a copy, we are reading a larger register that may not be
live. Use an <undef> flag to tell the register scavenger and machine
code verifier that we know the value isn't defined.
We now widen:
%S6<def> = COPY %S4<kill>, %D3<imp-def>
into:
%D3<def> = VMOVD %D2<undef>, pred:14, pred:%noreg, %S4<imp-use,kill>
This also keeps the <kill> flag on %S4 so we don't inadvertently kill a
live value in %S5.
Finally, ensure that ARMBaseInstrInfo::setExecutionDomain() preserves
the <undef> flag when converting VMOVD to VORR.
llvm-svn: 141746
file. Since it should only be used when necessary propagate it through
the backend code generation and tweak testcases accordingly.
This helps with code like in clang's test/CodeGen/debug-info-line.c where
we have multiple #line directives within a single lexical block and want
to generate only a single block that contains each file change.
Part of rdar://10246360
llvm-svn: 141729
merging an lsl #2 that has multiple uses on A9. This shift is free, so there is
no problem merging it in multiple places. Other unprofitable shifts will not be
merged.
llvm-svn: 141247
Rewriting the entire loop nest now requires -enable-lsr-nested.
See PR11035 for some performance data.
A few unit tests specifically test nested LSR, and are now under a flag.
llvm-svn: 140762
Math is hard, and isScaledConstantInRange() always returned false for
negative constants. It was doing unsigned division of negative numbers
before casting back to signed.
llvm-svn: 140425
Modified ARMISelLowering::AdjustInstrPostInstrSelection to handle the
full gamut of CPSR defs/uses including instructins whose "optional"
cc_out operand is not really optional. This allowed removal of the
hasPostISelHook to simplify the .td files and make the implementation
more robust.
Fixes rdar://10137436: sqlite3 miscompile
llvm-svn: 140134
(The fix for the related failures on x86 is going to be nastier because we actually need Acquire memoperands attached to the atomic load instrs, etc.)
llvm-svn: 139221
Now the 'S' instructions, e.g. ADDS, treat S bit as optional operand as well.
Also fix isel hook to correctly set the optional operand.
rdar://10073745
llvm-svn: 139157
to be unreliable on platforms which require memcpy calls, and it is
complicating broader legalize cleanups. It is hoped that these cleanups
will make memcpy byval easier to implement in the future.
llvm-svn: 138977
- On COFF the .lcomm directive has an alignment argument.
- On ELF we fall back to .local + .comm
Based on a patch by NAKAMURA Takumi.
Fixes PR9337, PR9483 and PR10128.
llvm-svn: 138976
An instruction may define part of a register where the other bits are
undefined. In that case, it is safe to rematerialize the instruction.
For example:
%vreg2:ssub_0<def> = VLDRS <cp#0>, 0, pred:14, pred:%noreg, %vreg2<imp-def>
The extra <imp-def> operand indicates that the instruction does not read
the other parts of the virtual register, so a remat is safe.
This patch simply allows multiple def operands for the virtual register.
It is MI->readsVirtualRegister() that determines if we depend on a
previous value so remat is impossible.
llvm-svn: 138953
An instruction that redefines only part of a larger register can never
be rematerialized since the virtual register value depends on the old
value in other parts of the register.
This was fixed for the inline spiller in r138794. This patch fixes the
problem for all register allocators, and includes a small test case.
<rdar://problem/10032939>
llvm-svn: 138944
register dependency (rather than glue them together). This is general
goodness as it gives scheduler more freedom. However it is motivated by
a nasty bug in isel.
When a i64 sub is expanded to subc + sube.
libcall #1
\
\ subc
\ / \
\ / \
\ / libcall #2
sube
If the libcalls are not serialized (i.e. both have chains which are dag
entry), legalizer can serialize them in arbitrary orders. If it's
unlucky, it can force libcall #2 before libcall #1 in the above case.
subc
|
libcall #2
|
libcall #1
|
sube
However since subc and sube are "glued" together, this ends up being a
cycle when the scheduler combine subc and sube as a single scheduling
unit.
The right solution is to fix LegalizeType too chains the libcalls together.
However, LegalizeType is not processing nodes in order so that's harder than
it should be. For now, the move to physical register dependency will do.
rdar://10019576
llvm-svn: 138791
I don't really like the patterns, but I'm having trouble coming up with a
better way to handle them.
I plan on making other targets use the same legalization
ARM-without-memory-barriers is using... it's not especially efficient, but
if anyone cares, it's not that hard to fix for a given target if there's
some better lowering.
llvm-svn: 138621
Apparently we never added code to expand these pseudo instructions, and in
over a year, no one has noticed. Our register allocator must be awesome!
llvm-svn: 137551
Coalescing can remove copy-like instructions with sub-register operands
that constrained the register class. Examples are:
x86: GR32_ABCD:sub_8bit_hi -> GR32
arm: DPR_VFP2:ssub0 -> DPR
Recompute the register class of any virtual registers that are used by
less instructions after coalescing.
This affects code generation for the Cortex-A8 where we use NEON
instructions for f32 operations, c.f. fp_convert.ll:
vadd.f32 d16, d1, d0
vcvt.s32.f32 d0, d16
The register allocator is now free to use d16 for the temporary, and
that comes first in the allocation order because it doesn't interfere
with any s-registers.
llvm-svn: 137133
This hidden llc option runs the machine code verifier after expanding
ARM pseudo-instructions, but before if-conversion.
The machine code verifier is much better at pointing out liveness errors
that can trip up the register scavenger.
llvm-svn: 136439
Code like that would only be produced by bugpoint, but we should still
handle it correctly.
When a register is defined by a REG_SEQUENCE of undefs, the register
itself is undef. Previously, we would create a register with uses but no
defs.
Fixes part of PR10520.
llvm-svn: 136401
When splitting a live range immediately before an LDR_POST instruction
that redefines the address register, make sure to use the correct value
number in leaveIntvBefore.
We need the value number entering the instruction.
<rdar://problem/9793765>
llvm-svn: 135413
if (x != 0) x = 1
if (x == 1) x = 1
Previous codegen looks like this:
mov r1, r0
cmp r1, #1
mov r0, #0
moveq r0, #1
The naive lowering select between two different values. It should recognize the
test is equality test so it's more a conditional move rather than a select:
cmp r0, #1
movne r0, #0
rdar://9758317
llvm-svn: 135017
Print shifted immediate values directly rather than as a payload+shifter
value pair. This makes for more readable output assembly code, simplifies
the instruction printer, and is consistent with how Thumb immediates are
displayed.
llvm-svn: 134902
RAGreedy::tryAssign will now evict interference from the preferred
register even when another register is free.
To support this, add the EvictionCost struct that counts how many hints
are broken by an eviction. We don't want to break one hint just to
satisfy another.
Rename canEvict to shouldEvict, and add the first bit of eviction policy
that doesn't depend on spill weights: Always make room in the preferred
register as long as the evictees can be split and aren't already
assigned to their preferred register.
Also make the CSR avoidance more accurate. When looking for a cheaper
register it is OK to use a new volatile register. Only CSR aliases that
have never been used before should be avoided.
llvm-svn: 134735
already makes the assumption, which is correct on ARM, that a type's alignment is
less than its alloc size. This improves codegen with Clang (which inserts a lot of
extraneous alignment specifiers) and fixes <rdar://problem/9695089>.
llvm-svn: 134106
instructions can be used to match combinations of multiply/divide and VCVT
(between floating-point and integer, Advanced SIMD). Basically the VCVT
immediate operand that specifies the number of fraction bits corresponds to a
floating-point multiply or divide by the corresponding power of 2.
For example, VCVT (floating-point to fixed-point, Advanced SIMD) can replace a
combination of VMUL and VCVT (floating-point to integer) as follows:
Example (assume d17 = <float 8.000000e+00, float 8.000000e+00>):
vmul.f32 d16, d17, d16
vcvt.s32.f32 d16, d16
becomes:
vcvt.s32.f32 d16, d16, #3
Similarly, VCVT (fixed-point to floating-point, Advanced SIMD) can replace a
combinations of VCVT (integer to floating-point) and VDIV as follows:
Example (assume d17 = <float 8.000000e+00, float 8.000000e+00>):
vcvt.f32.s32 d16, d16
vdiv.f32 d16, d17, d16
becomes:
vcvt.f32.s32 d16, d16, #3
llvm-svn: 133813
1. (((x) & 0xFF00) >> 8) | (((x) & 0x00FF) << 8)
=> (bswap x) >> 16
2. ((x&0xff)<<8)|((x&0xff00)>>8)|((x&0xff000000)>>8)|((x&0x00ff0000)<<8))
=> (rotl (bswap x) 16)
This allows us to eliminate most of the def : Pat patterns for ARM rev16
revsh instructions. It catches many more cases for ARM and x86.
rdar://9609108
llvm-svn: 133503
for pre-2.9 bitcode files. We keep x86 unaligned loads, movnt, crc32, and the
target indep prefetch change.
As usual, updating the testsuite is a PITA.
llvm-svn: 133337