After all hard work to implement the EHABI and with the test-suite
passing, it's time to turn it on by default and allow users to
disable it as a work-around while we fix the eventual bugs that show
up.
This commit also remove the -arm-enable-ehabi-descriptors, since we
want the tables to be printed every time the EHABI is turned on
for non-Darwin ARM targets.
Although MCJIT EHABI is not working yet (needs linking with the right
libraries), this commit also fixes some relocations on MCJIT regarding
the EH tables/lib calls, and update some tests to avoid using EH tables
when none are needed.
The EH tests in the test-suite that were previously disabled on ARM
now pass with these changes, so a follow-up commit on the test-suite
will re-enable them.
llvm-svn: 200388
Originally, BLX was passed as operand #0 in MachineInstr and as operand
#2 in MCInst. But now, it's operand #2 in both cases.
This patch also removes unnecessary FileCheck in the test case added by r199127.
llvm-svn: 199928
The issue is caused when Post-RA scheduler reorders a bundle instruction
(IT block). However, it only flips the CPSR liveness of the bundle instruction,
leaves the instructions inside the bundle unchanged, which causes inconstancy and crashes
Thumb2SizeReduction.cpp::ReduceMBB().
llvm-svn: 199127
ARM symbol variants are written with parens instead of @ like this:
.word __GLOBAL_I_a(target1)
This commit adds support for parsing these symbol variants in
expressions. We introduce a new flag to MCAsmInfo that indicates the
parser should use parens to parse the symbol variant. The expression
parser is modified to look for symbol variants using parens instead
of @ when the corresponding MCAsmInfo flag is true.
The MCAsmInfo parens flag is enabled only for ARM on ELF.
By adding this flag to MCAsmInfo, we are able to get rid of
redundant ARM-specific symbol variants and use the generic variants
instead (e.g. VK_GOT instead of VK_ARM_GOT). We use the new
UseParensForSymbolVariant attribute in MCAsmInfo to correctly print
the symbol variants for arm.
To achive this we need to keep a handle to the MCAsmInfo in the
MCSymbolRefExpr class that we can check when printing the symbol
variant.
Updated Tests:
Changed case of symbol variant to match the generic kind.
test/CodeGen/ARM/tls-models.ll
test/CodeGen/ARM/tls1.ll
test/CodeGen/ARM/tls2.ll
test/CodeGen/Thumb2/tls1.ll
test/CodeGen/Thumb2/tls2.ll
PR18080
llvm-svn: 196424
By default, the behavior of IT block generation will be determinated
dynamically base on the arch (armv8 vs armv7). This patch adds backend
options: -arm-restrict-it and -arm-no-restrict-it. The former one
restricts the generation of IT blocks (the same behavior as thumbv8) for
both arches. The later one allows the generation of legacy IT block (the
same behavior as ARMv7 Thumb2) for both arches.
Clang will support -mrestrict-it and -mno-restrict-it, which is
compatible with GCC.
llvm-svn: 194592
Per original comment, the intention of this loop
is to go ahead and break the critical edge
(in order to sink this instruction) if there's
reason to believe doing so might "unblock" the
sinking of additional instructions that define
registers used by this one. The idea is that if
we have a few instructions to sink "together"
breaking the edge might be worthwhile.
This commit makes a few small changes
to help better realize this goal:
First, modify the loop to ignore registers
defined by this instruction. We don't
sink definitions of physical registers,
and sinking an SSA definition isn't
going to unblock an upstream instruction.
Second, ignore uses of physical registers.
Instructions that define physical registers are
rejected for sinking, and so moving this one
won't enable moving any defining instructions.
As an added bonus, while virtual register
use-def chains are generally small due
to SSA goodness, iteration over the uses
and definitions (used by hasOneNonDBGUse)
for physical registers like EFLAGS
can be rather expensive in practice.
(This is the original reason for looking at this)
Finally, to keep things simple continue
to only consider this trick for registers that
have a single use (via hasOneNonDBGUse),
but to avoid spuriously breaking critical edges
only do so if the definition resides
in the same MBB and therefore this one directly
blocks it from being sunk as well.
If sinking them together is meant to be,
let the iterative nature of this pass
sink the definition into this block first.
Update tests to accomodate this change,
add new testcase where sinking avoids pipeline stalls.
llvm-svn: 192608
I'd forgotten that "Requires" blocks override rather than add to the
constraints, so my pseudo-instruction was being selected in Thumb mode leading
to nonsense instructions.
rdar://problem/14817358
llvm-svn: 189096
Back in the mists of time (2008), it seems TableGen couldn't handle the
patterns necessary to match ARM's CMOV node that we convert select operations
to, so we wrote a lot of fairly hairy C++ to do it for us.
TableGen can deal with it now: there were a few minor differences to CodeGen
(see tests), but nothing obviously worse that I could see, so we should
probably address anything that *does* come up in a localised manner.
llvm-svn: 188995
- Instead of setting the suffixes in a bunch of places, just set one master
list in the top-level config. We now only modify the suffix list in a few
suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).
- Aside from removing the need for a bunch of lit.local.cfg files, this enables
4 tests that were inadvertently being skipped (one in
Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
XFAILED).
- This commit also fixes a bunch of config files to use config.root instead of
older copy-pasted code.
llvm-svn: 188513
indirect branches correctly. Under some circumstances, this led to the deletion
of basic blocks that were the destination of indirect branches. In that case it
left indirect branches to nowhere in the code.
This patch replaces, and is more general than either of the previous fixes for
indirect-branch-analysis issues, r181161 and r186461.
For other branches (not indirect) this refactor should have *almost* identical
behavior to the previous version. There are some corner cases where this
refactor is able to analyze blocks that the previous version could not (e.g.
this necessitated the update to thumb2-ifcvt2.ll).
<rdar://problem/14464830>
llvm-svn: 186735
All changes were made by the following bash script:
find test/CodeGen -name "*.ll" | \
while read NAME; do
echo "$NAME"
grep -q "^; *RUN: *llc.*debug" $NAME && continue
grep -q "^; *RUN:.*llvm-objdump" $NAME && continue
grep -q "^; *RUN: *opt.*" $NAME && continue
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\([A-Za-z0-9_-]*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC[:]* *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
done
sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
mv $TEMP $NAME
done
This script catches a superset of the cases caught by the script associated with commit r186280. It initially found some false positives due to unusual constructs in a minority of tests; all such cases were disambiguated first in commit r186621.
llvm-svn: 186624
This update was done with the following bash script:
find test/CodeGen -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
done
sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
mv $TEMP $NAME
fi
done
llvm-svn: 186280
This was done with the following sed invocation to catch label lines demarking function boundaries:
sed -i '' "s/^;\( *\)\([A-Z0-9_]*\):\( *\)test\([A-Za-z0-9_-]*\):\( *\)$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen/*/*.ll
which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct.
llvm-svn: 186258
Propagate the fix from r185712 to Thumb2 codegen as well. Original
commit message applies here as well:
A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and
packs them in the bottom half of "x". An arithmetic and logic shift are
only equivalent in this context if the shift amount is 16. We would be
shifting in ones into the bottom 16bits instead of zeros if "y" is
negative.
rdar://14338767
llvm-svn: 185982
ARM FastISel is currently only enabled for iOS non-Thumb1, and I'm working on
enabling it for other targets. As a first step I've fixed some of the tests.
Changes to ARM FastISel tests:
- Different triples don't generate the same relocations (especially
movw/movt versus constant pool loads). Use a regex to allow either.
- Mangling is different. Use a regex to allow either.
- The reserved registers are sometimes different, so registers get
allocated in a different order. Capture the names only where this
occurs.
- Add -verify-machineinstrs to some tests where it works. It doesn't
work everywhere it should yet.
- Add -fast-isel-abort to many tests that didn't have it before.
- Split out the VarArg test from fast-isel-call.ll into its own
test. This simplifies test setup because of --check-prefix.
Patch by JF Bastien
llvm-svn: 181801
The VDUP instruction source register doesn't allow a non-constant lane
index, so make sure we don't construct a ARM::VDUPLANE node asking it to
do so.
rdar://13328063
http://llvm.org/bugs/show_bug.cgi?id=13963
llvm-svn: 176413
This fixes some problems with too conservative checking where we were
marking all aliases of a register as used, and then also checking all
aliases when allocating a register.
<rdar://problem/13249625>
llvm-svn: 175782
When creating an allocation hint for a register pair, make sure the hint
for the physical register reference is still in the allocation order.
rdar://13240556
llvm-svn: 175541
These tests in particular try to use escaped square brackets as an
argument to grep, which is failing for me with native win32 python. It
appears the backslash is being lost near the CreateProcess*() call.
llvm-svn: 173506
are more expensive than the non-flag setting variant. Teach thumb2 size
reduction pass to avoid generating them unless we are optimizing for size.
rdar://12892707
llvm-svn: 170728
into a sbc with a positive number, the immediate should be complemented, not
negated. Also added a missing pattern for ARM codegen.
rdar://12559385
llvm-svn: 166613
aligned address. Based on patch by David Peixotto.
Also use vld1.64 / vst1.64 with 128-bit alignment to take advantage of alignment
hints. rdar://12090772, rdar://12238782
llvm-svn: 164089
Now that it is possible to dynamically tie MachineInstr operands,
predicated instructions are possible in SSA form:
%vreg3<def> = SUBri %vreg1, -2147483647, pred:14, pred:%noreg, %opt:%noreg
%vreg4<def,tied1> = MOVCCr %vreg3<tied0>, %vreg1, %pred:12, pred:%CPSR
Becomes a predicated SUBri with a tied imp-use:
SUBri %vreg1, -2147483647, pred:13, pred:%CPSR, opt:%noreg, %vreg1<imp-use,tied0>
This means that any instruction that is safe to move can be folded into
a MOVCC, and the *CC pseudo-instructions are no longer needed.
The test case changes reflect that Thumb2SizeReduce recognizes the
predicated instructions. It didn't understand the pseudos.
llvm-svn: 163274
This patch corrects the definition of umlal/smlal instructions and adds support
for matching them to the ARM dag combiner.
Bug 12213
Patch by Yin Ma!
llvm-svn: 163136
It is not my plan to duplicate the entire ARM instruction set with
predicated versions. We need a way of representing predicated
instructions in SSA form without requiring a separate opcode.
Then the pseudo-instructions can go away.
llvm-svn: 162061
The ARM select instructions are just predicated moves. If the select is
the only use of an operand, the instruction defining the operand can be
predicated instead, saving one instruction and decreasing register
pressure.
This implementation can turn AND/ORR/EOR instructions into their
corresponding ANDCC/ORRCC/EORCC variants. Ideally, we should be able to
predicate any instruction, but we don't yet support predicated
instructions in SSA form.
llvm-svn: 161994
another mechanical change accomplished though the power of terrible Perl
scripts.
I have manually switched some "s to 's to make escaping simpler.
While I started this to fix tests that aren't run in all configurations,
the massive number of tests is due to a really frustrating fragility of
our testing infrastructure: things like 'grep -v', 'not grep', and
'expected failures' can mask broken tests all too easily.
Essentially, I'm deeply disturbed that I can change the testsuite so
radically without causing any change in results for most platforms. =/
llvm-svn: 159547
through my perl nets.
With this, the test suite passes even if I force it to run with the
built-in shell test logic, except for a test which REQUIREs shell.
llvm-svn: 159529
This was done through the aid of a terrible Perl creation. I will not
paste any of the horrors here. Suffice to say, it require multiple
staged rounds of replacements, state carried between, and a few
nested-construct-parsing hacks that I'm not proud of. It happens, by
luck, to be able to deal with all the TCL-quoting patterns in evidence
in the LLVM test suite.
If anyone is maintaining large out-of-tree test trees, feel free to poke
me and I'll send you the steps I used to convert things, as well as
answer any painful questions etc. IRC works best for this type of thing
I find.
Once converted, switch the LLVM lit config to use ShTests the same as
Clang. In addition to being able to delete large amounts of Python code
from 'lit', this will also simplify the entire test suite and some of
lit's architecture.
Finally, the test suite runs 33% faster on Linux now. ;]
For my 16-hardware-thread (2x 4-core xeon e5520): 36s -> 24s
llvm-svn: 159525
Live ranges with a constrained register class may benefit from splitting
around individual uses. It allows the remaining live range to use a
larger register class where it may allocate. This is like spilling to a
different register class.
This is only attempted on constrained register classes.
<rdar://problem/11438902>
llvm-svn: 157354
Use a dedicated MachO load command to annotate data-in-code regions.
This is the same format the linker produces for final executable images,
allowing consistency of representation and use of introspection tools
for both object and executable files.
Data-in-code regions are annotated via ".data_region"/".end_data_region"
directive pairs, with an optional region type.
data_region_directive := ".data_region" { region_type }
region_type := "jt8" | "jt16" | "jt32" | "jta32"
end_data_region_directive := ".end_data_region"
The previous handling of ARM-style "$d.*" labels was broken and has
been removed. Specifically, it didn't handle ARM vs. Thumb mode when
marking the end of the section.
rdar://11459456
llvm-svn: 157062
This option has been disabled for a while, and it is going away so I can
clean up the coalescer code.
The tests that required physreg joining to be enabled were almost all of
the form "tiny function with interference between arguments and return
value". Such functions are usually inlined in the real world.
The problem exposed by phys_subreg_coalesce-3.ll is real, but fairly
rare.
llvm-svn: 157027
the feature set of v7a. This comes about if the user specifies something like
-arch armv7 -mcpu=cortex-m3. We shouldn't be generating instructions such as
uxtab in this case.
rdar://11318438
llvm-svn: 155601
both fallthrough and a conditional branch target the same successor.
Gracefully delete the conditional branch and introduce any unconditional
branch needed to reach the actual successor. This fixes memory
corruption in 2009-06-15-RegScavengerAssert.ll and possibly other tests.
Also, while I'm here fix a latent bug I spotted by inspection. I never
applied the same fundamental fix to this fallthrough successor finding
logic that I did to the logic used when there are no conditional
branches. As a consequence it would have selected landing pads had they
be aligned in just the right way here. I don't have a test case as
I spotted this by inspection, and the previous time I found this
required have of TableGen's source code to produce it. =/ I hate backend
bugs. ;]
Thanks to Jim Grosbach for helping me reason through this and reviewing
the fix.
llvm-svn: 154867
This is mostly to test the waters. I'd like to get results from FNT
build bots and other bots running on non-x86 platforms.
This feature has been pretty heavily tested over the last few months by
me, and it fixes several of the execution time regressions caused by the
inlining work by preventing inlining decisions from radically impacting
block layout.
I've seen very large improvements in yacr2 and ackermann benchmarks,
along with the expected noise across all of the benchmark suite whenever
code layout changes. I've analyzed all of the regressions and fixed
them, or found them to be impossible to fix. See my email to llvmdev for
more details.
I'd like for this to be in 3.1 as it complements the inliner changes,
but if any failures are showing up or anyone has concerns, it is just
a flag flip and so can be easily turned off.
I'm switching it on tonight to try and get at least one run through
various folks' performance suites in case SPEC or something else has
serious issues with it. I'll watch bots and revert if anything shows up.
llvm-svn: 154816
LSR always tries to make the ICmp in the loop latch use the incremented
induction variable. This allows the induction variable to be kept in a
single register.
When the induction variable limit is equal to the stride,
SimplifySetCC() would break LSR's hard work by transforming:
(icmp (add iv, stride), stride) --> (cmp iv, 0)
This forced us to use lea for the IC update, preventing the simpler
incl+cmp.
<rdar://problem/7643606>
<rdar://problem/11184260>
llvm-svn: 154119
When an strd instruction doesn't get the registers it wants, it can be
expanded into two str instructions. Make sure the first str doesn't kill
the base register in the case where the base and data registers are
identical:
t2STRi12 %R0<kill>, %R0, 4, pred:14, pred:%noreg
t2STRi12 %R2<kill>, %R0, 8, pred:14, pred:%noreg
<rdar://problem/11101911>
llvm-svn: 153611
The arm_neon intrinsics can create virtual registers from the DPair
register class which allows both even-odd and odd-even D-register pairs.
This fixes PR12389.
llvm-svn: 153603
* Removed test/lib/llvm.exp - it is no longer needed
* Deleted the dg.exp reading code from test/lit.cfg. There are no dg.exp files
left in the test suite so this code is no longer required. test/lit.cfg is
now much shorter and clearer
* Removed a lot of duplicate code in lit.local.cfg files that need access to
the root configuration, by adding a "root" attribute to the TestingConfig
object. This attribute is dynamically computed to provide the same
information as was previously provided by the custom getRoot functions.
* Documented the config.root attribute in docs/CommandGuide/lit.pod
llvm-svn: 153408
When an outgoing call takes more than 2k of arguments on the stack, we
don't allocate that call frame in the prolog, but adjust the stack
pointer immediately before the call instead.
This causes problems with the emergency spill slot because PEI can't
track stack pointer adjustments on the second pass, and if the outgoing
arguments are too big, SP can't be used to reach the emergency spill
slot at all.
Work around these problems by ensuring there is a base or frame pointer
that can be used to access the emergency spill slot.
<rdar://problem/10917166>
llvm-svn: 151604
The tied source operand of tMUL is the second source operand, not the
first like every other two-address thumb instruction. Special case it
in the size reduction pass to make sure we create the tMUL instruction
properly.
llvm-svn: 151315
live across BBs before register allocation. This miscompiled 197.parser
when a cmp + b are optimized to a cbnz instruction even though the CPSR def
is live-in a successor.
cbnz r6, LBB89_12
...
LBB89_12:
ble LBB89_1
The fix consists of two parts. 1) Teach LiveVariables that some unallocatable
registers might be liveouts so don't mark their last use as kill if they are.
2) ARM constantpool island pass shouldn't form cbz / cbnz if the conditional
branch does not kill CPSR.
rdar://10676853
llvm-svn: 148168
This function runs after all constant islands have been placed, and may
shrink some instructions to their 2-byte forms. This can actually cause
some constant pool entries to move out of range because of growing
alignment padding.
Treat instructions that may be shrunk the same as inline asm - they
erode the known alignment bits.
Also reinstate an old assertion in verify(). It is correct now that
basic block offsets include alignments.
Add a single large test case that will hopefully exercise many parts of
the constant island pass.
<rdar://problem/10670199>
llvm-svn: 147885
define physical registers. It's currently very restrictive, only catching
cases where the CE is in an immediate (and only) predecessor. But it catches
a surprising large number of cases.
rdar://10660865
llvm-svn: 147827
Now that canRealignStack() understands frozen reserved registers, it is
safe to use it for aligned spill instructions.
It will only return true if the registers reserved at the beginning of
register allocation allow for dynamic stack realignment.
<rdar://problem/10625436>
llvm-svn: 147579
This patch caused a miscompilation of oggenc because a frame pointer was
suddenly needed halfway through register allocation.
<rdar://problem/10625436>
llvm-svn: 147487
ARM targets with NEON units have access to aligned vector loads and
stores that are potentially faster than unaligned operations.
Add support for spilling the callee-saved NEON registers to an aligned
stack area using 16-byte aligned NEON loads and store.
This feature is off by default, controlled by an -align-neon-spills
command line option.
llvm-svn: 147211
Use the spill slot alignment as well as the local variable alignment to
determine when the stack needs to be realigned. This works now that the
ARM target can always realign the stack by using a base pointer.
Still respect the ARMBaseRegisterInfo::canRealignStack() function
vetoing a realigned stack. Don't use aligned spill code in that case.
llvm-svn: 146997
to finalize MI bundles (i.e. add BUNDLE instruction and computing register def
and use lists of the BUNDLE instruction) and a pass to unpack bundles.
- Teach more of MachineBasic and MachineInstr methods to be bundle aware.
- Switch Thumb2 IT block to MI bundles and delete the hazard recognizer hack to
prevent IT blocks from being broken apart.
llvm-svn: 146542
I followed three heuristics for deciding whether to set 'true' or
'false':
- Everything target independent got 'true' as that is the expected
common output of the GCC builtins.
- If the target arch only has one way of implementing this operation,
set the flag in the way that exercises the most of codegen. For most
architectures this is also the likely path from a GCC builtin, with
'true' being set. It will (eventually) require lowering away that
difference, and then lowering to the architecture's operation.
- Otherwise, set the flag differently dependending on which target
operation should be tested.
Let me know if anyone has any issue with this pattern or would like
specific tests of another form. This should allow the x86 codegen to
just iteratively improve as I teach the backend how to differentiate
between the two forms, and everything else should remain exactly the
same.
llvm-svn: 146370
register dependency (rather than glue them together). This is general
goodness as it gives scheduler more freedom. However it is motivated by
a nasty bug in isel.
When a i64 sub is expanded to subc + sube.
libcall #1
\
\ subc
\ / \
\ / \
\ / libcall #2
sube
If the libcalls are not serialized (i.e. both have chains which are dag
entry), legalizer can serialize them in arbitrary orders. If it's
unlucky, it can force libcall #2 before libcall #1 in the above case.
subc
|
libcall #2
|
libcall #1
|
sube
However since subc and sube are "glued" together, this ends up being a
cycle when the scheduler combine subc and sube as a single scheduling
unit.
The right solution is to fix LegalizeType too chains the libcalls together.
However, LegalizeType is not processing nodes in order so that's harder than
it should be. For now, the move to physical register dependency will do.
rdar://10019576
llvm-svn: 138791
Fix base register type and canonicallize to the "ldm" spelling rather than
"ldmia." Add diagnostics for incorrect writeback token and out-of-range
registers.
llvm-svn: 137986
(including compilation, assembly). Move relocation model Reloc::Model from
TargetMachine to MCCodeGenInfo so it's accessible even without TargetMachine.
llvm-svn: 135468
if (x != 0) x = 1
if (x == 1) x = 1
Previous codegen looks like this:
mov r1, r0
cmp r1, #1
mov r0, #0
moveq r0, #1
The naive lowering select between two different values. It should recognize the
test is equality test so it's more a conditional move rather than a select:
cmp r0, #1
movne r0, #0
rdar://9758317
llvm-svn: 135017
The normal tBX instruction is predicable, so there's no reason the
pseudos for using it as a return shouldn't be. Gives us some nice code-gen
improvements as can be seen by the test changes. In particular, several
tests now have to disable if-conversion because it works too well and defeats
the test.
llvm-svn: 134746
The DSP instructions in the Thumb2 instruction set are an optional extension
in the Cortex-M* archtitecture. When present, the implementation is considered
an "ARMv7E-M implementation," and when not, an "ARMv7-M implementation."
Add a subtarget feature hook for the v7e-m instructions and hook it up. The
cortex-m3 cpu is an example of a v7m implementation, while the cortex-m4 is
a v7e-m implementation.
rdar://9572992
llvm-svn: 134261
Fix a FIXME and allow predication (in Thumb2) for the T1 register to
register MOV instructions. This allows some better codegen with
if-conversion (as seen in the test updates), plus it lays the groundwork
for pseudo-izing the tMOVCC instructions.
llvm-svn: 134197
It's just a t2LDMIA_UPD instruction with extra codegen properties, so it
doesn't need the encoding information. As a side-benefit, we now correctly
recognize for instruction printing as a 'pop' instruction.
llvm-svn: 134173
for pre-2.9 bitcode files. We keep x86 unaligned loads, movnt, crc32, and the
target indep prefetch change.
As usual, updating the testsuite is a PITA.
llvm-svn: 133337
we try to branch to them.
Before we were creating successor lists with duplicated entries. Fixing that
found a bug in isBlockOnlyReachableByFallthrough that would causes it to
return the wrong answer for
-----------
...
jne foo
jmp bar
foo:
----------
llvm-svn: 132882
of reserved registers.
Use RegisterClassInfo in RABasic as well. This slightly changes som
allocation orders because RegisterClassInfo puts CSR aliases last.
llvm-svn: 132581
by non-CMP expressions. The executable test case (129821) would test
this as well, if we had an "-O0 -disable-arm-fast-isel" LLVM-GCC
tester. Alas, the ARM assembly would be very difficult to check with
FileCheck.
The thumb2-cbnz.ll test is affected; it generates larger code (tst.w
vs. cmp #0), but I believe the new version is correct.
rdar://problem/9298790
llvm-svn: 131261
Original message:
Teach MachineCSE how to do simple cross-block CSE involving physregs. This allows, for example, eliminating duplicate cmpl's on x86. Part of rdar://problem/8259436 .
llvm-svn: 130877
These tests all follow the same pattern:
mov r2, r0
movs r0, #0
$CMP r2, r1
it eq
moveq r0, #1
bx lr
The first 'mov' can be eliminated by rematerializing 'movs r0, #0' below the
test instruction:
$CMP r0, r1
mov.w r0, #0
it eq
moveq r0, #1
bx lr
So far, only physreg coalescing can do that. The register allocators won't yet
split live ranges just to eliminate copies. They can learn, but this particular
problem is not likely to show up in real code. It only appears because r0 is
used for both the function argument and return value.
llvm-svn: 130858
more callee-saved registers and introduce copies. Only allows it if scheduling
a node above calls would end up lessen register pressure.
Call operands also has added ABI restrictions for register allocation, so be
extra careful with hoisting them above calls.
rdar://9329627
llvm-svn: 130245
Fixes Thumb2 ADCS and SBCS lowering: <rdar://problem/9275821>.
t2ADCS/t2SBCS are now pseudo instructions, consistent with ARM, so the
assembly printer correctly prints the 's' suffix.
Fixes Thumb2 adde -> SBC matching to check for live/dead carry flags.
Fixes the internal ARM machine opcode mnemonic for ADCS/SBCS.
Fixes ARM SBC lowering to check for live carry (potential bug).
llvm-svn: 130048
add <rd>, sp, #<imm8>
ldr <rd>, [sp, #<imm8>]
When the offset from sp is multiple of 4 and in range of 0-1020.
This saves code size by utilizing 16-bit instructions.
rdar://9321541
llvm-svn: 129971
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handle node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero cycle stage or it is a TokenFactor chain.
Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the ndoe latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.
llvm-svn: 129421
masks to match inversely for the code as is to work. For the example given
we actually want:
bfi r0, r2, #1, #1
not #0, however, given the way the pattern is written it's not possible
at the moment.
Fixes rdar://9177502
llvm-svn: 128320
Optimize trivial branches in CodeGenPrepare, which often get created from the
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.
This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.
llvm-svn: 127498
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.
This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.
llvm-svn: 127459
1. Fixed ARM pc adjustment.
2. Fixed dynamic-no-pic codegen
3. CSE of pc-relative load of global addresses.
It's now enabled by default for Darwin.
llvm-svn: 123991
DAG. Disable using "-disable-sched-cycles".
For ARM, this enables a framework for modeling the cpu pipeline and
counting stalls. It also activates several heuristics to drive
scheduling based on the model. Scheduling is inherently imprecise at
this stage, and until spilling is improved it may defeat attempts to
schedule. However, this framework provides greater control over
tuning codegen.
Although the flag is not target-specific, it should have very little
affect on the default scheduler used by x86. The only two changes that
affect x86 are:
- scheduling a high-latency operation bumps the current cycle so independent
operations can have their latency covered. i.e. two independent 4
cycle operations can produce results in 4 cycles, not 8 cycles.
- Two operations with equal register pressure impact and no
latency-based stalls on their uses will be prioritized by depth before height
(height is irrelevant if no stalls occur in the schedule below this point).
llvm-svn: 123971
Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.
The test changes are needed to keep those spill-q tests from testing aligned
spills and restores. If the only aligned stack objects are spill slots, we
no longer realign the stack frame. Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.
llvm-svn: 122995
Use the same COPY_TO_REGCLASS approach as for the 2-register *_sfp instructions.
This change made a big difference in the code generated for the
CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing
a fine job, but some instructions that were previously moved outside the loop
are not moved now. It's using fewer VFP registers now, which is generally
a good thing, so I think the estimates for register pressure changed and that
affected the LICM behavior. Since that isn't obviously wrong, I've just
changed the test file. This completes the work for Radar 8711675.
llvm-svn: 121730
state. Previously Thumb2 would restore sp from fp like this:
mov sp, r7
sub, sp, #4
If an interrupt is taken after the 'mov' but before the 'sub', callee-saved
registers might be clobbered by the interrupt handler. Instead, try
restoring directly from sp:
add sp, #4
Or, if necessary (with VLA, etc.) use a scratch register to compute sp and
then restore it:
sub.w r4, r7, #8
mov sp, r7
rdar://8465407
llvm-svn: 119977
Remove movePastCSLoadStoreOps and associated code for simple pointer
increments. Update routines that depended upon other opcodes for save/restore.
Adjust all testcases accordingly.
llvm-svn: 119725
and xor. The 32-bit move immediates can be hoisted out of loops by machine
LICM but the isel hacks were preventing them.
Instead, let peephole optimization pass recognize registers that are defined by
immediates and the ARM target hook will fold the immediates in.
Other changes include 1) do not fold and / xor into cmp to isel TST / TEQ
instructions if there are multiple uses. This happens when the 'and' is live
out, machine sink would have sinked the computation and that ends up pessimizing
code. The peephole pass would recognize situations where the 'and' can be
toggled to define CPSR and eliminate the comparison anyway.
2) Move peephole pass to after machine LICM, sink, and CSE to avoid blocking
important optimizations.
rdar://8663787, rdar://8241368
llvm-svn: 119548
1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to
"optimize for latency". Call instructions don't have the right latency and
this is more likely to use introduce spills.
2. Fix if-converter cost function. For ARM, it should use instruction latencies,
not # of micro-ops since multi-latency instructions is completely executed
even when the predicate is false. Also, some instruction will be "slower"
when they are predicated due to the register def becoming implicit input.
rdar://8598427
llvm-svn: 118135
There were a number of issues to fix up here:
* The "device" argument of the llvm.memory.barrier intrinsic should be
used to distinguish the "Full System" domain from the "Inner Shareable"
domain. It has nothing to do with using DMB vs. DSB instructions.
* The compiler should never need to emit DSB instructions. Remove the
ARMISD::SYNCBARRIER node and also remove the instruction patterns for DSB.
* Merge the separate DMB/DSB instructions for options only used for the
disassembler with the default DMB/DSB instructions. Add the default
"full system" option ARM_MB::SY to the ARM_MB::MemBOpt enum.
* Add a separate ARMISD::MEMBARRIER_MCR node for subtargets that implement
a data memory barrier using the MCR instruction.
* Fix up encodings for these instructions (except MCR).
I also updated the tests and added a few new ones to check for DMB options
that were not currently being exercised.
llvm-svn: 117756
operand and one of them has a single use that is a live out copy, favor the
one that is live out. Otherwise it will be difficult to eliminate the copy
if the instruction is a loop induction variable update. e.g.
BB:
sub r1, r3, #1
str r0, [r2, r3]
mov r3, r1
cmp
bne BB
=>
BB:
str r0, [r2, r3]
sub r3, r3, #1
cmp
bne BB
This fixed the recent 256.bzip2 regression.
llvm-svn: 117675
- Initial register pressure in the loop should be all the live defs into the
loop. Not just those from loop preheader which is often empty.
- When an instruction is hoisted, update register pressure from loop preheader
to the original BB.
- Treat only use of a virtual register as kill since the code is still SSA.
llvm-svn: 116956
"long latency" enough to hoist even if it may increase spilling. Reloading
a value from spill slot is often cheaper than performing an expensive
computation in the loop. For X86, that means machine LICM will hoist
SQRT, DIV, etc. ARM will be somewhat aggressive with VFP and NEON
instructions.
- Enable register pressure aware machine LICM by default.
llvm-svn: 116781
callee-saved registers at the end of the lists. Also prefer to avoid using
the low registers that are in register subclasses required by certain
instructions, so that those registers will more likely be available when needed.
This change makes a huge improvement in spilling in some cases. Thanks to
Jakob for helping me realize the problem.
Most of this patch is fixing the testsuite. There are quite a few places
where we're checking for specific registers. I changed those to wildcards
in places where that doesn't weaken the tests. The spill-q.ll and
thumb2-spill-q.ll tests stopped spilling with this change, so I added a bunch
of live values to force spills on those tests.
llvm-svn: 116055
between the high and low registers for prologue/epilogue code. This was
a Darwin-only thing that wasn't providing a realistic benefit anymore.
Combining the save areas simplifies the compiler code and results in better
ARM/Thumb2 codegen.
For example, previously we would generate code like:
push {r4, r5, r6, r7, lr}
add r7, sp, #12
stmdb sp!, {r8, r10, r11}
With this change, we combine the register saves and generate:
push {r4, r5, r6, r7, r8, r10, r11, lr}
add r7, sp, #12
rdar://8445635
llvm-svn: 114340
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. If-converter should take treat them as non-micro-coded
simple instructions.
llvm-svn: 113570
"For ARM stack frames that utilize variable sized objects and have either
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs."
r112986 fixed a latent bug exposed by the above.
llvm-svn: 112989
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs.
rdar://7352504
rdar://8374540
rdar://8355680
llvm-svn: 112883
int x(int t) {
if (t & 256)
return -26;
return 0;
}
We generate this:
tst.w r0, #256
mvn r0, #25
it eq
moveq r0, #0
while gcc generates this:
ands r0, r0, #256
it ne
mvnne r0, #25
bx lr
Scandalous really!
During ISel time, we can look for this particular pattern. One where we have a
"MOVCC" that uses the flag off of a CMPZ that itself is comparing an AND
instruction to 0. Something like this (greatly simplified):
%r0 = ISD::AND ...
ARMISD::CMPZ %r0, 0 @ sets [CPSR]
%r0 = ARMISD::MOVCC 0, -26 @ reads [CPSR]
All we have to do is convert the "ISD::AND" into an "ARM::ANDS" that sets [CPSR]
when it's zero. The zero value will all ready be in the %r0 register and we only
need to change it if the AND wasn't zero. Easy!
llvm-svn: 112664
comparison that would overflow.
- The other under/overflow cases can't actually happen because the immediates
which would trigger them are legal (so we don't enter this code), but
adjusted the style to make it clear the transform is always valid.
llvm-svn: 112053
platform. It's apparently "bl __muldf3" on linux, for example. Since that's
not what we're checking here, it's more robust to just force a triple. We
just wwant to check that the inline FP instructions are only generated
on cpus that have them."
llvm-svn: 110830
memory and synchronization barrier dmb and dsb instructions.
- Change instruction names to something more sensible (matching name of actual
instructions).
- Added tests for memory barrier codegen.
llvm-svn: 110785
Also added a test case to check for the added benefit of this patch: it's optimizing away the unnecessary restore of sp from fp for some non-leaf functions.
llvm-svn: 110707
have 4 bits per register in the operand encoding), but have undefined
behavior when the operand value is 13 or 15 (SP and PC, respectively).
The trivial coalescer in linear scan sometimes will merge a copy from
SP into a subsequent instruction which uses the copy, and if that
instruction cannot legally reference SP, we get bad code such as:
mls r0,r9,r0,sp
instead of:
mov r2, sp
mls r0, r9, r0, r2
This patch adds a new register class for use by Thumb2 that excludes
the problematic registers (SP and PC) and is used instead of GPR
for those operands which cannot legally reference PC or SP. The
trivial coalescer explicitly requires that the register class
of the destination for the COPY instruction contain the source
register for the COPY to be considered for coalescing. This prevents
errant instructions like that above.
PR7499
llvm-svn: 109842
integers with mov + vdup. 8003375. This is
currently disabled by default because LICM will
not hoist a VDUP, so it pessimizes the code if
the construct occurs inside a loop (8248029).
llvm-svn: 109799
Add explicit testcases for tail calls within the same module.
Duplicate some code to humor those who think .w doesn't apply on ARM.
Leave this disabled on Thumb1, and add some comments explaining why it's hard
and won't gain much.
llvm-svn: 107851
There are 2 changes relative to the previous version of the patch:
1) For the "simple" if-conversion case, there's no need to worry about
RemoveExtraEdges not handling an unanalyzable branch. Predicated terminators
are ignored in this context, so RemoveExtraEdges does the right thing.
This might break someday if we ever treat indirect branches (BRIND) as
predicable, but for now, I just removed this part of the patch, because
in the case where we do not add an unconditional branch, we rely on keeping
the fall-through edge to CvtBBI (which is empty after this transformation).
The change relative to the previous patch is:
@@ -1036,10 +1036,6 @@
IterIfcvt = false;
}
- // RemoveExtraEdges won't work if the block has an unanalyzable branch,
- // which is typically the case for IfConvertSimple, so explicitly remove
- // CvtBBI as a successor.
- BBI.BB->removeSuccessor(CvtBBI->BB);
RemoveExtraEdges(BBI);
// Update block info. BB can be iteratively if-converted.
2) My patch exposed a bug in the code for merging the tail of a "diamond",
which had previously never been exercised. The code was simply checking that
the tail had a single predecessor, but there was a case in
MultiSource/Benchmarks/VersaBench/dbms where that single predecessor was
neither edge of the diamond. I added the following change to check for
that:
@@ -1276,7 +1276,18 @@
// tail, add a unconditional branch to it.
if (TailBB) {
BBInfo TailBBI = BBAnalysis[TailBB->getNumber()];
- if (TailBB->pred_size() == 1 && !TailBBI.HasFallThrough) {
+ bool CanMergeTail = !TailBBI.HasFallThrough;
+ // There may still be a fall-through edge from BBI1 or BBI2 to TailBB;
+ // check if there are any other predecessors besides those.
+ unsigned NumPreds = TailBB->pred_size();
+ if (NumPreds > 1)
+ CanMergeTail = false;
+ else if (NumPreds == 1 && CanMergeTail) {
+ MachineBasicBlock::pred_iterator PI = TailBB->pred_begin();
+ if (*PI != BBI1->BB && *PI != BBI2->BB)
+ CanMergeTail = false;
+ }
+ if (CanMergeTail) {
MergeBlocks(BBI, TailBBI);
TailBBI.IsDone = true;
} else {
With these fixes, I was able to run all the SingleSource and MultiSource
tests successfully.
llvm-svn: 107110