This is essentially a recommit of r285893, but with a correctness fix. The
problem of the original commit was that this:
bic r5, r7, #31
cbz r5, .LBB2_10
got rewritten into:
lsrs r5, r7, #5
beq .LBB2_10
The result in destination register r5 is not the same and this is incorrect
when r5 is not dead. So this fix includes checking the uses of the AND
destination register. And also, compared to the original commit, some regression
tests didn't need changing anymore because of this extra check.
For completeness, this was the original commit message:
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more
efficient instruction selection if the bitmask is one consecutive sequence of
set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and
set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and
set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit
into the sign bit with one LSLS and change the condition query from NE/EQ to
MI/PL (we could also implement this by shifting into the carry bit and
branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower
zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two
16-bit instructions but can elide the CMP and doesn't require materializing a
complex immediate, so is also a win.
Differential Revision: https://reviews.llvm.org/D27761
llvm-svn: 289794
This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk.
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
llvm-svn: 285893
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
llvm-svn: 281323
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
llvm-svn: 281215
There is not an official documented ABI for frame pointers in Thumb2,
but we should try to emit something which is useful.
We use r7 as the frame pointer for Thumb code, which currently means
that if a function needs to save a high register (r8-r11), it will get
pushed to the stack between the frame pointer (r7) and link register
(r14). This means that while a stack unwinder can follow the chain of
frame pointers up the stack, it cannot know the offset to lr, so does
not know which functions correspond to the stack frames.
To fix this, we need to push the callee-saved registers in two batches,
with the first push saving the low registers, fp and lr, and the second
push saving the high registers. This is already implemented, but
previously only used for iOS. This patch turns it on for all Thumb2
targets when frame pointers are required by the ABI, and the frame
pointer is r7 (Windows uses r11, so this isn't a problem there). If
frame pointer elimination is enabled we still emit a single push/pop
even if we need a frame pointer for other reasons, to avoid increasing
code size.
We must also ensure that lr is pushed to the stack when using a frame
pointer, so that we end up with a complete frame record. Situations that
could cause this were rare, because we already push lr in most
situations so that we can return using the pop instruction.
Differential Revision: https://reviews.llvm.org/D23516
llvm-svn: 279506
The R_ARM_PLT32 relocation is deprecated and is not produced by MC.
This means that the code being deleted is dead from the .o point of
view and was making the .s more confusing.
llvm-svn: 272909
I'm really not sure why we were in the first place, it's the linker's job to
convert between BL/BLX as necessary. Even worse, using BLX left Thumb calls
that could be locally resolved completely unencodable since all offsets to BLX
are multiples of 4.
rdar://26182344
llvm-svn: 269101
Essentially the same as the GEP change in r230786.
A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)
import fileinput
import sys
import re
pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")
for line in sys.stdin:
sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7649
llvm-svn: 230794
The C and C++ semantics for compare_exchange require it to return a bool
indicating success. This gets mapped to LLVM IR which follows each cmpxchg with
an icmp of the value loaded against the desired value.
When lowered to ldxr/stxr loops, this extra comparison is redundant: its
results are implicit in the control-flow of the function.
This commit makes two changes: it replaces that icmp with appropriate PHI
nodes, and then makes sure earlyCSE is called after expansion to actually make
use of the opportunities revealed.
I've also added -{arm,aarch64}-enable-atomic-tidy options, so that
existing fragile tests aren't perturbed too much by the change. Many
of them either rely on undef/unreachable too pervasively to be
restored to something well-defined (particularly while making sure
they test the same obscure assert from many years ago), or depend on a
particular CFG shape, which is disrupted by SimplifyCFG.
rdar://problem/16227836
llvm-svn: 209883
This option is from 2010, designed to work around a linker issue on Darwin for
ARM. According to grosbach this is no longer an issue and this option can
safely be removed.
llvm-svn: 203576
This update was done with the following bash script:
find test/CodeGen -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
done
sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
mv $TEMP $NAME
fi
done
llvm-svn: 186280
return values are bitcasts.
The chain had previously been being clobbered with the entry node to
the dag, which sometimes caused other code in the function to be
erroneously deleted when tailcall optimization kicked in.
<rdar://problem/13827621>
llvm-svn: 181696
legalizer always use the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.
PR12419
rdar://9770785
rdar://11195178
llvm-svn: 154370
Also more cleanly separate the ARM vs. Thumb functionality. Previously, the
encoding would be incorrect for some Thumb instructions (the indirect calls).
llvm-svn: 127637
legalization time. Since at legalization time there is no mapping from
SDNode back to the corresponding LLVM instruction and the return
SDNode is target specific, this requires a target hook to check for
eligibility. Only x86 and ARM support this form of sibcall optimization
right now.
rdar://8707777
llvm-svn: 120501
Add explicit testcases for tail calls within the same module.
Duplicate some code to humor those who think .w doesn't apply on ARM.
Leave this disabled on Thumb1, and add some comments explaining why it's hard
and won't gain much.
llvm-svn: 107851
basic tests.
This has been well tested on Darwin but not elsewhere.
It should work provided the linker correctly resolves
B.W <label in other function>
which it has not seen before, at least from llvm-based
compilers. I'm leaving the arm-tail-calls switch in
until I see if there's any problems because of that;
it might need to be disabled for some environments.
llvm-svn: 106299