This patch adds Cortex-R8 to Target Parser and TableGen.
It also adds CodeGen tests for the build attributes.
Patch by Pablo Barrio.
Differential Revision: http://reviews.llvm.org/D17925
llvm-svn: 263132
The initial change was insufficiently complete for always getting the semantics
of __builtin_longjmp correct. The builtin is translated into a
`tInt_eh_sjlj_longjmp` DAG node. This node set R7 as clobbered. However, the
code would then follow up with a clobber of R11. I had failed to notice the
imp-def,kill on R7 in the isel. Unfortunately, it seems that it is not possible
to conditionalise the Defs list via an !if. Instead, construct a new parallel
WIN node and prefer that when targeting windows. This ensures that we now both
correctly model the __builtin_longjmp as well as construct the frame in a more
ABI conformant manner.
llvm-svn: 263123
WoA uses r11 as the FP even though it is a pure thumb-2 environment in contrast
to AAPCS which states r7. This adjusts __builtin_longjmp to not clobber r7 and
to properly restore the frame pointer on execution.
llvm-svn: 263118
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make
sure we only emit valid types or the ones that were explicitly marked as custom.
Now, passing check-all and test-suite on x86, ARM and AArch64.
This patch fixes PR17193 (and a long time FIXME in the tests).
llvm-svn: 262738
This test failed in some ARM bots after a divmod change because it was
running on a native llc, instead of targeted one. This makes sure the test
is target-specific (as intended), and also copies to ARM and AArch64
directories. If it is also supposed to work on other architectures, I'll
leave as an exercise to the respective maintainers.
llvm-svn: 262620
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
This patch fixes PR17193 (and a long time FIXME in the tests).
llvm-svn: 262507
Most of the time ARM has the CCR.UNALIGN_TRP bit set to false which
means that unaligned loads/stores do not trap and even extensive testing
will not catch these bugs. However the multi/double variants are not
affected by this bit and will still trap. In effect a more aggressive
load/store optimization will break existing (bad) code.
These bugs do not necessarily manifest in the broken code where the
misaligned pointer is formed but often later in perfectly legal code
where it is accessed. This means recompiling system libraries (which
have no alignment bugs) with a newer compiler will break existing
applications (with alignment bugs) that worked before.
So (under protest) I implemented this safe mode which limits the
formation of multi/double operations to cases that are not affected by
user code (stack operations like spills/reloads) or cases where the
normal operations trap anyway (floating point load/stores). It is
disabled by default.
Differential Revision: http://reviews.llvm.org/D17015
llvm-svn: 262504
DMB instructions can be expensive, so it's best to avoid them if possible. In
atomicrmw operations there will always be an attempted store so a release
barrier is always needed, but in the cmpxchg case we can delay the DMB until we
know we'll definitely try to perform a store (and so need release semantics).
In the strong cmpxchg case this isn't quite free: we must duplicate the LDREX
instructions to skip the barrier on subsequent iterations. The basic outline
becomes:
ldrex rOld, [rAddr]
cmp rOld, rDesired
bne Ldone
dmb
Lloop:
strex rRes, rNew, [rAddr]
cbz rRes Ldone
ldrex rOld, [rAddr]
cmp rOld, rDesired
beq Lloop
Ldone:
So we'll skip this version for strong operations in "minsize" functions.
llvm-svn: 261568
Certain optimization passes (like globaldce) can prune function
declaration that SjLjEHPrepare assumed would exit when it'd
runOnFunction.
This fixes PR26669.
llvm-svn: 261303
Summary:
Without this, this command
$ llvm-run llc -stop-after machine-cp -o - <( echo '' )
outputs an error, because we close stdout twice -- once when closing the
file opened for "-o", and again when closing outs().
Also clarify in the outs() definition that you can't ever call it if you
want to open your own raw_fd_ostream on stdout.
Reviewers: jroelofs, tstellarAMD
Subscribers: jholewinski, qcolombet, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D17422
llvm-svn: 261286
Add support for TLS access for Windows on ARM. This generates a similar access
to MSVC for ARM.
The changes to the tablegen data is needed to support loading an external symbol
global that is not for a call. The adjustments to the DAG to DAG transforms are
needed to preserve the 32-bit move.
llvm-svn: 259676
The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores
which happens to be a lot faster than __divsi3/__modsi3 when the core
has hardware divide instructions. Do the same here.
Fixes PR26450.
llvm-svn: 259657
description and changed the regression test accordingly.
The default configuration of a Cortex-R7 is to implement the
VFPv3-D16 architecture and the feature line as it was is too
restrictive.
llvm-svn: 259480
The basic optimisation was to convert (mul $LHS, $complex_constant) into
roughly "(shl (mul $LHS, $simple_constant), $simple_amt)" when it was expected
to be cheaper. The original logic checks that the mul only has one use (since
we're mangling $complex_constant), but when used in even more complex
addressing modes there may be an outer addition that can pick up the wrong
value too.
I *think* the ARM addressing-mode problem is actually unreachable at the
moment, but that depends on complex assessments of the profitability of
pre-increment addressing modes so I've put a real check in there instead of an
assertion.
llvm-svn: 259228
The trap instruction is emitted as a data-in-text rather
than an instruction. This patch uses the .inst directive
for emitting trap.
Differential Revision: http://reviews.llvm.org/D16684
llvm-svn: 259182
Various bits we want to use the new ABI actually compile with "-arch armv7k
-miphoneos-version-min=9.0". Not ideal, but also not ridiculous given how
slices work.
llvm-svn: 258975
For historic reasons, the behavior of .align differs between targets.
Fortunately, there are alternatives, .p2align and .balign, which make the
interpretation of the parameter explicit, and which behave consistently across
targets.
This patch teaches MC to use .p2align instead of .align, so that people reading
code for multiple architectures don't have to remember which way each platform
does its .align directive.
Differential Revision: http://reviews.llvm.org/D16549
llvm-svn: 258750
This patch was originally committed as r257885, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258683
This patch was originally committed as r257884, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258682
In the optimizer (GVN etc.) when eliminating redundant nodes with different
flags, the flags are ignored for the purposes of testing for congruence, and
then intersected for the purposes of producing a result that supports the union
of all the uses. This commit makes SelectionDAG's CSE do the same thing,
allowing it to CSE nodes in more cases. This fixes PR26063.
Differential Revision: http://reviews.llvm.org/D15957
llvm-svn: 257940
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.
PR26136
llvm-svn: 257930
# The first commit's message is:
Revert "[ARM] Add DSP build attribute and extension targeting"
This reverts commit b11cc50c0b4a7c8cdb628abc50b7dc226ff583dc.
# This is the 2nd commit message:
Revert "[ARM] Add new system registers to ARMv8-M Baseline/Mainline"
This reverts commit 837d08454e3e5beb8581951ac26b22fa07df3cd5.
llvm-svn: 257916
platforms.
With ELF, the alignment of a global variable in a shared library will
get copied into an executables linked against it, if the executable even
accesss the variable. So, it's not possible to implicitly increase
alignment based on access patterns, or you'll break existing binaries.
This happened to affect libc++'s std::cout symbol, for example. See
thread: http://thread.gmane.org/gmane.comp.compilers.clang.devel/45311
(This is a re-commit of r257719, without the bug reported in
PR26144. I've tweaked the code to not assert-fail in
enforceKnownAlignment when computeKnownBits doesn't recurse far enough
to find the underlying Alloca/GlobalObject value.)
Differential Revision: http://reviews.llvm.org/D16145
llvm-svn: 257902
I originally reapplied this in 257550, but had to revert again due to bot
breakage. The only change in this version is to allow either the TypeSize
or the TypeAllocSize of the variable to be the one represented in debug info
(hopefully in the future we can figure out how to encode the difference).
Additionally, several bot failures following r257550, were due to
optimizer bugs now fixed in r257787 and r257795.
r257550 commit message was:
```
The follow extra changes were made to test cases:
Manually making the variable be the actual type instead of a pointer
to avoid pointer-size differences in generic code:
LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll
LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll
LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll
LLVM :: DebugInfo/Generic/varargs.ll
Delete sizing information from debug info for the same reason
(but the presence of the pointer was important to the test case):
LLVM :: DebugInfo/Generic/restrict.ll
LLVM :: DebugInfo/Generic/tu-composite.ll
LLVM :: Linker/type-unique-type-array-a.ll
LLVM :: Linker/type-unique-simple2.ll
Fixing an incorrect DW_OP_deref
LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll
Fixing a missing DW_OP_deref
LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll
Additionally, clang should no longer complain during bootstrap should no
longer happen after r257534.
The original commit message was:
``
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.
One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.
Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
variable as that would depend on the target, even though this test is
supposed to be generic
- I had to manually declared size/align for reference type. See also the
discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref
``
```
llvm-svn: 257850
Since r230276, we support an improved legalization for f64->f16,
which goes through a temporary f32, improving codegen when
f32->f16 is legal but not f64->f16. This requires unsafe-fp-math.
However, that legalization assumed that the second step, producing
a pseudo-softened f16, had type i16. That's not true on targets
with illegal i16, such as ARM.
Use the initial f64->f16 result type instead.
llvm-svn: 257794
platforms.
With ELF, the alignment of a global variable in a shared library will
get copied into an executables linked against it, if the executable even
accesss the variable. So, it's not possible to implicitly increase
alignment based on access patterns, or you'll break existing binaries.
This happened to affect libc++'s std::cout symbol, for example. See
thread: http://thread.gmane.org/gmane.comp.compilers.clang.devel/45311
llvm-svn: 257719
Previous implementation in http://reviews.llvm.org/D10522
created external references to __emutls_v.* variables.
Such references are inaccurate and cannot be handled by
all linkers, e.g. Android dynamic and gold linkers for aarch64.
Now a new LowerEmuTLS pass to go through all global variables,
and add emutls_v.* and emutls_t.* variables.
These __emutls* variables have the same linkage and
visibility as the associated user defined TLS variable.
Also removed old code that dump __emutls* variables in AsmPrinter.cpp,
and updated TLS unit tests.
Differential Revision: http://reviews.llvm.org/D15300
llvm-svn: 257718
The follow extra changes were made to test cases:
Manually making the variable be the actual type instead of a pointer
to avoid pointer-size differences in generic code:
LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll
LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll
LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll
LLVM :: DebugInfo/Generic/varargs.ll
Delete sizing information from debug info for the same reason
(but the presence of the pointer was important to the test case):
LLVM :: DebugInfo/Generic/restrict.ll
LLVM :: DebugInfo/Generic/tu-composite.ll
LLVM :: Linker/type-unique-type-array-a.ll
LLVM :: Linker/type-unique-simple2.ll
Fixing an incorrect DW_OP_deref
LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll
Fixing a missing DW_OP_deref
LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll
Additionally, clang should no longer complain during bootstrap should no
longer happen after r257534.
The original commit message was:
```
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.
One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.
Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
variable as that would depend on the target, even though this test is
supposed to be generic
- I had to manually declared size/align for reference type. See also the
discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref
```
llvm-svn: 257550
Summary:
BFC instructions are available in ARMv6T2 and above.
Reviewers: t.p.northover
Subscribers: aemerson
Differential Revision: http://reviews.llvm.org/D16076
llvm-svn: 257546
VMOVs are not strictly speaking cheap, but they are as expensive as a vector
copy (VORR), so we should prefer rematerialization over splitting when it
applies.
rdar://problem/23754176
llvm-svn: 257545
Summary:
r255334 matches bit-reverse pattern in InstCombine and generates calls to Instrinsic::bitreverse.
RBIT instruction is only available for ARMv6t2 and above. This patch has the intrinsic expanded during legalization for ARMv4 and ARMv5.
Patch by Z. Zheng <zhaoshiz@codeaurora.org>
Reviewers: apazos, jmolloy, weimingz
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D15932
llvm-svn: 257188
Summary:
During legalization if i16, do not ASSERTZEXT the result of FP_TO_FP16.
Directly return an FP_TO_FP16 node with return type as the
promote-to-type of i16.
This patch also removes extraneous length check. This legalization
should be valid even if integer and float types are of different
lengths.
This patch breaks a hard-float test for fp16 args. The test is changed
to allow a vmov to zero-out the top bits, and also ensure that the
return value is in an FP register.
Reviewers: ab, jmolloy
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D15438
llvm-svn: 257184