points.
There is an infinite loop that can occur in Shrink Wrapping while searching
for the Save/Restore points.
Part of this search checks whether the save/restore points are located in
different loop nests and if so, uses the (post) dominator trees to find the
immediate (post) dominator blocks. However, if the current block does not have
any immediate (post) dominators then this search will result in an infinite
loop. This can occur in code containing an infinite loop.
The modification checks whether the immediate (post) dominator is different from
the current save/restore block. If it is not, then the search terminates and the
current location is not considered as a valid save/restore point for shrink wrapping.
Phabricator: http://reviews.llvm.org/D11607
llvm-svn: 244247
return StringSwitch<int>(Flags)
.Case("g", 0x1)
.Case("nzcvq", 0x2)
.Case("nzcvqg", 0x3)
.Default(-1);
...
// The _g and _nzcvqg versions are only valid if the DSP extension is
// available.
if (!Subtarget->hasThumb2DSP() && (Mask & 0x2))
return -1;
ARMARM confirms that the comment is right, and the code was wrong.
llvm-svn: 244029
This adds the software division routines for the Windows RTABI. These are not
expected to be used often though as most modern Windows ARM capable targets
support hardware division. In the case that the target CPU doesnt support
hardware division, this will be the fallback.
llvm-svn: 243952
Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s.
The backend is liable to start relying on that (if it hasn't already),
so make uniquable `DICompileUnit`s illegal and automatically upgrade old
bitcode. This is a nice cleanup, since we can remove an unnecessary
`DenseSet` (and the associated uniquing info) from `LLVMContextImpl`.
Almost all the testcases were updated with this script:
git grep -e '= !DICompileUnit' -l -- test |
grep -v test/Bitcode |
xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,'
I imagine something similar should work for out-of-tree testcases.
llvm-svn: 243885
This is necessary for WatchOS support, where the compact unwind format assumes
this kind of layout. For now we only want this on Swift-like CPUs though, where
it's been the Xcode behaviour for ages. Also, since it can expand the prologue
we don't want it at -Oz.
llvm-svn: 243884
Enabling merging of extern globals appears to be generally either beneficial or
harmless. On some benchmarks suites (on Cortex-M4F, Cortex-A9, and Cortex-A57)
it gives improvements in the 1-5% range, but in the rest the overall effect is
zero.
Differential Revision: http://reviews.llvm.org/D10966
llvm-svn: 243874
In http://reviews.llvm.org/rL215382, IT forming was made more conservative under
the belief that a flag-setting instruction was unpredictable inside an IT block on ARMv6M.
But actually, ARMv6M doesn't even support IT blocks so that's impossible. In the ARMARM for
v7M, v7AR and v8AR it states that the semantics of such an instruction changes inside an
IT block - it doesn't set the flags. So actually it is fine to use one inside an IT block
as long as the flags register is dead afterwards.
This gives significant performance improvements in a variety of MPEG based workloads.
Differential revision: http://reviews.llvm.org/D11680
llvm-svn: 243869
Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags,
using `DW_TAG_variable` in their place Stop exposing the `tag:` field at
all in the assembly format for `DILocalVariable`.
Most of the testcase updates were generated by the following sed script:
find test/ -name "*.ll" -o -name "*.mir" |
xargs grep -l 'DILocalVariable' |
xargs sed -i '' \
-e 's/tag: DW_TAG_arg_variable, //' \
-e 's/tag: DW_TAG_auto_variable, //'
There were only a handful of tests in `test/Assembly` that I needed to
update by hand.
(Note: a follow-up could change `DILocalVariable::DILocalVariable()` to
set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable`
(as appropriate), instead of having that logic magically in the backend
in `DbgVariable`. I've added a FIXME to that effect.)
llvm-svn: 243774
For a modulo (reminder) operation,
clang -target armv7-none-linux-gnueabi generates "__modsi3"
clang -target armv7-none-eabi generates "__aeabi_idivmod"
clang -target armv7-linux-androideabi generates "__modsi3"
Android bionic libc doesn't provide a __modsi3, instead it provides a
"__aeabi_idivmod". This patch fixes the LLVM ARMISelLowering to generate
the correct call when ever there is a modulo operation.
Differential Revision: http://reviews.llvm.org/D11661
llvm-svn: 243717
This commit defines subtarget feature strict-align and uses it instead of
cl::opt -arm-strict-align to decide whether strict alignment should be
forced. Also, remove the logic that was checking the OS and architecture
as clang is now responsible for setting strict-align based on the command
line options specified and the target architecute and OS.
rdar://problem/21529937
http://reviews.llvm.org/D11470
llvm-svn: 243493
The 'common' section TLS is not implemented.
Current C/C++ TLS variables are not placed in common section.
DWARF debug info to get the address of TLS variables is not generated yet.
clang and driver changes in http://reviews.llvm.org/D10524
Added -femulated-tls flag to select the emulated TLS model,
which will be used for old targets like Android that do not
support ELF TLS models.
Added TargetLowering::LowerToTLSEmulatedModel as a target-independent
function to convert a SDNode of TLS variable address to a function call
to __emutls_get_address.
Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel
for TLSModel::Emulated. Although all targets supporting ELF TLS models are
enhanced, emulated TLS model has been tested only for Android ELF targets.
Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for
emulated TLS variables.
Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls.
TODO: Add proper DIE for emulated TLS variables.
Added new unit tests with emulated TLS.
Differential Revision: http://reviews.llvm.org/D10522
llvm-svn: 243438
Add a verifier check that `DILocalVariable`s of tag
`DW_TAG_arg_variable` always have a non-zero 'arg:' field, and those of
tag `DW_TAG_auto_variable` always have a zero 'arg:' field. These are
the only configurations that are properly understood by the backend.
(Also, fix the bad examples in LangRef and test/Assembler, and fix the
bug in Kaleidoscope Ch8.)
A large number of testcases seem to have bitrotted their way forward
from some ancient version of the debug info hierarchy that didn't have
`arg:` parameters. If you have out-of-tree testcases that start failing
in the verifier and you don't care enough to get the `arg:` right, you
may have some luck just calling:
sed -e 's/, arg: 0/, arg: 1/'
or some such, but I hand-updated the ones in tree.
llvm-svn: 243183
Some shufflevectors are currently being incorrectly lowered in the AArch32
backend as the existing checks for detecting the NEON operations from the
shufflevector instruction expects the shuffle mask and the vector operands to be
of the same length.
This is not always the case as the mask may be twice as long as the operand;
here only the lower half of the shufflemask gets checked, so provided the lower
half of the shufflemask looks like a vector transpose (or even is just all -1
for undef) then the intrinsics may get incorrectly lowered into a vector
transpose (VTRN) instruction.
This patch fixes this by accommodating for both cases and adds regression tests.
Differential Revision: http://reviews.llvm.org/D11407
llvm-svn: 243103
is an immediate, in this check the value is negated and stored in and int64_t.
The value can be -2^63 yet the result cannot be stored in an int64_t and this
gives some undefined behaviour causing failures. The negation is only necessary
when the values is within a certain range and so it should not need to negate
-2^63, this patch introduces this and also a regression test.
Differential Revision: http://reviews.llvm.org/D11408
llvm-svn: 243100
whether register r9 should be reserved.
This recommits r242737, which broke bots because the number of subtarget
features went over the limit of 64.
This change is needed because we cannot use a backend option to set
cl::opt "arm-reserve-r9" when doing LTO.
Out-of-tree projects currently using cl::opt option "-arm-reserve-r9" to
reserve r9 should make changes to add subtarget feature "reserve-r9" to
the IR.
rdar://problem/21529937
Differential Revision: http://reviews.llvm.org/D11320
llvm-svn: 242756
Re-apply of r241928 which had to be reverted because of the r241926
revert.
This commit factors out common code from MergeBaseUpdateLoadStore() and
MergeBaseUpdateLSMultiple() and introduces a new function
MergeBaseUpdateLSDouble() which merges adds/subs preceding/following a
strd/ldrd instruction into an strd/ldrd instruction with writeback where
possible.
Differential Revision: http://reviews.llvm.org/D10676
llvm-svn: 242743
Re-apply r241926 with an additional check that r13 and r15 are not used
for LDRD/STRD. See http://llvm.org/PR24190. This also already includes
the fix from r241951.
Differential Revision: http://reviews.llvm.org/D10623
llvm-svn: 242742
whether register r9 should be reserved.
This change is needed because we cannot use a backend option to set
cl::opt "arm-reserve-r9" when doing LTO.
Out-of-tree projects currently using cl::opt option "-arm-reserve-r9" to
reserve r9 should make changes to add subtarget feature "reserve-r9" to
the IR.
rdar://problem/21529937
Differential Revision: http://reviews.llvm.org/D11320
llvm-svn: 242737
This is the first step toward supporting shrink-wrapping for this target.
The changes could be summarized by these items:
- Expand the tail-call return as part of the expand pseudo pass.
- Get rid of the assumptions that the epilogue is the exit block:
* Do not assume which registers are free in the epilogue. (This indirectly
improve the lowering of the code for the segmented stacks, see the test
cases.)
* Take into account that the basic block can be empty.
Related to <rdar://problem/20821730>
llvm-svn: 242714
Reapply r242500 now that the swift schedmodel includes LDRLIT.
This is mostly done to disable the PostRAScheduler which optimizes for
instruction latencies which isn't a good fit for out-of-order
architectures. This also allows to leave out the itinerary table in
swift in favor of the SchedModel ones.
This change leads to performance improvements/regressions by as much as
10% in some benchmarks, in fact we loose 0.4% performance over the
llvm-testsuite for reasons that appear to be unknown or out of the
compilers control. rdar://20803802 documents the investigation of
these effects.
While it is probably a good idea to perform the same switch for the
other ARM out-of-order CPUs, I limited this change to swift as I cannot
perform the benchmark verification on the other CPUs.
Differential Revision: http://reviews.llvm.org/D10513
llvm-svn: 242588
This is mainly for the benefit of GlobalMerge, so that an alias into a
MergedGlobals variable has the same size as the original non-merged
variable.
Differential Revision: http://reviews.llvm.org/D10837
llvm-svn: 242520
This is mostly done to disable the PostRAScheduler which optimizes for
instruction latencies which isn't a good fit for out-of-order
architectures. This also allows to leave out the itinerary table in
swift in favor of the SchedModel ones.
This change leads to performance improvements/regressions by as much as
10% in some benchmarks, in fact we loose 0.4% performance over the
llvm-testsuite for reasons that appear to be unknown or out of the
compilers control. rdar://20803802 documents the investigation of
these effects.
While it is probably a good idea to perform the same switch for the
other ARM out-of-order CPUs, I limited this change to swift as I cannot
perform the benchmark verification on the other CPUs.
Differential Revision: http://reviews.llvm.org/D10513
llvm-svn: 242500
llvm.eh.sjlj.setjmp was used as part of the SjLj exception handling
style but is also used in clang to implement __builtin_setjmp. The ARM
backend needs to output additional dispatch tables for the SjLj
exception handling style, these tables however can't be emitted if
llvm.eh.sjlj.setjmp is simply used for __builtin_setjmp and no actual
landing pad blocks exist.
To solve this issue a new llvm.eh.sjlj.setup_dispatch intrinsic is
introduced which is used instead of llvm.eh.sjlj.setjmp in the SjLj
exception handling lowering, so we can differentiate between the case
where we actually need to setup a dispatch table and the case where we
just need the __builtin_setjmp semantic.
Differential Revision: http://reviews.llvm.org/D9313
llvm-svn: 242481
This reverts commit r242300.
This is causing buildbot failures which we are investigating.
I'll reapply once we know whats going on, but for now want to
get the bots green.
llvm-svn: 242428
pairs for 32-bit immediates.
This change is needed to avoid emitting movt/movw pairs when doing LTO
and do so on a per-function basis.
Out-of-tree projects currently using cl::opt option -arm-use-movt=0 or
false to avoid emitting movt/movw pairs should make changes to add
subtarget feature "+no-movt" (see the changes made to clang in r242368).
rdar://problem/21529937
Differential Revision: http://reviews.llvm.org/D11026
llvm-svn: 242369
These were the cause of a verifier error when building 7zip with
-verify-machineinstrs. Running 'make check' with the verifier
triggered the same error on the test here so i've updated the test
to run the verifier on one of its runs instead of adding a new one.
While looking at this code, there was a stale comment that these
instructions were only used for disassembly. This probably used to
be the case, but they are now used in the 'ARM load / store optimization pass' too.
llvm-svn: 242300
The ones committed were orthogonal to the change and would have passed before
that revision. What it *did* do was prevent an assertion failure when
generating object files.
llvm-svn: 242166
This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan():
- Rename the function to determineCalleeSaves()
- Pass a bitset of callee saved registers by reference, thus avoiding
the function-global PhysRegUsed bitset in MachineRegisterInfo.
- Without PhysRegUsed the implementation is fine tuned to not save
physcial registers which are only read but never modified.
Related to rdar://21539507
Differential Revision: http://reviews.llvm.org/D10909
llvm-svn: 242165
The 64/128-bit vector types are legal if NEON instructions are
available. However, there was no matching patterns for @llvm.cttz.*()
intrinsics and result in fatal error.
This commit fixes the problem by lowering cttz to:
a. ctpop((x & -x) - 1)
b. width - ctlz(x & -x) - 1
llvm-svn: 242037
Register r12 ('ip') is used by GCC for this purpose
and hence is used here. As discussed on the GCC mailing
list, the register choice is an ABI issue and so
choosing the same register as GCC means
__builtin_call_with_static_chain is compatible.
A similar patch has just gone in the AArch64 backend,
so this is just the ARM counterpart, following the same
discussion.
Patch by Stephen Cross.
llvm-svn: 241996
This commit factors out common code from MergeBaseUpdateLoadStore() and
MergeBaseUpdateLSMultiple() and introduces a new function
MergeBaseUpdateLSDouble() which merges adds/subs preceding/following a
strd/ldrd instruction into an strd/ldrd instruction with writeback where
possible.
Differential Revision: http://reviews.llvm.org/D10676
llvm-svn: 241928
This improves the logic in several ways and is a preparation for
followup patches:
- First perform an analysis and create a list of merge candidates, then
transform. This simplifies the code in that you have don't have to
care to much anymore that you may be holding iterators to
MachineInstrs that get removed.
- Analyze/Transform basic blocks in reverse order. This allows to use
LivePhysRegs to find free registers instead of the RegisterScavenger.
The RegisterScavenger will become less precise in the future as it
relies on the deprecated kill-flags.
- Return the newly created node in MergeOps so there's no need to look
around in the schedule to find it.
- Rename some MBBI iterators to InsertBefore to make their role clear.
- General code cleanup.
Differential Revision: http://reviews.llvm.org/D10140
llvm-svn: 241920
This commit changes the target arch to fix the test case commited in r241566
that was failing on ninja-x64-msvc-RA-centos6. Also add checks to make sure
the callee's address is loaded to blx's operand.
llvm-svn: 241588
be emitted.
This is needed to enable ARM long calls for LTO and enable and disable it on a
per-function basis.
Out-of-tree projects currently using EnableARMLongCalls to emit long calls
should start passing "+long-calls" to the feature string (see the changes made
to clang in r241565).
rdar://problem/21529937
Differential Revision: http://reviews.llvm.org/D9364
llvm-svn: 241566
This commit changes normal isel and fast isel to read the user-defined trap
function name from function attribute "trap-func-name" attached to llvm.trap or
llvm.debugtrap instead of from TargetOptions::TrapFuncName. This is needed to
use clang's command line option "-ftrap-function" for LTO and enable changing
the trap function name on a per-call-site basis.
Out-of-tree projects currently using TargetOptions::TrapFuncName to specify the
trap function name should attach attribute "trap-func-name" to the call sites
of llvm.trap and llvm.debugtrap instead.
rdar://problem/21225723
Differential Revision: http://reviews.llvm.org/D10832
llvm-svn: 241305
When the store sequence being combined actually stores the base register, we
should not mark it as killed until the end.
rdar://21504262
llvm-svn: 241003
Some of the the permissible ARM -mfpu options, which are supported in GCC,
are currently not present in llvm/clang.This patch adds the options:
'neon-fp16', 'vfpv3-fp16', 'vfpv3-d16-fp16', 'vfpv3xd' and 'vfpv3xd-fp16.
These are related to half-precision floating-point and single precision.
Reviewers: rengolin, ranjeet.singh
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10645
llvm-svn: 240930
This patch fixes the error in ARM.td which stated that Cortex-R5
floating point unit can do only single precision, when it can do double as well.
Reviewers: rengolin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10769
llvm-svn: 240799
Cortex-R4F TRM states that fpu supports both single and double precision.
This patch corrects the information in ARM.td file and corresponding test.
Reviewers: rengolin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10763
llvm-svn: 240776