so we can use initializer lists for the ARMSubtarget and then
use this to initialize a moved DataLayout on the subtarget from
the TargetMachine.
llvm-svn: 210861
Windows on ARM uses COFF/PE which is intrinsically position independent. For
the case of 32-bit immediates, use a pair-wise relocation as otherwise we may
exceed the range of operators. This fixes a code generation crash when using
-Oz when targeting Windows on ARM.
llvm-svn: 210814
As requested by AArch64 subtargets.
Note that this will have no effect until the
AArch64 target actually enables the pass like this:
substitutePass(&PostRASchedulerID, &PostMachineSchedulerID);
As soon as armv7 switches over, PostMachineScheduler will become the
default postRA scheduler, so this won't be necessary any more.
Targets using the old postRA schedule would then do:
substitutePass(&PostMachineSchedulerID, &PostRASchedulerID);
llvm-svn: 210167
The Cyclone CPU is similar to swift for most LLVM purposes, but does have two
preferred instructions for zeroing a VFP register. This teaches LLVM about
them.
llvm-svn: 205309
After all hard work to implement the EHABI and with the test-suite
passing, it's time to turn it on by default and allow users to
disable it as a work-around while we fix the eventual bugs that show
up.
This commit also remove the -arm-enable-ehabi-descriptors, since we
want the tables to be printed every time the EHABI is turned on
for non-Darwin ARM targets.
Although MCJIT EHABI is not working yet (needs linking with the right
libraries), this commit also fixes some relocations on MCJIT regarding
the EH tables/lib calls, and update some tests to avoid using EH tables
when none are needed.
The EH tests in the test-suite that were previously disabled on ARM
now pass with these changes, so a follow-up commit on the test-suite
will re-enable them.
llvm-svn: 200388
The ARM backend has been using most of the MachO related subtarget
checks almost interchangeably, and since the only target it's had to
run on has been IOS (which is all three of MachO, Darwin and IOS) it's
worked out OK so far.
But we'd like to support embedded targets under the "*-*-none-macho"
triple, which means everything starts falling apart and inconsistent
behaviours emerge.
This patch should pick a reasonably sensible set of behaviours for the
new triple (and any others that come along, with luck). Some choices
were debatable (notably FP == r7 or r11), but we can revisit those
later when deficiencies become apparent.
llvm-svn: 198617
Longer term, we want to move users to "*-*-*-macho" for embedded work, but for
now people are relying on the last thing we told them, which is unfortunately
"*-*-darwin-eabi".
rdar://problem/15703934
llvm-svn: 198602
Clang sets the float-abi target option manually, but no longer
annotates each function with its ABI. This can lead to confusing
mistmatch between "clang -emit-llvm | llc" and normal clang
invocations.
Besides which, gnueabihf actually *is* hard-float. Defaulting to soft
was just perverse.
llvm-svn: 197554
Most users would be surprised if "isCOFF" and "isMachO" were simultaneously
true, unless they'd put the compiler in a box with a gun attached to a photon
detector.
This makes sure precisely one of the three formats is true for any triple and
simplifies some target logic based on that.
llvm-svn: 196934
- krait processor currently modeled with the same features as A9.
- Krait processor additionally has VFP4 (fused multiply add/sub)
and hardware division features enabled.
- krait has currently the same Schedule model as A9
- krait cpu flag is not recognized by the GNU assembler yet,
it is replaced with march=armv7-a to avoid a lower march
from being used.
llvm-svn: 196619
By default, the behavior of IT block generation will be determinated
dynamically base on the arch (armv8 vs armv7). This patch adds backend
options: -arm-restrict-it and -arm-no-restrict-it. The former one
restricts the generation of IT blocks (the same behavior as thumbv8) for
both arches. The later one allows the generation of legacy IT block (the
same behavior as ARMv7 Thumb2) for both arches.
Clang will support -mrestrict-it and -mno-restrict-it, which is
compatible with GCC.
llvm-svn: 194592
Add a Virtualization ARM subtarget feature along with adding proper build
attribute emission for Tag_Virtualization_use (encodes Virtualization and
TrustZone) and Tag_MPextension_use.
Also rework test/CodeGen/ARM/2010-10-19-mc-elf-objheader.ll testcase to
something that is more maintainable. This changes the focus of this
testcase away from testing CPU defaults (which is tested elsewhere), onto
specifically testing that attributes are encoded correctly.
llvm-svn: 193859
Adds a subtarget feature for the CRC instructions (optional in v8-A) to the ARM (32-bit) backend.
Differential Revision: http://llvm-reviews.chandlerc.com/D2036
llvm-svn: 193599
There's a barrier instruction so that should still be used, but most actual
atomic operations are going to need a platform decision on the correct
behaviour (either nop if single-threaded or OS-support otherwise).
rdar://problem/15287210
llvm-svn: 193399
The hint instructions ("nop", "yield", etc) are mostly Thumb2-only, but have
been ported across to the v6M architecture. Fortunately, v6M seems to sit
nicely between v6 (thumb-1 only) and v6T2, so we can add a feature for it
fairly easily.
rdar://problem/15144406
llvm-svn: 192097
This patch enables calls to __aeabi_idivmod when in EABI mode,
by using the remainder value returned on registers (R1),
enabled by the ARM triple "none-eabi". Note that Darwin and
GNUEABI triples will continue lowering on GNU style, that is,
using the stack for the remainder.
Still need to add SREM/UREM support fix for 64-bit lowering.
llvm-svn: 186390
Performance monitors, including a basic cycle counter, are an official
extension in the ARMv7 specification. This adds support for enabling and
disabling them, orthogonally from CPU selection.
rdar://problem/13939186
llvm-svn: 182602
This patch matches GCC behavior: the code used to only allow unaligned
load/store on ARM for v6+ Darwin, it will now allow unaligned load/store
for v6+ Darwin as well as for v7+ on Linux and NaCl.
The distinction is made because v6 doesn't guarantee support (but LLVM
assumes that Apple controls hardware+kernel and therefore have
conformant v6 CPUs), whereas v7 does provide this guarantee (and
Linux/NaCl behave sanely).
The patch keeps the -arm-strict-align command line option, and adds
-arm-no-strict-align. They behave similarly to GCC's -mstrict-align and
-mnostrict-align.
I originally encountered this discrepancy in FastIsel tests which expect
unaligned load/store generation. Overall this should slightly improve
performance in most cases because of reduced I$ pressure.
llvm-svn: 182175
These instructions aren't universally available, but depend on a specific
extension to the normal ARM architecture (rather than, say, v6/v7/...) so a new
feature is appropriate.
This also enables the feature by default on A-class cores which usually have
these extensions, to avoid breaking existing code and act as a sensible
default.
llvm-svn: 179171
NEON is not IEEE 754 compliant, so we should avoid lowering single-precision
floating point operations with NEON unless unsafe-math is turned on. The
equivalent VFP instructions are IEEE 754 compliant, but in some cores they're
much slower, so some archs/OSs might still request it to be on by default,
such as Swift and Darwin.
llvm-svn: 177651
are more expensive than the non-flag setting variant. Teach thumb2 size
reduction pass to avoid generating them unless we are optimizing for size.
rdar://12892707
llvm-svn: 170728
textually as NativeClient. Also added a link to the native client project for
readers unfamiliar with it.
A Clang patch will follow shortly.
llvm-svn: 169291
missed in the first pass because the script didn't yet handle include
guards.
Note that the script is now able to handle all of these headers without
manual edits. =]
llvm-svn: 169224
predicates.
Also remove NEON2 since it's not really useful and it is confusing. If
NEON + VFP4 implies NEON2 but NEON2 doesn't imply NEON + VFP4, what does it
really mean?
rdar://10139676
llvm-svn: 154480
1. The new instruction itinerary entries are not properly described.
2. The asm parser can't handle vfms and vfnms.
3. There were no assembler, disassembler test cases.
4. HasNEON2 has the wrong assembler predicate.
rdar://10139676
llvm-svn: 154456
In this update:
- I assumed neon2 does not imply vfpv4, but neon and vfpv4 imply neon2.
- I kept setting .fpu=neon-vfpv4 code attribute because that is what the
assembler understands.
Patch by Ana Pazos <apazos@codeaurora.org>
llvm-svn: 152036
the processor keeps a return addresses stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.
Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt RAS and causes 100% return misprediction so LLVM should use a
unconditional branch instead. i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.
rdar://8979299
llvm-svn: 151623
Build on previous patches to successfully distinguish between an M-series and A/R-series MSR and MRS instruction. These take different mask names and have a *slightly* different opcode format.
Add decoder and disassembler tests.
Improvement on the previous patch - successfully distinguish between valid v6m and v7m masks (one is a subset of the other). The patch had to be edited slightly to apply to ToT.
llvm-svn: 140696
instructions are more aligned than the CPU requires, and adds some additional
directives, to follow in future patches. Patch by David Meyer!
llvm-svn: 139125
The DSP instructions in the Thumb2 instruction set are an optional extension
in the Cortex-M* archtitecture. When present, the implementation is considered
an "ARMv7E-M implementation," and when not, an "ARMv7-M implementation."
Add a subtarget feature hook for the v7e-m instructions and hook it up. The
cortex-m3 cpu is an example of a v7m implementation, while the cortex-m4 is
a v7e-m implementation.
rdar://9572992
llvm-svn: 134261
itineraries.
- Refactor TargetSubtarget to be based on MCSubtargetInfo.
- Change tablegen generated subtarget info to initialize MCSubtargetInfo
and hide more details from targets.
llvm-svn: 134257
be the first encoded as the first feature. It then uses the CPU name to look up
features / scheduling itineray even though clients know full well the CPU name
being used to query these properties.
The fix is to just have the clients explictly pass the CPU name!
llvm-svn: 134127
DAG scheduling during isel. Most new functionality is currently
guarded by -enable-sched-cycles and -enable-sched-hazard.
Added InstrItineraryData::IssueWidth field, currently derived from
ARM itineraries, but could be initialized differently on other targets.
Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is
active, and if so how many cycles of state it holds.
Added SchedulingPriorityQueue::HasReadyFilter to allowing gating entry
into the scheduler's available queue.
ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to
get information about it's SUnits, provides RecedeCycle for bottom-up
scheduling, correctly computes scoreboard depth, tracks IssueCount, and
considers potential stall cycles when checking for hazards.
ScheduleDAGRRList now models machine cycles and hazards (under
flags). It tracks MinAvailableCycle, drives the hazard recognizer and
priority queue's ready filter, manages a new PendingQueue, properly
accounts for stall cycles, etc.
llvm-svn: 122541
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
additional pipeline stall. So it's frequently better to single codegen
vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
vmla + vmla is very bad. But this isn't ideal either:
vmul
vadd
vmla
Instead, we want to expand the second vmla:
vmla
vmul
vadd
Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
faster.
Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.
A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
vmla / vmls will trigger one of the special hazards.
Work in progress, only A+B are enabled.
llvm-svn: 120960
"-mattr=+vfp3" is specified. However, this will not work for hardware that
only supports 16 registers. Add a new flag to support -"mattr=+vfp3,+d16".
Patch by Jan Voung!
llvm-svn: 116310
cost modeling for if-conversion. Now if only we had a way to estimate the misprediction probability.
Adjsut CodeGen/ARM/ifcvt10.ll. The pipeline on Cortex-A8 is long enough that it is still profitable
to predicate an ldm, but the shorter pipeline on Cortex-A9 makes it unprofitable.
llvm-svn: 114995