We were assuming all SBFX-like operations would have the shl/asr form, but
often when the field being extracted is an i8 or i16, we end up with a
SIGN_EXTEND_INREG acting on a shift instead. Simple enough to check for though.
llvm-svn: 213754
The post-indexed instructions were missing the constraint, causing unpredictable STRH instructions to be emitted.
The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined, however it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically and then the constraint is definitely needed.
llvm-svn: 213729
insertions.
The old behavior could cause arbitrarily bad memory usage in the DAG
combiner if there was heavy traffic of adding nodes already on the
worklist to it. This commit switches the DAG combine worklist to work
the same way as the instcombine worklist where we null-out removed
entries and only add new entries to the worklist. My measurements of
codegen time shows slight improvement. The memory utilization is
unsurprisingly dominated by other factors (the IR and DAG itself
I suspect).
This change results in subtle, frustrating churn in the particular order
in which DAG combines are applied which causes a number of minor
regressions where we fail to match a pattern previously matched by
accident. AFAICT, all of these should be using AddToWorklist to directly
or should be written in a less brittle way. None of the changes seem
drastically bad, and a few of the changes seem distinctly better.
A major change required to make this work is to significantly harden the
way in which the DAG combiner handle nodes which become dead
(zero-uses). Previously, we relied on the ability to "priority-bump"
them on the combine worklist to achieve recursive deletion of these
nodes and ensure that the frontier of remaining live nodes all were
added to the worklist. Instead, I've introduced a routine to just
implement that precise logic with no indirection. It is a significantly
simpler operation than that of the combiner worklist proper. I suspect
this will also fix some other problems with the combiner.
I think the x86 changes are really minor and uninteresting, but the
avx512 change at least is hiding a "regression" (despite the test case
being just noise, not testing some performance invariant) that might be
looked into. Not sure if any of the others impact specific "important"
code paths, but they didn't look terribly interesting to me, or the
changes were really minor. The consensus in review is to fix any
regressions that show up after the fact here.
Thanks to the other reviewers for checking the output on other
architectures. There is a specific regression on ARM that Tim already
has a fix prepped to commit.
Differential Revision: http://reviews.llvm.org/D4616
llvm-svn: 213727
We should update the usages to all of the results;
otherwise, we might get assertion failure or SEGV during
the type legalization of ATOMIC_CMP_SWAP_WITH_SUCCESS
with two or more illegal types.
For example, in the following sequence, both i8 and i1
might be illegal in some target, e.g. armv5, mipsel, mips64el,
%0 = cmpxchg i8* %ptr, i8 %desire, i8 %new monotonic monotonic
%1 = extractvalue { i8, i1 } %0, 1
Since both i8 and i1 should be legalized, the corresponding
ATOMIC_CMP_SWAP_WITH_SUCCESS dag will be checked/replaced/updated
twice.
If we don't update the usage to *ALL* of the results in the
first round, the DAG for extractvalue might be processed earlier.
The GetPromotedInteger() will result in assertion failure,
because its operand (i.e. the success bit of cmpxchg) is not
promoted beforehand.
llvm-svn: 213569
When performing a dynamic stack adjustment without optimisations, we would mark
SP as def and R4 as kill. This occurred as part of the expansion of a
WIN__CHKSTK SDNode which indicated the proper handling of SP and R4. The result
would be that we would double define SP as part of an operation, which is
obviously incorrect.
Furthermore, the VTList for the chain had an incorrect parameter type of i32
instead of Other.
Correct these to permit proper lowering of __builtin_alloca at -O0.
llvm-svn: 213442
Actual support for softening f16 operations is still limited, and can be added
when it's needed. But Soften is much closer to being a useful thing to try
than keeping it Legal when no registers can actually hold such values.
Longer term, we probably want something between Soften and Promote semantics
for most targets, it'll be more efficient to promote the 4 basic operations to
f32 than libcall them.
llvm-svn: 213372
The post-indexed instructions were missing the constraint, causing unpredictable STR instructions to be emitted.
The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined, however it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically and then the constraint is definitely needed.
This fixes PR20323.
llvm-svn: 213369
This makes the two intrinsics @llvm.convert.from.f16 and
@llvm.convert.to.f16 accept types other than simple "float". This is
only strictly needed for the truncate operation, since otherwise
double rounding occurs and there's no way to represent the strict IEEE
conversion. However, for symmetry we allow larger types in the extend
too.
During legalization, we can expand an "fp16_to_double" operation into
two extends for convenience, but abort when the truncate isn't legal. A new
libcall is probably needed here.
Even after this commit, various target tweaks are needed to actually use the
extended intrinsics. I've put these into separate commits for clarity, so there
are no actual tests of f64 conversion here.
llvm-svn: 213248
This fixes an issue where a local value is defined before and used after an
inline asm call with side effects.
This fix simply flushes the local value map, which updates the insertion point
for the inline asm call to be above any previously defined local values.
This fixes <rdar://problem/17694203>
llvm-svn: 213203
The coalescer is very aggressive at propagating constraints on the register classes, and the register allocator doesn’t know how to split sub-registers later to recover. This patch provides an escape valve for targets that encounter this problem to limit coalescing.
This patch also implements such for ARM to lower register pressure when using lots of large register classes. This works around PR18825.
llvm-svn: 213078
This crash was pretty common while compiling Rust for iOS (armv7). Reason -
SjLj preparation step was lowering aggregate arguments as ExtractValue +
InsertValue. ExtractValue has assertion which checks that there is some data in
value, which is not true in case of empty (no fields) structures. Rust uses
them quite extensively so this patch uses a 'select true, %val, undef'
instruction to lower the argument.
Patch by Valerii Hiora.
llvm-svn: 212922
This completes the handling for DLL import storage symbols when lowering
instructions. A DLL import storage symbol must have an additional load
performed prior to use. This is applicable to variables and functions.
This is particularly important for non-function symbols as it is possible to
handle function references by emitting a thunk which performs the translation
from the unprefixed __imp_ symbol to the proper symbol (although, this is a
non-optimal lowering). For a variable symbol, no such thunk can be
accommodated.
llvm-svn: 212431
separate MDNode so they can be uniqued via folding set magic. To conserve
space, DIVariable nodes are still variable-length, with the last two
fields being optional.
No functional change.
http://reviews.llvm.org/D3526
llvm-svn: 212050
Most of this is just tests that were silently succeeding in spite of
schema changes I made over a year ago. Cleaning them up as they lead to
failures in a change I'm working on/will come soon.
test/DebugInfo/2010-01-19-DbgScope.ll was removed as it tested miscoping
where a DebugLoc described a location not in the current function. The
test case doesn't describe why this is a valid situation and should be
supported, so I'm removing it and shortly going to commit changes that
make this firmly unsupported/assert-fail.
llvm-svn: 211628
ARM v7M has ldrex/strex but not ldrexd/strexd. This means 32-bit
operations should work as normal, but 64-bit ones are almost certainly
doomed.
Patch by Phoebe Buckheister.
llvm-svn: 211042
This also simplifies the IR we create slightly: instead of working out
where success & failure should go manually, it turns out we can just
always jump to a success/failure block created for the purpose. Later
phases will sort out the mess without much difficulty.
llvm-svn: 210917
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.
As a result, cmpxchg instructions must return a flag indicating
success in addition to their original iN value loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The
second flag is 1 when the store succeeded.
At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.
By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.
Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.
Summary for out of tree users:
------------------------------
+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.
llvm-svn: 210903
Windows on ARM uses COFF/PE which is intrinsically position independent. For
the case of 32-bit immediates, use a pair-wise relocation as otherwise we may
exceed the range of operators. This fixes a code generation crash when using
-Oz when targeting Windows on ARM.
llvm-svn: 210814
This commit is to improve global merge pass and support global symbol merge.
The global symbol merge is not enabled by default. For aarch64, we need some
more back-end fix to make it really benifit ADRP CSE.
llvm-svn: 210640
The armv7-windows-itanium environment is nearly identical to the MSVC ABI. It
has a few divergences, mostly revolving around the use of the Itanium ABI for
C++. VLA support is one of the extensions that are amongst the set of the
extensions.
This adds support for proper VLA emission for this environment. This is
somewhat similar to the handling for __chkstk emission on X86 and the large
stack frame emission for ARM. The invocation style for chkstk is still
controlled via the -mcmodel flag to clang.
Make an explicit note that this is an extension.
llvm-svn: 210489
COFF/PE, so the relocation model is never static. Loosen the assertion
accordingly. The relocation can still be emitted properly, as it will be
converted to an IMAGE_REL_ARM_ADDR32 which will be resolved by the loader
taking the base relocation into account. This is necessary to permit the
emission of long calls which can be controlled via the -mlong-calls option in
the driver.
llvm-svn: 210399
It includes a pass that rewrites all indirect calls to jumptable functions to pass through these tables.
This also adds backend support for generating the jump-instruction tables on ARM and X86.
Note that since the jumptable attribute creates a second function pointer for a
function, any function marked with jumptable must also be marked with unnamed_addr.
llvm-svn: 210280
This patch changes GlobalAlias to point to an arbitrary ConstantExpr and it is
up to MC (or the system assembler) to decide if that expression is valid or not.
This reduces our ability to diagnose invalid uses and how early we can spot
them, but it also lets us do things like
@test5 = alias inttoptr(i32 sub (i32 ptrtoint (i32* @test2 to i32),
i32 ptrtoint (i32* @bar to i32)) to i32*)
An important implication of this patch is that the notion of aliased global
doesn't exist any more. The alias has to encode the information needed to
access it in its metadata (linkage, visibility, type, etc).
Another consequence to notice is that getSection has to return a "const char *".
It could return a NullTerminatedStringRef if there was such a thing, but when
that was proposed the decision was to just uses "const char*" for that.
llvm-svn: 210062
Unordered is strictly weaker than monotonic, so if the latter doesn't have any
barriers then the former certainly shouldn't.
rdar://problem/16548260
llvm-svn: 209901
Darwin prologues save their GPRs in two stages: a narrow push of r0-r7 & lr,
followed by a wide push of the remaining registers if there are any. AAPCS uses
a single push.w instruction.
It turns out that, on average, enough registers get pushed that code is smaller
in the AAPCS prologue, which is a nice property for M-class programmers. They
also have other options available for back-traces, so can hopefully deal with
the fact that FP & LR aren't adjacent in memory.
rdar://problem/15909583
llvm-svn: 209895
The C and C++ semantics for compare_exchange require it to return a bool
indicating success. This gets mapped to LLVM IR which follows each cmpxchg with
an icmp of the value loaded against the desired value.
When lowered to ldxr/stxr loops, this extra comparison is redundant: its
results are implicit in the control-flow of the function.
This commit makes two changes: it replaces that icmp with appropriate PHI
nodes, and then makes sure earlyCSE is called after expansion to actually make
use of the opportunities revealed.
I've also added -{arm,aarch64}-enable-atomic-tidy options, so that
existing fragile tests aren't perturbed too much by the change. Many
of them either rely on undef/unreachable too pervasively to be
restored to something well-defined (particularly while making sure
they test the same obscure assert from many years ago), or depend on a
particular CFG shape, which is disrupted by SimplifyCFG.
rdar://problem/16227836
llvm-svn: 209883
Cortex-M4 only has single-precision floating point support, so any LLVM
"double" type will have been split into 2 i32s by now. Fortunately, the
consecutive-register framework turns out to be precisely what's needed to
reconstruct the double and follow AAPCS-VFP correctly!
rdar://problem/17012966
llvm-svn: 209650
This intrinsic permits the emission of platform specific undefined sequences.
ARM has reserved the 0xde opcode which takes a single integer parameter (ignored
by the CPU). This permits the operating system to implement custom behaviour on
this trap. The llvm.arm.undefined intrinsic is meant to provide a means for
generating the target specific behaviour from the frontend. This is
particularly useful for Windows on ARM which has made use of a series of these
special opcodes.
llvm-svn: 209390
Although the previous code would construct a bundle and add the correct elements
to it, it would not finalise the bundle. This resulted in the InternalRead
markers not being added to the MachineOperands nor, more importantly, the
externally visible defs to the bundle itself. So, although the bundle was not
exposing the def, the generated code would be correct because there was no
optimisations being performed. When optimisations were enabled, the post
register allocator would kick in, and the hazard recognizer would reorder
operations around the load which would define the value being operated upon.
Rather than manually constructing the bundle, simply construct and finalise the
bundle via the finaliseBundle call after both MIs have been emitted. This
improves the code generation with optimisations where IMAGE_REL_ARM_MOV32T
relocations are emitted.
The changes to the other tests are the result of the bundle generation
preventing the scheduler from hoisting the moves across the loads. The net
effect of the generated code is equivalent, but, is much more identical to what
is actually being lowered.
llvm-svn: 209267
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.
llvm-svn: 209123
Windows on ARM uses R11 for the frame pointer even though the environment is a
pure Thumb-2, thumb-only environment. Replicate this behaviour to improve
Windows ABI compatibility. This register is used for fast stack walking, and
thus is part of the Windows ABI.
llvm-svn: 209085
WoA uses COFF, not ELF. ARMISelLowering::createTLOF would previously return ELF
for any non-MachO platform. This was a missed site when the original change for
target format support for Windows on ARM was done.
llvm-svn: 209057
This patch changes the design of GlobalAlias so that it doesn't take a
ConstantExpr anymore. It now points directly to a GlobalObject, but its type is
independent of the aliasee type.
To avoid changing all alias related tests in this patches, I kept the common
syntax
@foo = alias i32* @bar
to mean the same as now. The cases that used to use cast now use the more
general syntax
@foo = alias i16, i32* @bar.
Note that GlobalAlias now behaves a bit more like GlobalVariable. We
know that its type is always a pointer, so we omit the '*'.
For the bitcode, a nice surprise is that we were writing both identical types
already, so the format change is minimal. Auto upgrade is handled by looking
through the casts and no new fields are needed for now. New bitcode will
simply have different types for Alias and Aliasee.
One last interesting point in the patch is that replaceAllUsesWith becomes
smart enough to avoid putting a ConstantExpr in the aliasee. This seems better
than checking and updating every caller.
A followup patch will delete getAliasedGlobal now that it is redundant. Another
patch will add support for an explicit offset.
llvm-svn: 209007
DIBuilder maintains this invariant and the current DwarfDebug code could
end up doing weird things if it contained declarations (such as putting
the definition DIE inside a CU that contained the declaration - this
doesn't seem like a good idea, so rather than adding logic to handle
this case we'll just ban in for now & cross that bridge if we come to
it later).
llvm-svn: 209004
This reverts commit r208934.
The patch depends on aliases to GEPs with non zero offsets. That is not
supported and fairly broken.
The good news is that GlobalAlias is being redesigned and will have support
for offsets, so this patch should be a nice match for it.
llvm-svn: 208978
Add some Windows on ARM specific library calls. These are provided by msvcrt,
and can be used to perform integer to floating-point conversions (and
vice-versa) mirroring similar functions in the RTABI.
llvm-svn: 208949
This commit implements two command line switches -global-merge-on-external
and -global-merge-aligned, and both of them are false by default, so this
optimization is disabled by default for all targets.
For ARM64, some back-end behaviors need to be tuned to get this optimization
further enabled.
llvm-svn: 208934
If the function has the landingpad instruction, then the
handlerdata should be emitted even if the function has
nouwnind attribute. Otherwise, following code will not
work:
void test1() noexcept {
try {
throw_exception();
} catch (...) {
log_unexpected_exception();
}
}
Since the cantunwind was incorrectly emitted and the
LSDA is not available.
llvm-svn: 208791
The commit r208166 will cause some regression on ARM EHABI.
This fix has been committed in r208715, and an assertion failure
test case has been committed in r208770.
This commit further extends the unittest so that the actual
value in the handlerdata will be checked.
llvm-svn: 208790
This commit was already commited as revision rL208689 and discussd in
phabricator revision D3704.
But the test file was crashing on OS X and windows.
I fixed the test file in the same way as in rL208340.
llvm-svn: 208711
The current patterns for REV16 misses mostn __builtin_bswap16() due to
legalization promoting the operands to from load/stores toi32s and then
truncing/extending them. This patch adds new patterns that catch the resultant
DAGs and codegens them to rev16 instructions. Tests included.
rdar://15353652
llvm-svn: 208620
This patch adds support to ARM for custom lowering of the
llvm.{u|s}add.with.overflow.i32 intrinsics for i32/i64. This is particularly useful
for handling idiomatic saturating math functions as generated by
InstCombineCompare.
Test cases included.
rdar://14853450
llvm-svn: 208435
When using the ARM AAPCS, HFAs (Homogeneous Floating-point Aggregates) must
be passed in a block of consecutive floating-point registers, or on the stack.
This means that unused floating-point registers cannot be back-filled with
part of an HFA, however this can currently happen. This patch, along with the
corresponding clang patch (http://reviews.llvm.org/D3083) prevents this.
llvm-svn: 208413
Handle lowering of global addresses for PIC mode compilation on Windows. Always
use the movw/movt load to load the address as Windows on ARM requires ARMv7+ and
is a pure Thumb environment.
llvm-svn: 208385
When building on Windows, the default target is Windows. Windows on ARM does
not support ARM mode compilation, resulting in test failures. Simply specify a
triple to ensure that we are testing the correct behaviour.
llvm-svn: 208340
The ARM::BLX instruction is an ARM mode instruction. The Windows on ARM target
is limited to Thumb instructions. Correctly use the thumb mode tBLXr
instruction. This would manifest as an errant write into the object file as the
instruction is 4-bytes in length rather than 2. The result would be a corrupted
object file that would eventually result in an executable that would crash at
runtime.
llvm-svn: 208152
remove it from the list of unspilled registers. Otherwise the following
attempt to keep the stack aligned by picking an extra GPR register to
spill will not work as it picks up r11.
llvm-svn: 208129
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).
So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.
llvm-svn: 208104
Windows on ARM does not conform to AEABI. However, memset would be emitted
using the AEABI signature, resulting in inverted parameters. Handle this
special case appropriately.
llvm-svn: 207943
This introduces the stack lowering emission of the stack probe function for
Windows on ARM. The stack on Windows on ARM is a dynamically paged stack where
any page allocation which crosses a page boundary of the following guard page
will cause a page fault. This page fault must be handled by the kernel to
ensure that the page is faulted in. If this does not occur and a write access
any memory beyond that, the page fault will go unserviced, resulting in an
abnormal program termination.
The watermark for the stack probe appears to be at 4080 bytes (for
accommodating the stack guard canaries and stack alignment) when SSP is
enabled. Otherwise, the stack probe is emitted on the page size boundary of
4096 bytes.
llvm-svn: 207615
IMAGE_REL_ARM_MOV32T relocations require that the movw/movt pair-wise
relocation is not split up and reordered. When expanding the mov32imm
pseudo-instruction, create a bundle if the machine operand is referencing an
address. This helps ensure that the relocatable address load is not reordered
by subsequent passes.
Unfortunately, this only partially handles the case as the Constant Island Pass
occurs after the instructions are unbundled and does not properly handle
bundles. That is a more fundamental issue with the pass itself and beyond the
scope of this change.
llvm-svn: 207608
This intrinsic is no longer needed with the new @llvm.arm.hint(i32) intrinsic
which provides a generic, extensible manner for adding hint instructions. This
functionality can now be represented as @llvm.arm.hint(i32 5).
llvm-svn: 207246
Introduce the llvm.arm.hint(i32) intrinsic that can be used to inject hints into
the instruction stream. This is particularly useful for generating IR from a
compiler where the user may inject an intrinsic (e.g. __yield). These are then
pattern substituted into the correct instruction which already existed.
llvm-svn: 207242
The point of these calls is to allow Thumb-1 code to make use of the VFP unit
to perform its operations. This is not desirable with -msoft-float, since most
of the reasons you'd want that apply equally to the runtime library.
rdar://problem/13766161
llvm-svn: 206874
handles Intrinsic::trap if TargetOptions::TrapFuncName is set.
This fixes a bug in which the trap function was not taken into consideration
when a program was compiled without optimization (at -O0).
<rdar://problem/16291933>
llvm-svn: 206323
Previously, BranchProbabilityInfo::calcLoopBranchHeuristics would determine the weights of basic blocks inside loops even when it didn't have enough information to estimate the branch probabilities correctly. This patch fixes the function to exit early if it doesn't see any exit edges or back edges and let the later heuristics determine the weights.
This fixes PR18705 and <rdar://problem/15991090>.
Differential Revision: http://reviews.llvm.org/D3363
llvm-svn: 206194
The current memory-instruction optimization logic in CGP, which sinks parts of
the address computation that can be adsorbed by the addressing mode, does this
by explicitly converting the relevant part of the address computation into
IR-level integer operations (making use of ptrtoint and inttoptr). For most
targets this is currently not a problem, but for targets wishing to make use of
IR-level aliasing analysis during CodeGen, the use of ptrtoint/inttoptr is a
problem for two reasons:
1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr
2. In cases where type-punning was used, and BasicAA was used
to override TBAA, BasicAA may no longer do so. (this had forced us to disable
all use of TBAA in CodeGen; something which we can now enable again)
This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled by
default (except for those targets that use AA during CodeGen), and so aside
from some PowerPC subtargets and SystemZ, there should be no change in
behavior. We may be able to switch completely away from the ptrtoint/inttoptr
sinking on all targets, but further testing is required.
I've doubled-up on a number of existing tests that are sensitive to the
address sinking behavior (including some store-merging tests that are
sensitive to the order of the resulting ADD operations at the SDAG level).
llvm-svn: 206092
This removes the -segmented-stacks command line flag in favor of a
per-function "split-stack" attribute.
Patch by Luqman Aden and Alex Crichton!
llvm-svn: 205997
More updating of tests to be explicit about the target triple rather than
relying on the default target triple supporting ARM mode.
Indicate to lit that object emission is not yet available for Windows on ARM.
llvm-svn: 205545
This changes the tests that were targeting ARM EABI to explicitly specify the
environment rather than relying on the default. This breaks with the new
Windows on ARM support when running the tests on Windows where the default
environment is no longer EABI.
Take the opportunity to avoid a pointless redirect (helps when trying to debug
with providing a command line invocation which can be copy and pasted) and
removing a few greps in favour of FileCheck.
llvm-svn: 205541
Implementing this via ComputeMaskedBits has two advantages:
+ It actually works. DAGISel doesn't deal with the chains properly
in the previous pattern-based solution, so they never trigger.
+ The information can be used in other DAG combines, as well as the
trivial "get rid of truncs". For example if the trunc is in a
different basic block.
rdar://problem/16227836
llvm-svn: 205540
The terminal barrier of a cmpxchg expansion will be either Acquire or
SequentiallyConsistent. In either case it can be skipped if the
operation has Monotonic requirements on failure.
rdar://problem/15996804
llvm-svn: 205535
The previous situation where ATOMIC_LOAD_WHATEVER nodes were expanded
at MachineInstr emission time had grown to be extremely large and
involved, to account for the subtly different code needed for the
various flavours (8/16/32/64 bit, cmpxchg/add/minmax).
Moving this transformation into the IR clears up the code
substantially, and makes future optimisations much easier:
1. an atomicrmw followed by using the *new* value can be more
efficient. As an IR pass, simple CSE could handle this
efficiently.
2. Making use of cmpxchg success/failure orderings only has to be done
in one (simpler) place.
3. The common "cmpxchg; did we store?" idiom can be exposed to
optimisation.
I intend to gradually improve this situation within the ARM backend
and make sure there are no hidden issues before moving the code out
into CodeGen to be shared with (at least ARM64/AArch64, though I think
PPC & Mips could benefit too).
llvm-svn: 205525
add operation since extract_vector_elt can perform an extend operation. Get the input lane
type from the vector on which we're performing the vpaddl operation on and extend or
truncate it to the output type of the original add node.
llvm-svn: 205523
This changes the tests that were targeting ARM EABI to explicitly specify the
environment rather than relying on the default. This breaks with the new
Windows on ARM support when running the tests on Windows where the default
environment is no longer EABI.
Take the opportunity to avoid a pointless redirect (helps when trying to debug
with providing a command line invocation which can be copy and pasted) and
removing a few greps in favour of FileCheck.
llvm-svn: 205465
ARM specific optimiztion, finding places in ARM machine code where 2 dmbs
follow one another, and eliminating one of them.
Patch by Reinoud Elhorst.
llvm-svn: 205409
The Cyclone CPU is similar to swift for most LLVM purposes, but does have two
preferred instructions for zeroing a VFP register. This teaches LLVM about
them.
llvm-svn: 205309
This moves one case of raw text checking down into the MCStreamer
interfaces in the form of a virtual function, even if we ultimately end
up consolidating on the one-or-many line tables issue one day, this is
nicer in the interim. This just generally streamlines a bunch of use
cases into a common code path.
llvm-svn: 205287
We've already got versions without the barriers, so this just adds IR-level
support for generating the new v8 ones.
rdar://problem/16227836
llvm-svn: 204813
Implementing the LLVM part of the call to __builtin___clear_cache
which translates into an intrinsic @llvm.clear_cache and is lowered
by each target, either to a call to __clear_cache or nothing at all
incase the caches are unified.
Updating LangRef and adding some tests for the implemented architectures.
Other archs will have to implement the method in case this builtin
has to be compiled for it, since the default behaviour is to bail
unimplemented.
A Clang patch is required for the builtin to be lowered into the
llvm intrinsic. This will be done next.
llvm-svn: 204802
Sicne MBB->computeRegisterLivenes() returns Dead for sub regs like s0,
d0 is used in vpop instead of updating sp, which causes s0 dead before
its use.
This patch checks the liveness of each subreg to make sure the reg is
actually dead.
llvm-svn: 204411
These linkages were introduced some time ago, but it was never very
clear what exactly their semantics were or what they should be used
for. Some investigation found these uses:
* utf-16 strings in clang.
* non-unnamed_addr strings produced by the sanitizers.
It turns out they were just working around a more fundamental problem.
For some sections a MachO linker needs a symbol in order to split the
section into atoms, and llvm had no idea that was the case. I fixed
that in r201700 and it is now safe to use the private linkage. When
the object ends up in a section that requires symbols, llvm will use a
'l' prefix instead of a 'L' prefix and things just work.
With that, these linkages were already dead, but there was a potential
future user in the objc metadata information. I am still looking at
CGObjcMac.cpp, but at this point I am convinced that linker_private
and linker_private_weak are not what they need.
The objc uses are currently split in
* Regular symbols (no '\01' prefix). LLVM already directly provides
whatever semantics they need.
* Uses of a private name (start with "\01L" or "\01l") and private
linkage. We can drop the "\01L" and "\01l" prefixes as soon as llvm
agrees with clang on L being ok or not for a given section. I have two
patches in code review for this.
* Uses of private name and weak linkage.
The last case is the one that one could think would fit one of these
linkages. That is not the case. The semantics are
* the linker will merge these symbol by *name*.
* the linker will hide them in the final DSO.
Given that the merging is done by name, any of the private (or
internal) linkages would be a bad match. They allow llvm to rename the
symbols, and that is really not what we want. From the llvm point of
view, these objects should really be (linkonce|weak)(_odr)?.
For now, just keeping the "\01l" prefix is probably the best for these
symbols. If we one day want to have a more direct support in llvm,
IMHO what we should add is not a linkage, it is just a hidden_symbol
attribute. It would be applicable to multiple linkages. For example,
on weak it would produce the current behavior we have for objc
metadata. On internal, it would be equivalent to private (and we
should then remove private).
llvm-svn: 203866
This option enables LowerInvoke's obsolete SJLJ EH support, but the
target used in this test (ARM Darwin) no longer uses the LowerInvoke
pass, so the option has no effect here. This target currently uses
the newer SjLjEHPrepare pass instead.
This cleanup will help with removing "-enable-correct-eh-support".
Differential Revision: http://llvm-reviews.chandlerc.com/D3064
llvm-svn: 203810
This is a follow-up to r203635. Saleem pointed out that since symbolic register
names are much easier to read, it would be good if we could turn them off only
when we really need to because we're using an external assembler.
Differential Revision: http://llvm-reviews.chandlerc.com/D3056
llvm-svn: 203806
When the list of VFP registers to be saved was non-contiguous (so multiple
vpush/vpop instructions were needed) these were being ordered oddly, as in:
vpush {d8, d9}
vpush {d11}
This led to the layout in memory being [d11, d8, d9] which is ugly and doesn't
match the CFI_INSTRUCTIONs we're generating either (so Dwarf info would be
broken).
This switches the order of vpush/vpop (in both prologue and epilogue,
obviously) so that the Dwarf locations are correct again.
rdar://problem/16264856
llvm-svn: 203655
It seems gas can't handle CFI directives with VFP register names ("d12", etc.).
This broke us trying to build Chromium for Android after 201423.
A gas bug has been filed: https://sourceware.org/bugzilla/show_bug.cgi?id=16694
compnerd suggested making this conditional on whether we're using the integrated
assembler or not. I'll look into that in a follow-up patch.
Differential Revision: http://llvm-reviews.chandlerc.com/D3049
llvm-svn: 203635
Use the options in the ARMISelLowering to control whether tail calls are
optimised or not. Previously, this option was entirely ignored on the ARM
target and only honoured on x86.
This option is mostly useful in profiling scenarios. The default remains that
tail call optimisations will be applied.
llvm-svn: 203577
This option is from 2010, designed to work around a linker issue on Darwin for
ARM. According to grosbach this is no longer an issue and this option can
safely be removed.
llvm-svn: 203576
Tail call optimisation was previously disabled on all targets other than
iOS5.0+. This enables the tail call optimisation on all Thumb 2 capable
platforms.
The test adjustments are to remove the IR hint "tail" to function invocation.
The tests were designed assuming that tail call optimisations would not kick in
which no longer holds true.
llvm-svn: 203575
The syntax for "cmpxchg" should now look something like:
cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic
where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).
rdar://problem/15996804
llvm-svn: 203559
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.
The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.
The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.
I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.
The net result is that we don't create temporary labels that are never used.
llvm-svn: 203204
scan the register file for sub- and super-registers.
No functionality change intended.
(Tests are updated because the comments in the assembler output are
different.)
llvm-svn: 202416
Summary:
Fixes an issue where a test attempts to use -mcpu=cortex-a15 on non-ARM targets.
This triggers an assertion on MIPS since it doesn't know what ABI to use by default for
unrecognized processors.
Reviewers: rengolin
Reviewed By: rengolin
CC: llvm-commits, aemerson, rengolin
Differential Revision: http://llvm-reviews.chandlerc.com/D2876
llvm-svn: 202256
The function with uwtable attribute might be visited by the
stack unwinder, thus the link register should be considered
as clobbered after the execution of the branch and link
instruction (i.e. the definition of the machine instruction
can't be ignored) even when the callee function are marked
with noreturn.
llvm-svn: 202165
NaCl's ARM ABI uses 16 byte stack alignment, so set that in
ARMSubtarget.cpp.
Using 16 byte alignment exposes an issue in code generation in which a
varargs function leaves a 4 byte gap between the values of r1-r3 saved
to the stack and the following arguments that were passed on the
stack. (Previously, this code only needed to support 4 byte and 8
byte alignment.)
With this issue, llc generated:
varargs_func:
sub sp, sp, #16
push {lr}
sub sp, sp, #12
add r0, sp, #16 // Should be 20
stm r0, {r1, r2, r3}
ldr r0, .LCPI0_0 // Address of va_list
add r1, sp, #16
str r1, [r0]
bl external_func
Fix the bug by checking for "Align > 4". Also simplify the code by
using OffsetToAlignment(), and update comments.
Differential Revision: http://llvm-reviews.chandlerc.com/D2677
llvm-svn: 201497
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.
Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
(fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
(should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
(should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
to be enabled regardless of default setting or -no-integrated-as.
(should fix SystemZ buildbots)
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
llvm-svn: 201333
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
llvm-svn: 201237
* CPRCs may be allocated to co-processor registers or the stack – they may never be allocated to core registers
* When a CPRC is allocated to the stack, all other VFP registers should be marked as unavailable
The difference is only noticeable in rare cases where there are a large number of floating point arguments (e.g.
7 doubles + additional float, double arguments). Although it's probably still better to avoid vmov as it can cause
stalls in some older ARM cores. The other, more subtle benefit, is to minimize difference between the various
calling conventions.
rdar://16039676
llvm-svn: 201193
Similarly to the vshrn instructions, these are simple zext/sext + trunc
operations. Using normal LLVM IR should allow for better code, and more sharing
with the AArch64 backend.
llvm-svn: 201093
For A- and R-class processors, r12 is not normally callee-saved, but is for
interrupt handlers. See AAPCS, 5.3.1.1, "Use of IP by the linker".
llvm-svn: 201089
vshrn is just the combination of a right shift and a truncate (and the limits
on the immediate value actually mean the signedness of the shift doesn't
matter). Using that representation allows us to get rid of an ARM-specific
intrinsic, share more code with AArch64 and hopefully get better code out of
the mid-end optimisers.
llvm-svn: 201085
According to the AAPCS, when a CPRC is allocated to the stack, all other
VFP registers should be marked as unavailable.
I have also modified the rules for allocating non-CPRCs to the stack, to make
it more explicit that all GPRs must be made unavailable. I cannot think of a
case where the old version would produce incorrect answers, so there is no test
for this.
llvm-svn: 200970
This patch fixes the ldr-pseudo implementation to work when used in
inline assembly. The fix is to move arm assembler constant pools
from the ARMAsmParser class to the ARMTargetStreamer class.
Previously we kept the assembler generated constant pools in the
ARMAsmParser object. This does not work for inline assembly because
a new parser object is created for each blob of inline assembly.
This patch moves the constant pools to the ARMTargetStreamer class
so that the constant pool will remain alive for the entire code
generation process.
An ARMTargetStreamer class is now required for the arm backend.
There was no existing implementation for MachO, only Asm and ELF.
Instead of creating an empty MachO subclass, we decided to make the
ARMTargetStreamer a non-abstract class and provide default
(llvm_unreachable) implementations for the non constant-pool related
methods.
Differential Revision: http://llvm-reviews.chandlerc.com/D2638
llvm-svn: 200777
There was an extremely confusing proliferation of LLVM intrinsics to implement
the vacge & vacgt instructions. This combines them all into two polymorphic
intrinsics, shared across both backends.
llvm-svn: 200768
A bunch of test cases needed to be cleaned up for this, many my fault -
when implementid imported modules I updated test cases by simply
duplicating the prior metadata field - which wasn't always the empty
metadata entry.
llvm-svn: 200731
Some of the SHA instructions take a scalar i32 as one argument (largely because
they work on 160-bit hash fragments). This wasn't reflected in the IR
previously, with ARM and AArch64 choosing different types (<4 x i32> and <1 x
i32> respectively) which was ugly.
This makes all the affected intrinsics take a uniform "i32", allowing them to
become non-polymorphic at the same time.
llvm-svn: 200706
This changes the PrologueEpilogInserter and LocalStackSlotAllocation passes to
follow the extended stack layout rules for sspstrong and sspreq.
The sspstrong layout rules are:
1. Large arrays and structures containing large arrays (>= ssp-buffer-size)
are closest to the stack protector.
2. Small arrays and structures containing small arrays (< ssp-buffer-size) are
2nd closest to the protector.
3. Variables that have had their address taken are 3rd closest to the
protector.
Differential Revision: http://llvm-reviews.chandlerc.com/D2546
llvm-svn: 200601
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.
The previous attempt at r200431 was reverted at r200434 because of
two testing case failures. I modified my patch a little, but forgot
to re-run "make check-all".
Testing case CodeGen/ARM/lsr-unfolded-offset.ll is updated because of
the patch's impact on branch probability which causes changes in
spill placement.
llvm-svn: 200502
This commit only handles IfConvertTriangle. To update edge weights
of a successor, one interface is added to MachineBasicBlock:
/// Set successor weight of a given iterator.
setSuccWeight(succ_iterator I, uint32_t weight)
An existing testing case test/CodeGen/Thumb2/v8_IT_5.ll is updated,
since we now correctly update the edge weights, the cold block
is placed at the end of the function and we jump to the cold block.
llvm-svn: 200428
After all hard work to implement the EHABI and with the test-suite
passing, it's time to turn it on by default and allow users to
disable it as a work-around while we fix the eventual bugs that show
up.
This commit also remove the -arm-enable-ehabi-descriptors, since we
want the tables to be printed every time the EHABI is turned on
for non-Darwin ARM targets.
Although MCJIT EHABI is not working yet (needs linking with the right
libraries), this commit also fixes some relocations on MCJIT regarding
the EH tables/lib calls, and update some tests to avoid using EH tables
when none are needed.
The EH tests in the test-suite that were previously disabled on ARM
now pass with these changes, so a follow-up commit on the test-suite
will re-enable them.
llvm-svn: 200388
Summary:
This commit gives an address mode to the PLD instruction. We
were getting an assertion failure in the frame lowering code
because we had code that was doing a pld of a stack allocated
address. The frame lowering was checking the address mode and
then asserting because pld had none defined.
This commit fixes pld for arm mode. There was a previous fix for
thumb mode in a separate commit. The commit for thumb mode
added a test in a separate file because it would otherwise fail
for arm. This commit moves the thumb test back into the prefetch.ll
file and adds the corresponding arm test.
Differential Revision: http://llvm-reviews.chandlerc.com/D2622
llvm-svn: 200248
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.
We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemnted in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.
llvm-svn: 200058
Retry commit r200022 with a fix for the build bot errors. Constant expressions
have (unlike instructions) module scope use lists and therefore may have users
in different functions. The fix is to simply ignore these out-of-function uses.
llvm-svn: 200034
This pass identifies expensive constants to hoist and coalesces them to
better prepare it for SelectionDAG-based code generation. This works around the
limitations of the basic-block-at-a-time approach.
First it scans all instructions for integer constants and calculates its
cost. If the constant can be folded into the instruction (the cost is
TCC_Free) or the cost is just a simple operation (TCC_BASIC), then we don't
consider it expensive and leave it alone. This is the default behavior and
the default implementation of getIntImmCost will always return TCC_Free.
If the cost is more than TCC_BASIC, then the integer constant can't be folded
into the instruction and it might be beneficial to hoist the constant.
Similar constants are coalesced to reduce register pressure and
materialization code.
When a constant is hoisted, it is also hidden behind a bitcast to force it to
be live-out of the basic block. Otherwise the constant would be just
duplicated and each basic block would have its own copy in the SelectionDAG.
The SelectionDAG recognizes such constants as opaque and doesn't perform
certain transformations on them, which would create a new expensive constant.
This optimization is only applied to integer constants in instructions and
simple (this means not nested) constant cast experessions. For example:
%0 = load i64* inttoptr (i64 big_constant to i64*)
Reviewed by Eric
llvm-svn: 200022
Originally, BLX was passed as operand #0 in MachineInstr and as operand
#2 in MCInst. But now, it's operand #2 in both cases.
This patch also removes unnecessary FileCheck in the test case added by r199127.
llvm-svn: 199928
With constant-sharing, litpool loads consume 4 + N*2 bytes of code, but
movw/movt pairs consume 8*N. This means litpools are better than movw/movt even
with just one use. Other materialisation strategies can still be better though,
so the logic is a little odd.
llvm-svn: 199891
This patch restores the ARM mode if the user's inline assembly
does not. In the object streamer, it ensures that instructions
following the inline assembly are encoded correctly and that
correct mapping symbols are emitted. For the asm streamer, it
emits a .arm or .thumb directive.
This patch does not ensure that the inline assembly contains
the ADR instruction to switch modes at runtime.
The problem we need to solve is code like this:
int foo(int a, int b) {
int r = a + b;
asm volatile(
".align 2 \n"
".arm \n"
"add r0,r0,r0 \n"
: : "r"(r));
return r+1;
}
If we compile this function in thumb mode then the inline assembly
will switch to arm mode. We need to make sure that we switch back to
thumb mode after emitting the inline assembly or we will incorrectly
encode the instructions that follow (i.e. the assembly instructions
for return r+1).
Based on patch by David Peixotto
Change-Id: Ib57f6d2d78a22afad5de8693fba6230ff56ba48b
llvm-svn: 199818
When expanding neon pseudo stores, it may miss the implicit uses of sub
regs, which may cause post RA scheduler reorder instructions that
breakes anti dependency.
For example:
VST1d64QPseudo %R0<kill>, 16, %Q9_Q10, pred:14, pred:%noreg
will be expanded to
VST1d64Q %R0<kill>, 16, %D18, pred:14, pred:%noreg;
An instruction that defines %D20 may be scheduled before the store by
mistake.
This patches adds implicit uses for such case. For the example above, it
emits:
VST1d64Q %R0<kill>, 8, %D18, pred:14, pred:%noreg, %Q9_Q10<imp-use>
llvm-svn: 199282
The changes caused by folding an sp-adjustment into a "pop" previously
disrupted the forward search for the final real instruction in a
terminating block. This switches to a backward search (skipping debug
instrs).
This fixes PR18399.
Patch by Zhaoshi.
llvm-svn: 199266
Parse tag names as well as expressions. The former is part of the
specification, the latter is for improved compatibility with the GNU assembler.
Fix attribute value handling to be comformant to the specification.
llvm-svn: 198662
The ARM backend has been using most of the MachO related subtarget
checks almost interchangeably, and since the only target it's had to
run on has been IOS (which is all three of MachO, Darwin and IOS) it's
worked out OK so far.
But we'd like to support embedded targets under the "*-*-none-macho"
triple, which means everything starts falling apart and inconsistent
behaviours emerge.
This patch should pick a reasonably sensible set of behaviours for the
new triple (and any others that come along, with luck). Some choices
were debatable (notably FP == r7 or r11), but we can revisit those
later when deficiencies become apparent.
llvm-svn: 198617
Longer term, we want to move users to "*-*-*-macho" for embedded work, but for
now people are relying on the last thing we told them, which is unfortunately
"*-*-darwin-eabi".
rdar://problem/15703934
llvm-svn: 198602
Schedule more conservatively to account for stalls on floating point
resources and latency. Use the AGU resource to model latency stalls
since it's shared between FP and LD/ST instructions. This might not be
completely accurate but should work well in practice.
llvm-svn: 198125
This changes the MachineFrameInfo API to use the new SSPLayoutKind information
produced by the StackProtector pass (instead of a boolean flag) and updates a
few pass dependencies (to preserve the SSP analysis).
The stack layout follows the same approach used prior to this change - i.e.,
only LargeArray stack objects will be placed near the canary and everything
else will be laid out normally. After this change, structures containing large
arrays will also be placed near the canary - a case previously missed by the
old implementation.
Out of tree targets will need to update their usage of
MachineFrameInfo::CreateStackObject to remove the MayNeedSP argument.
The next patch will implement the rules for sspstrong and sspreq. The end goal
is to support ssp-strong stack layout rules.
WIP.
Differential Revision: http://llvm-reviews.chandlerc.com/D2158
llvm-svn: 197653
Given vsel_cc, op1, op2, since vsel has no LE/LT, to generate vsel for
such selection, it needs to inverse cc and swap op1 and op2. To inverse
cc, both L/G and E bits should be flipped.
llvm-svn: 197615
Clang sets the float-abi target option manually, but no longer
annotates each function with its ABI. This can lead to confusing
mistmatch between "clang -emit-llvm | llc" and normal clang
invocations.
Besides which, gnueabihf actually *is* hard-float. Defaulting to soft
was just perverse.
llvm-svn: 197554
This reapplies r197438 and fixes the link-time circular dependency between
IR and Support. The fix consists in moving the diagnostic support into IR.
The patch adds a new LLVMContext::diagnose that can be used to communicate to
the front-end, if any, that something of interest happened.
The diagnostics are supported by a new abstraction, the DiagnosticInfo class.
The base class contains the following information:
- The kind of the report: What this is about.
- The severity of the report: How bad this is.
This patch also adds 2 classes:
- DiagnosticInfoInlineAsm: For inline asm reporting. Basically, this diagnostic
will be used to switch to the new diagnostic API for LLVMContext::emitError.
- DiagnosticStackSize: For stack size reporting. Comes as a replacement of the
hard coded warning in PEI.
This patch also features dynamic diagnostic identifiers. In other words plugins
can use this infrastructure for their own diagnostics (for more details, see
getNextAvailablePluginDiagnosticKind).
This patch introduces a new DiagnosticHandlerTy and a new DiagnosticContext in
the LLVMContext that should be set by the front-end to be able to map these
diagnostics in its own system.
http://llvm-reviews.chandlerc.com/D2376
<rdar://problem/15515174>
llvm-svn: 197508
The patch adds a new LLVMContext::diagnose that can be used to communicate to
the front-end, if any, that something of interest happened.
The diagnostics are supported by a new abstraction, the DiagnosticInfo class.
The base class contains the following information:
- The kind of the report: What this is about.
- The severity of the report: How bad this is.
This patch also adds 2 classes:
- DiagnosticInfoInlineAsm: For inline asm reporting. Basically, this diagnostic
will be used to switch to the new diagnostic API for LLVMContext::emitError.
- DiagnosticStackSize: For stack size reporting. Comes as a replacement of the
hard coded warning in PEI.
This patch also features dynamic diagnostic identifiers. In other words plugins
can use this infrastructure for their own diagnostics (for more details, see
getNextAvailablePluginDiagnosticKind).
This patch introduces a new DiagnosticHandlerTy and a new DiagnosticContext in
the LLVMContext that should be set by the front-end to be able to map these
diagnostics in its own system.
http://llvm-reviews.chandlerc.com/D2376
<rdar://problem/15515174>
llvm-svn: 197438
The tests were no longer using fast-isel at all (MachO needs an "ios" rather
than "darwin" triple at the moment and Linux needs ARM mode). Once that was
corrected, the verifier complained about a t2ADDri created for the alloca.
llvm-svn: 197046
When trying to eliminate an "sub sp, sp, #N" instruction by folding
it into an existing push/pop using dummy registers, we need to account
for the fact that this might affect precisely how "fp" gets set in the
prologue.
We were attempting this, but assuming that *whenever* we performed a
fold it would make a difference. This is false, for example, in:
push {r4, r7, lr}
add fp, sp, #4
vpush {d8}
sub sp, sp, #8
we can fold the "sub" into the "vpush", forming "vpush {d7, d8}".
However, in that case the "add fp" instruction mustn't change, which
we were getting wrong before.
Should fix PR18160.
llvm-svn: 196725
The current peephole optimizing for compare inst assumes an instr that
uses CPSR has an MO for ARM Cond code.However, for VSEL instructions
(vseqeq, vselgt, vselgt, vselvs), there is no such operand nor do
they support the modification of Cond Code.
llvm-svn: 196588
We were trying to fold the stack adjustment into the wrong instruction in the
situation where the entire basic-block was epilogue code. Really, it can only
ever be valid to do the folding precisely where the "add sp, ..." would be
placed so there's no need for a separate iterator to track that.
Should fix PR18136.
llvm-svn: 196493
ARM symbol variants are written with parens instead of @ like this:
.word __GLOBAL_I_a(target1)
This commit adds support for parsing these symbol variants in
expressions. We introduce a new flag to MCAsmInfo that indicates the
parser should use parens to parse the symbol variant. The expression
parser is modified to look for symbol variants using parens instead
of @ when the corresponding MCAsmInfo flag is true.
The MCAsmInfo parens flag is enabled only for ARM on ELF.
By adding this flag to MCAsmInfo, we are able to get rid of
redundant ARM-specific symbol variants and use the generic variants
instead (e.g. VK_GOT instead of VK_ARM_GOT). We use the new
UseParensForSymbolVariant attribute in MCAsmInfo to correctly print
the symbol variants for arm.
To achive this we need to keep a handle to the MCAsmInfo in the
MCSymbolRefExpr class that we can check when printing the symbol
variant.
Updated Tests:
Changed case of symbol variant to match the generic kind.
test/CodeGen/ARM/tls-models.ll
test/CodeGen/ARM/tls1.ll
test/CodeGen/ARM/tls2.ll
test/CodeGen/Thumb2/tls1.ll
test/CodeGen/Thumb2/tls2.ll
PR18080
llvm-svn: 196424
These are used by MachO only at the moment, and (much like the existing
MOVW/MOVT set) work around the fact that the labels used in the actual
instructions often contain PC-dependent components, which means that repeatedly
materialising the same global can't be CSEed.
With small modifications, it could be adapted to how ELF finds the address of
_GLOBAL_OFFSET_TABLE_, which would give similar benefits in PIC mode there.
llvm-svn: 196090
Previously, we clobbered callee-saved registers when folding an "add
sp, #N" into a "pop {rD, ...}" instruction. This change checks whether
a register we're going to add to the "pop" could actually be live
outside the function before doing so and should fix the issue.
This should fix PR18081.
llvm-svn: 196046
We are going to drop debug info without a version number or with a different
version number, to make sure we don't crash when we see bitcode files with
different debug info metadata format.
Make tests more robust by removing hard-coded metadata numbers in CHECK lines.
llvm-svn: 195535
We are going to drop debug info without a version number or with a different
version number, to make sure we don't crash when we see bitcode files with
different debug info metadata format.
llvm-svn: 195504
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow. Results are different when:
sext(a) + sext(b) != sext(a + b)
Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.
<rdar://problem/15292280>
Patch by Duncan Exon Smith!
llvm-svn: 194840
In ELF and COFF an alias is just another offset in a section. There is no way
to represent an alias to something in another file.
In MachO, the spec has the N_INDR type which should allow for exactly that, but
is not currently implemented. Given that it is specified but not implemented,
we error in codegen to avoid miscompiling but don't reject aliases to
declarations in the verifier to leave the option open of implementing it.
In the past we have used alias to declarations as a way of implementing
weakref, which is why it exists in some old tests which this patch updates.
llvm-svn: 194705
By default, the behavior of IT block generation will be determinated
dynamically base on the arch (armv8 vs armv7). This patch adds backend
options: -arm-restrict-it and -arm-no-restrict-it. The former one
restricts the generation of IT blocks (the same behavior as thumbv8) for
both arches. The later one allows the generation of legacy IT block (the
same behavior as ARMv7 Thumb2) for both arches.
Clang will support -mrestrict-it and -mno-restrict-it, which is
compatible with GCC.
llvm-svn: 194592
ARM prologues usually look like:
push {r7, lr}
sub sp, sp, #4
If code size is extremely important, this can be optimised to the single
instruction:
push {r6, r7, lr}
where we don't actually care about the contents of r6, but pushing it subtracts
4 from sp as a side effect.
This should implement such a conversion, predicated on the "minsize" function
attribute (-Oz) since I've yet to find any code it actually makes faster.
llvm-svn: 194264
Add a Virtualization ARM subtarget feature along with adding proper build
attribute emission for Tag_Virtualization_use (encodes Virtualization and
TrustZone) and Tag_MPextension_use.
Also rework test/CodeGen/ARM/2010-10-19-mc-elf-objheader.ll testcase to
something that is more maintainable. This changes the focus of this
testcase away from testing CPU defaults (which is tested elsewhere), onto
specifically testing that attributes are encoded correctly.
llvm-svn: 193859
Fix Tag_ABI_HardFP_use build attribute to handle single precision FP,
replace deprecated Tag_ABI_HardFP_use value of 3 with 0 and also add
some tests for Tag_ABI_VFP_args.
llvm-svn: 193856
This commit allows the ARM integrated assembler to parse
and assemble the code with .eabi_attribute, .cpu, and
.fpu directives.
To implement the feature, this commit moves the code from
AttrEmitter to ARMTargetStreamers, and several new test
cases related to cortex-m4, cortex-r5, and cortex-a15 are
added.
Besides, this commit also change the Subtarget->isFPOnlySP()
to Subtarget->hasD16() to match the usage of .fpu directive.
This commit changes the test cases:
* Several .eabi_attribute directives in
2010-09-29-mc-asm-header-test.ll are removed because the .fpu
directive already cover the functionality.
* In the Cortex-A15 test case, the value for
Tag_Advanced_SIMD_arch has be changed from 1 to 2,
which is more precise.
llvm-svn: 193524
There's a barrier instruction so that should still be used, but most actual
atomic operations are going to need a platform decision on the correct
behaviour (either nop if single-threaded or OS-support otherwise).
rdar://problem/15287210
llvm-svn: 193399
ARM processors without ldrex/strex need to be able to make libcalls for all
atomic operations, including the newer min/max versions.
The alternative would probably be expanding these operations in terms of
cmpxchg (as x86 does always), but in the configurations where this matters
code-size tends to be paramount so the libcall is more desirable.
llvm-svn: 193398
The compiler-rt functions __adddf3vfp and so on exist purely to allow Thumb1
code to make use of VFP instructions by switching back to ARM mode, they make
no sense for M-class processors which don't even have an ARM mode.
Given that justification, in practice this is a platform ABI decision so the
actual check is based on that rather than CPU features.
rdar://problem/15302004
llvm-svn: 193327
This commit implements the correct lowering of the
COPY_STRUCT_BYVAL_I32 pseudo-instruction for thumb1 targets.
Previously, the lowering of COPY_STRUCT_BYVAL_I32 generated the
post-increment forms of ldr/ldrh/ldrb instructions. Thumb1 does not
have the post-increment form of these instructions so the generated
assembly contained invalid instructions.
Passing the generated assembly to gcc caused it to complain with an
error like this:
Error: cannot honor width suffix -- `ldrb r3,[r0],#1'
and the integrated assembler would generate an object file with an
invalid instruction encoding.
This commit contains a small test case that demonstrates the problem
with thumb1 targets as well as an expanded test case that more
throughly tests the lowering of byval struct passing for arm,
thumb1, and thumb2 targets.
llvm-svn: 192916
Per original comment, the intention of this loop
is to go ahead and break the critical edge
(in order to sink this instruction) if there's
reason to believe doing so might "unblock" the
sinking of additional instructions that define
registers used by this one. The idea is that if
we have a few instructions to sink "together"
breaking the edge might be worthwhile.
This commit makes a few small changes
to help better realize this goal:
First, modify the loop to ignore registers
defined by this instruction. We don't
sink definitions of physical registers,
and sinking an SSA definition isn't
going to unblock an upstream instruction.
Second, ignore uses of physical registers.
Instructions that define physical registers are
rejected for sinking, and so moving this one
won't enable moving any defining instructions.
As an added bonus, while virtual register
use-def chains are generally small due
to SSA goodness, iteration over the uses
and definitions (used by hasOneNonDBGUse)
for physical registers like EFLAGS
can be rather expensive in practice.
(This is the original reason for looking at this)
Finally, to keep things simple continue
to only consider this trick for registers that
have a single use (via hasOneNonDBGUse),
but to avoid spuriously breaking critical edges
only do so if the definition resides
in the same MBB and therefore this one directly
blocks it from being sunk as well.
If sinking them together is meant to be,
let the iterative nature of this pass
sink the definition into this block first.
Update tests to accomodate this change,
add new testcase where sinking avoids pipeline stalls.
llvm-svn: 192608
When if converting something like:
true:
... = R0<kill>
false:
... = R0<kill>
then the instructions of the true block must not have a <kill> flag
anymore, as the instruction of the false block follow and do still read
the R0 value.
Specifically this patch determines the set of register live-in in the
false block (possibly after simulating the liveness changes of the
duplicated instructions). Each of these live-in registers mustn't be
killed.
llvm-svn: 192482
This reverts r192454
Apparently FileCheck isn't as smart as I though and does not enforce a
topological order between variable defs+uses.
llvm-svn: 192472
When we had a sequence like:
s1 = VLDRS [r0, 1], Q0<imp-def>
s3 = VLDRS [r0, 2], Q0<imp-use,kill>, Q0<imp-def>
s0 = VLDRS [r0, 0], Q0<imp-use,kill>, Q0<imp-def>
s2 = VLDRS [r0, 4], Q0<imp-use,kill>, Q0<imp-def>
we were gathering the {s0, s1} loads below the s3 load. This is fine,
but confused the verifier since now the s3 load had Q0<imp-use> with
no definition above it.
This should mark such uses <undef> as well. The liveness structure at
the beginning and end of the block is unaffected, and the true sN
definitions should prevent any dodgy reorderings being introduced
elsewhere.
rdar://problem/15124449
llvm-svn: 192344
from struct byval to registers.
We used to pass 0 which means the alignment of PtrVT. Even when the alignment
of the struct is smaller than 4, the LOADs would have alignment of 4, and
further optimizations could combine the LOADs into a ldm, which would
cause crash.
The fix is to pass the alignment of the struct byval.
rdar://problem/15144402
llvm-svn: 192126