The input offset to needsFrameBaseReg is a negative value below the top of the
stack frame, but when converting to a positive offset from the bottom of the
stack frame this value was negated, causing the final offset to be too large
by twice the input offset's magnitude. Fix that by not negating the offset.
Patch by John Brawn
Differential Revision: http://reviews.llvm.org/D8316
llvm-svn: 232513
ARMv6K is another layer between ARMV6 and ARMV6T2. This is the LLVM
side of the changes.
ARMV6 family LLVM implementation.
+-------------------------------------+
| ARMV6 |
+----------------+--------------------+
| ARMV6M (thumb) | ARMV6K (arm,thumb) | <- From ARMV6K and ARMV6M processors
+----------------+--------------------+ have support for hint instructions
| ARMV6T2 (arm,thumb,thumb2) | (SEV/WFE/WFI/NOP/YIELD). They can
+-------------------------------------+ be either real or default to NOP.
| ARMV7 (arm,thumb,thumb2) | The two processors also use
+-------------------------------------+ different encoding for them.
Patch by Vinicius Tinti.
llvm-svn: 232468
Summary:
This is instead of doing this in target independent code and is the last
non-functional change before targets begin to distinguish between
different memory constraints when selecting code for the ISD::INLINEASM
node.
Next, each target will individually move away from the idea that all
memory constraints behave like 'm'.
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8173
llvm-svn: 232373
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.
This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break
anything.
The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate
Constraint_* values.
PR22883 was caused the matching operands copying the whole of the operand flags
for the matched operand. This included the constraint id which needed to be
replaced with the operand number. This has been fixed with a conversion
function. Following on from this, matching operands also used the operand
number as the constraint id. This has been fixed by looking up the matched
operand and taking it from there.
llvm-svn: 232165
This (r232027) has caused PR22883; so it seems those bits might be used by
something else after all. Reverting until we can figure out what else to do.
Original commit message:
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.
This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.
The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.
llvm-svn: 232093
Summary:
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.
This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.
The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8171
llvm-svn: 232027
Summary:
I don't know why every singled backend had to redeclare its own DataLayout.
There was a virtual getDataLayout() on the common base TargetMachine, the
default implementation returned nullptr. It was not clear from this that
we could assume at call site that a DataLayout will be available with
each Target.
Now getDataLayout() is no longer virtual and return a pointer to the
DataLayout member of the common base TargetMachine. I plan to turn it into
a reference in a future patch.
The only backend that didn't have a DataLayout previsouly was the CPPBackend.
It now initializes the default DataLayout. This commit is NFC for all the
other backends.
Test Plan: clang+llvm ninja check-all
Reviewers: echristo
Subscribers: jfb, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8243
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231987
time. The target independent code was passing in one all the
time and targets weren't checking validity before using. Update
a few calls to pass in a MachineFunction where necessary.
llvm-svn: 231970
The main issue being fixed here is that APCS targets handling a "byval align N"
parameter with N > 4 were miscounting what objects were where on the stack,
leading to FrameLowering setting the frame pointer incorrectly and clobbering
the stack.
But byval handling had grown over many years, and had multiple layers of cruft
trying to compensate for each other and calculate padding correctly. This only
really needs to be done once, in the HandleByVal function. Elsewhere should
just do what it's told by that call.
I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits
byvals with the correct C ABI alignment), which simplified HandleByVal.
rdar://20095672
llvm-svn: 231959
In theory this allows the compiler to skip materializing the array on
the stack. In practice clang often fails to do that, but that's a
different story. NFC.
llvm-svn: 231571
to disable lane switching if we don't actually have the instruction
set we want to switch to. Models the earlier check above the
conditional for the pass.
The testcase is one that triggered with the assert that's added
as part of the fix, use it to avoid adding a new testcase as it
highlights the same problem.
llvm-svn: 231539
This commit enables forming vector extloads for ARM.
It only does so for legal types, and when we can't fold the extension
in a wide/long form of the user instruction.
Enabling it for larger types isn't as good an idea on ARM as it is on
X86, because:
- we pretend that extloads are legal, but end up generating vld+vmov
- we have instructions like vld {dN, dM}, which can't be generated
when we "manually expand" extloads to vld+vmov.
For legal types, the combine doesn't fire that often: in the
integration tests only in a big endian testcase, where it removes a
pointless AND.
Related to rdar://19723053
Differential Revision: http://reviews.llvm.org/D7423
llvm-svn: 231396
Summary:
In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function, each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (ie via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips) if at all.
This does not represent a behavioural change and as such no tests were added.
Patch by: Richard Diamond.
Reviewers: jfb
Reviewed By: jfb
Subscribers: jfb, aemerson, t.p.northover, llvm-commits
Differential Revision: http://reviews.llvm.org/D7713
llvm-svn: 231250
a lookup, pass that in rather than use a naked call to getSubtargetImpl.
This involved passing down and around either a TargetMachine or
TargetRegisterInfo. Update all callers/definitions around the targets
and SelectionDAG.
llvm-svn: 230699
In case of "krait" CPU, asm printer doesn't emit any ".cpu" so the
features bits are not computed. This patch lets the asm printer
emit ".cpu cortex-a9" directive for krait and the hwdiv feature is
enabled through ".arch_extension". In short, krait is treated
as "cortex-a9" with hwdiv. We can not emit ".krait" as CPU since
it is not supported bu GNU GAS yet
llvm-svn: 230651
This patch is in response to r223147 where the avaiable features are
computed based on ".cpu" directive. This will work clean for the standard
variants like cortex-a9. For custom variants which rely on standard cpu names
for assembly, the additional features of a CPU should be propagated. This can be
done via ".arch_extension" as long as the assembler supports it. The
implementation for krait along with unit test will be submitted in next patch.
llvm-svn: 230650
This required plumbing a TargetRegisterInfo through computeRegisterProperties
and into findRepresentativeClass which uses it for register class
iteration. This required passing a subtarget into a few target specific
initializations of TargetLowering.
llvm-svn: 230583
Thumb-1 only allows SP-based LDR and STR to be word-sized, and SP-base LDR,
STR, and ADD only allow offsets that are a multiple of 4. Make some changes
to better make use of these instructions:
* Use word loads for anyext byte and halfword loads from the stack.
* Enforce 4-byte alignment on objects accessed in this way, to ensure that
the offset is valid.
* Do the same for objects whose frame index is used, in order to avoid having
to use more than one ADD to generate the frame index.
* Correct how many bits of offset we think AddrModeT1_s has.
Patch by John Brawn.
llvm-svn: 230496
The logic is almost there already, with our special homogeneous aggregate
handling. Tweaking it like this allows front-ends to emit AAPCS compliant code
without ever having to count registers or add discarded padding arguments.
Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to
apply the logic to all integer arrays for more consistency.
llvm-svn: 230348
This is a follow up to r230233 to fix something that I noticed by
inspection. The AddrModeT2_i8s4 addressing mode does not support
negative offsets. I spent a good chunk of the day trying to come up with
a testcase for this but was not successful. This addressing mode is used
to spill and restore GPRPair registers in Thumb2 code and that does not
happen often. We also make very limited used of negative offsets when
lowering frame indexes. I am going ahead with the change anyway, because
I am pretty confident that it is correct. I also added a missing assertion
to check that the low bits of the scaled offset are zero.
llvm-svn: 230297
It was previously using the subtarget to get values for the global
offset without actually checking each function as it was generating
code. Go ahead and solidify the current behavior and make the
existing FIXMEs more prominent.
As a note the ARM backend previously had a thumb1 and non-thumb1
set of defaults. Only the former was tested so I've changed the
behavior to only use that for now.
llvm-svn: 230245
The natural way to handle this addressing mode would be to say that it has
8 bits and gets scaled by 4, but since the MC layer is expecting the scaling
to be already reflected in the immediate value, we have been setting the
Scale to 1. That's fine, but then NumBits needs to be adjusted to reflect
the effective increase in the range of the immediate. That adjustment was
missing.
The consequence is that the register scavenger can fail.
The estimateRSStackSizeLimit() function in ARMFrameLowering.cpp correctly
assumes that the AddrModeT2_i8s4 address mode can handle scaled offsets up to
1020. Under just the right circumstances, we fail to reserve space for the
scavenger because it thinks that nothing will be needed. However, the overly
pessimistic behavior in rewriteT2FrameIndex causes some frame indexes to be
out of range and require scavenged registers, and so the scavenger asserts.
Unfortunately I have not been able to come up with a testcase for this. I
can only reproduce it on an internal branch where the frame layout and
register allocation is slightly different than trunk. We really need a
way to serialize MachineInstr-level IR to write reasonable tests for things
like this.
rdar://problem/19909005
llvm-svn: 230233
Everyone except R600 was manually passing the length of a static array
at each callsite, calculated in a variety of interesting ways. Far
easier to let ArrayRef handle that.
There should be no functional change, but out of tree targets may have
to tweak their calls as with these examples.
llvm-svn: 230118
This re-applies r223862, r224198, r224203, and r224754, which were
reverted in r228129 because they exposed Clang misalignment problems
when self-hosting.
The combine caused the crashes because we turned ISD::LOAD/STORE nodes
to ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we
were very lax for the former, and only emitted the alignment operand
(as in "[r1:128]") when it was larger than the standard alignment of
the memory type.
However, for ARMISD nodes, we just used the MMO alignment, no matter
what. In our case, we turned ISD nodes to ARMISD nodes, and this
caused the alignment operands to start being emitted.
And that's how we exposed alignment problems that were ignored before
(but I believe would have been caught with SCTRL.A==1?).
To fix this, we can just mirror the hack done for ISD nodes: only
take into account the MMO alignment when the access is overaligned.
Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.
We can do the same thing for generic load/stores.
Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).
rdar://19717869, rdar://14062261.
llvm-svn: 229932
In preparation for a future patch:
- rename isLoad to isLoadOp: the former is confusing, and can be taken
to refer to the fact that the node is an ISD::LOAD. (it isn't, yet.)
- change formatting here and there.
- add some comments.
- const-ify bools.
llvm-svn: 229929
Previously, subtarget features were a bitfield with the underlying type being uint64_t.
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.
No functional change.
Differential Revision: http://reviews.llvm.org/D7065
llvm-svn: 229831
A null MCTargetStreamer allows IRObjectFile to ignore target-specific
directives. Previously we were crashing.
Differential Revision: http://reviews.llvm.org/D7711
llvm-svn: 229797
Add some of the missing M and R class Cortex CPUs, namely:
Cortex-M0+ (called Cortex-M0plus for GCC compatibility)
Cortex-M1
SC000
SC300
Cortex-R5
llvm-svn: 229660
initialization. Initialize the subtarget once per function and
migrate Emit{Start|End}OfAsmFile to either use attributes on the
TargetMachine or get information from the subtarget we'd use
for assembling. One bit (getISAEncoding) touched the general
AsmPrinter and the debug output. Handle this one by passing
the function for the subprogram down and updating all callers
and users.
The top-level-ness of the ARM attribute output for assembly is,
by nature, contrary to how we'd want to do this for an LTO
situation where we have multiple cpu architectures so this
solution is good enough for now.
llvm-svn: 229528
This adds a safe interface to the machine independent InputArg struct
for accessing the index of the original (IR-level) argument. When a
non-native return type is lowered, we generate the hidden
machine-level sret argument on-the-fly. Before this fix, we were
representing this argument as OrigArgIndex == 0, which is an outright
lie. In particular this crashed in the AArch64 backend where we
actually try to access the type of the original argument.
Now we use a sentinel value for machine arguments that have no
original argument index. AArch64, ARM, Mips, and PPC now check for this
case before accessing the original argument.
Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering
llvm-svn: 229413
Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
=> getFnAttribute(Kind)
getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
=> hasFnAttribute(Kind)
llvm-svn: 229220
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.
This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.
The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".
llvm-svn: 229094
The changes in r223113 (ARM modified-immediate syntax) have broken
instructions like:
mov r0, #~0xffffff00
The problem is that I've added a spurious range check on the immediate
operand to ensure that it lies between INT32_MIN and UINT32_MAX. While
this range check is correct in theory, it causes problems because the
operand is stored in an int64_t (by MC). So valid 32-bit constants like
\#~0xffffff00 become out of range. The solution is to simply remove this
range check. It is not possible to validate the range of the immediate
operand with the current setup because: 1) The operand is stored in an
int64_t by MC, 2) The immediate can be of the forms #imm, #-imm, #~imm
or even #((~imm)) etc. So we just chop the value to 32 bits and use it.
Also noted that the original range check was note tested by any of the
unit tests. I've added a new test to cover #~imm kind of operands.
Change-Id: I411e90d84312a2eff01b732bb238af536c4a7599
llvm-svn: 228920
While various DAG combines try to guarantee that a vector SETCC
operation will have the same output size as input, there's nothing
intrinsic to either creation or LegalizeTypes that actually guarantees
it, so the function needs to be ready to handle a mismatch.
Fortunately this is easy enough, just extend or truncate the naturally
compared result.
I couldn't reproduce the failure in other backends that I know have
SIMD, so it's probably only an issue for these two due to shared
heritage.
Should fix PR21645.
llvm-svn: 228518
Summary: When evaluating floating point instructions in the inliner, ask the TTI whether it is an expensive operation. By default, it's not an expensive operation. This keeps the default behavior the same as before. The ARM TTI has been updated to return back TCC_Expensive for targets which don't have hardware floating point.
Reviewers: chandlerc, echristo
Reviewed By: echristo
Subscribers: t.p.northover, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D6936
llvm-svn: 228263
This is a bug that was caused due to storing the feature bitset in a 32-bit
variable when it is a 64-bit mask, discarding the top half of the feature set.
llvm-svn: 228151
Currently, Cortex-A72 is modelled as an Cortex-A57 except the fp
load balancing pass isn't enabled for Cortex-A72 as it's not
profitable to have it enabled for this core.
Patch by Ranjeet Singh.
llvm-svn: 228140
This reverts patches 223862, 224198, 224203, and 224754, which were all
related to the vector load/store combining and were reverted/reaplied
a few times due to the same alignment problems we're seeing now.
Further tests, mainly self-hosting Clang, will be needed to reapply this
patch in the future.
llvm-svn: 228129
The ARM assembler allows register alias redefinitions as long as it
targets the same register. r222319 broke that. In the AArch64 case
it would just produce a new warning, but in the ARM case it would
error out on previously accepted assembler.
llvm-svn: 228109
Summary:
Previously it only avoided optimizing signed comparisons to 0.
Sometimes the DAGCombiner will optimize the unsigned comparisons
to 0 before it gets to the peephole pass, but sometimes it doesn't.
Fix for PR22373.
Test Plan: test/CodeGen/ARM/sub-cmp-peephole.ll
Reviewers: jfb, manmanren
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D7274
llvm-svn: 227809
now that we have a correct and cached subtarget specific to the
function.
Also, finish providing a cached per-function subtarget in the core
LLVMTargetMachine -- that layer hadn't switched over yet.
The only use of the TargetMachine was to re-lookup a subtarget for
a particular function to work around the fact that TTI was immutable.
Now that it is per-function and we haved a cached subtarget, use it.
This still leaves a few interfaces with real warts on them where we were
passing Function objects through the TTI interface. I'll remove these
and clean their usage up in subsequent commits now that this isn't
necessary.
llvm-svn: 227738
intermediate TTI implementation template and instead query up to the
derived class for both the TargetMachine and the TargetLowering.
Most of the derived types had a TLI cached already and there is no need
to store a less precisely typed target machine pointer.
This will in turn make it much cleaner to look up the TLI via
a per-function subtarget instead of the generic subtarget, and it will
pave the way toward pulling the subtarget used for unroll preferences
into the same form once we are *always* using the function to look up
the correct subtarget.
llvm-svn: 227737
TargetIRAnalysis access path directly rather than implementing getTTI.
This even removes getTTI from the interface. It's more efficient for
each target to just register a precise callback that creates their
specific TTI.
As part of this, all of the targets which are building their subtargets
individually per-function now build their TTI instance with the function
and thus look up the correct subtarget and cache it. NVPTX, R600, and
XCore currently don't leverage this functionality, but its trivial for
them to add it now.
llvm-svn: 227735
null.
For some reason some of the original TTI code supported a null target
machine. This seems to have been legacy, and I made matters worse when
refactoring this code by spreading that pattern further through the
various targets.
The TargetMachine can't actually be null, and it doesn't make sense to
support that use case. I've now consistently removed it and removed all
of the code trying to cope with that situation. This is probably good,
as several targets *didn't* cope with it being null despite the null
default argument in their constructors. =]
llvm-svn: 227734
base which it adds a single analysis pass to, to instead return the type
erased TargetTransformInfo object constructed for that TargetMachine.
This removes all of the pass variants for TTI. There is now a single TTI
*pass* in the Analysis layer. All of the Analysis <-> Target
communication is through the TTI's type erased interface itself. While
the diff is large here, it is nothing more that code motion to make
types available in a header file for use in a different source file
within each target.
I've tried to keep all the doxygen comments and file boilerplate in line
with this move, but let me know if I missed anything.
With this in place, the next step to making TTI work with the new pass
manager is to introduce a really simple new-style analysis that produces
a TTI object via a callback into this routine on the target machine.
Once we have that, we'll have the building blocks necessary to accept
a function argument as well.
llvm-svn: 227685
This adds some comments and splits the flag calculation on type boundaries to
make the table more readable. Addresses some post-commit review comments to SVN
r227603. NFC.
llvm-svn: 227670
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
Now that -mstack-probe-size is piped through to the backend via the function
attribute as on Windows x86, honour the value to permit handling of non-default
values for stack probes. This is needed /Gs with the clang-cl driver or
-mstack-probe-size with the clang driver when targeting Windows on ARM.
llvm-svn: 227667
Also revert r227489 since it didn't actually fix the thing I thought I
was fixing (since the test case was targeting the wrong architecture
initially). The change might be correct & demonstrated by other test
cases, but it's not a priority for me to find those test cases right
now.
Filed PR22417 for the failure.
llvm-svn: 227632
If the original FPU specification involved a restricted VFP unit (d16), ensure
that we reset the functionality when we encounter a new FPU type. In
particular, if the user specified vfpv3-d16, but switched to a VFPv3 (which has
32 double precision registers), we would fail to reset the D16 feature, and
treat it as being equivalent to vfpv3-d16.
llvm-svn: 227603
The FPU directive permits the user to switch the target FPU, enabling
instructions that would be otherwise unavailable. However, when configuring the
new subtarget features, we would not enable the implied functions for newer
FPUs. This would result in invalid rejection of valid input. Ensure that we
inherit the implied FPU functionality when enabling newer versions of the FPU.
Fortunately, these are mostly hierarchical, unlike the CPUs.
Addresses PR22395.
llvm-svn: 227584
Any code creating an MCSectionELF knows ELF and already provides the flags.
SectionKind is an abstraction used by common code that uses a plain
MCSection.
Use the flags to compute the SectionKind. This removes a lot of
guessing and boilerplate from the MCSectionELF construction.
llvm-svn: 227476
derived classes.
Since global data alignment, layout, and mangling is often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be layed out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.
*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.
llvm-svn: 227113
Windows supports a restricted set of relocations (compared to ARM ELF). In some
cases, we may end up generating an unsupported relocation. This can occur with
bad input to the assembler in particular (the frontend should never generate
code that cannot be compiled). Generate an error rather than just aborting.
The change in the API is driven by the desire to provide a slightly more helpful
message for debugging purposes.
llvm-svn: 226779
Thumbv4t does not have lo->lo copies other than MOVS,
and that can't be predicated. So emit MOVS when needed
and bail if there's a predicate.
http://reviews.llvm.org/D6592
llvm-svn: 226711
The fixes are to note that AArch64 has additional restrictions on when local
relocations can be used. In particular, ld64 requires that relocations to
cstring/cfstrings use linker visible symbols.
Original message:
In an assembly expression like
bar:
.long L0 + 1
the intended semantics is that bar will contain a pointer one byte past L0.
In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.
The solution used in ELF to use relocation with symbols if there is a non-zero
addend.
In MachO before this patch we would just keep all symbols in some sections.
This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.
This patch implements the non-zero addend logic for MachO too.
llvm-svn: 226503
utils/sort_includes.py.
I clearly haven't done this in a while, so more changed than usual. This
even uncovered a missing include from the InstrProf library that I've
added. No functionality changed here, just mechanical cleanup of the
include order.
llvm-svn: 225974
Peephole optimizer is scanning a basic block forward. At some point it
needs to answer the question "given a pointer to an MI in the current
BB, is it located before or after the current instruction".
To perform this, it keeps a set of the MIs already seen during the scan,
if a MI is not in the set, it is assumed to be after.
It means that newly created MIs have to be inserted in the set as well.
This commit passes the set as an argument to the target-dependent
optimizeSelect() so that it can properly update the set with the
(potentially) newly created MIs.
llvm-svn: 225772
AAELF specifies a number of ELF specific relocation types which have custom
prefixes for the symbol reference. Switch the parser to be more table driven
with an idea of file formats for which they apply. NFC.
llvm-svn: 225758
One is that AArch64 has additional restrictions on when local relocations can
be used. We have to take those into consideration when deciding to put a L
symbol in the symbol table or not.
The other is that ld64 requires the relocations to cstring to use linker
visible symbols on AArch64.
Thanks to Michael Zolotukhin for testing this!
Remove doesSectionRequireSymbols.
In an assembly expression like
bar:
.long L0 + 1
the intended semantics is that bar will contain a pointer one byte past L0.
In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.
The solution used in ELF to use relocation with symbols if there is a non-zero
addend.
In MachO before this patch we would just keep all symbols in some sections.
This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.
This patch implements the non-zero addend logic for MachO too.
llvm-svn: 225644
This adds support for parsing and emitting the SBREL relocation variant for the
ARM target. Handling this relocation variant is necessary for supporting the
full ARM ELF specification. Addresses PR22128.
llvm-svn: 225595
The assert was being triggered when the distance between a constant pool entry
and its user exceeded the maximally allowed distance after thumb2 branch
shortening. A padding was inserted after a thumb2 branch instruction was shrunk,
which caused the user to be out of range. This is wrong as the padding should
have been inserted by the layout algorithm so that the distance between two
instructions doesn't grow later during thumb2 instruction optimization.
This commit fixes the code in ARMConstantIslands::createNewWater to call
computeBlockSize and set BasicBlock::Unalign when a branch instruction is
inserted to create new water after a basic block. A non-zero Unalign causes
the worst-case padding to be inserted when adjustBBOffsetsAfter is called to
recompute the basic block offsets.
rdar://problem/19130476
llvm-svn: 225467
This partially fixes PR13007 (ARM CodeGen fails with large stack
alignment): for ARM and Thumb2 targets, but not for Thumb1, as it
seems stack alignment for Thumb1 targets hasn't been supported at
all.
Producing an aligned stack pointer is done by zero-ing out the lower
bits of the stack pointer. The BIC instruction was used for this.
However, the immediate field of the BIC instruction only allows to
encode an immediate that can zero out up to a maximum of the 8 lower
bits. When a larger alignment is requested, a BIC instruction cannot
be used; llvm was silently producing incorrect code in this case.
This commit fixes code generation for large stack aligments by
using the BFC instruction instead, when the BFC instruction is
available. When not, it uses 2 instructions: a right shift,
followed by a left shift to zero out the lower bits.
The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code
that unconditionally uses BIC to realign the stack pointer, so it
very likely has the same problem. However, I wasn't able to
produce a test case for that. This commit adds an assert so that
the compiler will fail the assert instead of silently generating
wrong code if this is ever reached.
llvm-svn: 225446
type (in addition to the memory type).
The *LoadExt* legalization handling used to only have one type, the
memory type. This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.
However, this isn't always the case. For instance, on X86, with AVX,
this is legal:
v4i32 load, zext from v4i8
but this isn't:
v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.
Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.
Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.
Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior. The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)
No functional change intended.
Differential Revision: http://reviews.llvm.org/D6532
llvm-svn: 225421
The change in r225266 was reviewed under D6722. But the commit r225266 has a
typo, causing some MCHammer failures. This patch fixes it.
Change-Id: I573efcff25003af7478ac02548ebbe929fc7f5fd
llvm-svn: 225347
This is affecting the behavior of some ObjC++ / AArch64 test cases on Darwin.
Reverting to get the bots green while I track down the source of the changed
behavior.
llvm-svn: 225311
No functional changes. Support for ARM's modified immediate syntax was added
in r223113 and r223115 (review: D6408). That patch introduced the mod_imm*
tblegen definitions which renders the existing so_imm* definitions redundant.
This patch gets rid of them completely.
Reviewed as: D6722
llvm-svn: 225266
Tag_compatibility takes two arguments, but before this patch it would
erroneously accept just one, it now produces an error in that case.
Change-Id: I530f918587620d0d5dfebf639944d6083871ef7d
llvm-svn: 225167
Claim conformance to version 2.09 of the ARM ABI.
This build attribute must be emitted first amongst the build attributes when
written to an object file. This is to simplify conformance detection by
consumers.
Change-Id: If9eddcfc416bc9ad6e5cc8cdcb05d0031af7657e
llvm-svn: 225166
Weak externals are resolved statically, so we can actually generate the tail
call on PE/COFF targets without breaking the requirements. It is questionable
whether we want to propagate the current behaviour for MachO as the requirements
are part of the ARM ELF specifications, and it seems that prior to the SVN
r215890, we would have tail'ed the call. For now, be conservative and only
permit it on PE/COFF where the call will always be fully resolved.
llvm-svn: 225119
Make sure they all have llvm_unreachable on the default path out of the switch. Remove unnecessary "default: break". Remove a 'return' after unreachable. Fix some indentation.
llvm-svn: 225114
The issues was that AArch64 has additional restrictions on when local
relocations can be used. We have to take those into consideration when
deciding to put a L symbol in the symbol table or not.
Original message:
Remove doesSectionRequireSymbols.
In an assembly expression like
bar:
.long L0 + 1
the intended semantics is that bar will contain a pointer one byte past L0.
In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.
The solution used in ELF to use relocation with symbols if there is a non-zero
addend.
In MachO before this patch we would just keep all symbols in some sections.
This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.
This patch implements the non-zero addend logic for MachO too.
llvm-svn: 225048
In an assembly expression like
bar:
.long L0 + 1
the intended semantics is that bar will contain a pointer one byte past L0.
In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.
The solution used in ELF to use relocation with symbols if there is a non-zero
addend.
In MachO before this patch we would just keep all symbols in some sections.
This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.
This patch implements the non-zero addend logic for MachO too.
llvm-svn: 224985
Bob Wilson pointed out the unnecessary checks that had been committed to the
instruction check predicates. The check was meant to ensure that the check was
not accidentally applied to non-ARM instructions. This is better served as an
assertion rather than a condition check.
llvm-svn: 224825
r223862/r224203 tried to also combine base-updating load/stores.
There was a mistake there: the alignment was added as is as an operand to
the ARMISD::VLD/VST node. However, the VLD/VST selection logic doesn't care
about less-than-standard alignment attributes.
For example, no matter the alignment of a v2i64 load (say 1), SelectVLD picks
VLD1q64 (because of the memory type). But VLD1q64 ("vld1.64 {dXX, dYY}") is
8-aligned, per ARMARMv7a 3.2.1.
For the 1-aligned load, what we really want is VLD1q8.
This commit introduces bitcasts if necessary, and changes the vld/vst type to
one whose standard alignment matches the original load/store alignment.
Differential Revision: http://reviews.llvm.org/D6759
llvm-svn: 224754
Followup to r224294:
ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.
llvm-svn: 224743
The ARM ARM states:
LDM/LDMIA/LDMFD:
The SP can be in the list. However, ARM deprecates using these instructions
with SP in the list.
ARM deprecates using these instructions with both the LR and the PC in the
list.
LDMDA/LDMFA/LDMDB/LDMEA/LDMIB/LDMED:
The SP can be in the list. However, instructions that include the SP in the
list are deprecated.
Instructions that include both the LR and the PC in the list are deprecated.
POP:
The SP can only be in the list before ARMv7. ARM deprecates any use of ARM
instructions that include the SP, and the value of the SP after such an
instruction is UNKNOWN.
ARM deprecates the use of this instruction with both the LR and the PC in
the list.
Attempt to diagnose use of deprecated forms of these instructions. This mirrors
the previous changes to diagnose use of the deprecated forms of STM in ARM mode.
llvm-svn: 224682
Fix an off-by-one access introduced in 224502 for push.w and pop.w with single
register operands. Add test cases for both scenarios.
Thanks to Asiri Rathnayake for pointing out the failure!
llvm-svn: 224521
The ARM Architecture Reference Manual states the following:
LDM{,IA,DB}:
The SP cannot be in the list.
The PC can be in the list.
If the PC is in the list:
• the LR must not be in the list
• the instruction must be either outside any IT block, or the last
instruction in an IT block.
POP:
The PC can be in the list.
If the PC is in the list:
• the LR must not be in the list
• the instruction must be either outside any IT block, or the last
instruction in an IT block.
PUSH:
The SP and PC can be in the list in ARM instructions, but not in Thumb
instructions.
STM:{,IA,DB}:
The SP and PC can be in the list in ARM instructions, but not in Thumb
instructions.
llvm-svn: 224502
of the abi we should be using. For targets that don't use the
option there's no change, otherwise this allows external users
to set the ABI via string and avoid some of the -backend-option
pain in clang.
Use this option to move the ABI for the ARM port from the
Subtarget to the TargetMachine and update the testcases
accordingly since it's no longer valid to set via -mattr.
llvm-svn: 224492
same. This will change the "bare metal" ABI from APCS to AAPCS.
The only difference between the front and back end code is that
the code for Triple::GNU was added for environment. That will migrate
to the front end shortly.
Tests updated with the ABI they were originally testing in the case
of bare metal (e.g. -mtriple armv7) or with a -gnu for arm-linux
triples.
llvm-svn: 224489
The use of SP and PC in the register list for stores is deprecated on ARM
(ARM ARM A.8.8.199):
ARM deprecates the use of ARM instructions that include the SP or the PC in
the list.
Provide a deprecation warning from the assembler in the case that the syntax is
ever seen.
llvm-svn: 224319
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.
llvm-svn: 224294
Add in definedness checks for shift operators, null checks when
pointers are assumed by the code to be non-null, and explicit
unreachables.
llvm-svn: 224255
r223862 tried to also combine base-updating load/stores.
r224198 reverted it, as "it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown."
Reapply, with a fix to ignore non-normal load/stores.
Truncstores are handled elsewhere (you can actually write a pattern for
those, whereas for postinc loads you can't, since they return two values),
but it should be possible to also combine extloads base updates, by checking
that the memory (rather than result) type is of the same size as the addend.
Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.
We can do the same thing for generic load/stores.
Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).
Differential Revision: http://reviews.llvm.org/D6585
llvm-svn: 224203
This reverts commit r223862, as it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown. We'll investigate the issue and re-apply
when safe.
llvm-svn: 224198
The __fp16 type is unconditionally exposed. Since -mfp16-format is not yet
supported, there is not a user switch to change this behaviour. This build
attribute should capture the default behaviour of the compiler, which is to
expose the IEEE 754 version of __fp16.
When -mfp16-format is emitted, that will be the way to control the value of
this build attribute.
Change-Id: I8a46641ff0fd2ef8ad0af5f482a6d1af2ac3f6b0
llvm-svn: 224115
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging as you miss intermediate steps and allows bugs to stay
unnotice when no verification is performed.
To make this change practical I added the possibility to explicitely disable
verification. I used this option on all places where no verification was
performed previously (because alot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.
This is the 2nd attempt at this after realizing that PassManager::add() may
actually delete the pass.
llvm-svn: 224059
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging as you miss intermediate steps and allows bugs to stay
unnotice when no verification is performed.
To make this change practical I added the possibility to explicitely disable
verification. I used this option on all places where no verification was
performed previously (because alot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.
llvm-svn: 224042
The distinction is mostly useful in the front-end. By the time we get here,
there are very few situations where we actually want different behaviour for
Darwin and IOS (in fact Darwin mostly just exists in a few tests). So this
should reduce any surprising weirdness for anyone using it.
No functional change on anything anyone actually cares about.
llvm-svn: 224035
Quite a major error here: the expansions for the Pseudos with and without
folded load were mixed up. Fortunately it only affects ARM-mode, when not using
movw/movt, on Darwin. I'm guessing no-one actually uses that combination.
llvm-svn: 223986
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.
We can do the same thing for generic load/stores.
Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).
Differential Revision: http://reviews.llvm.org/D6585
llvm-svn: 223862
Move the combiner-state check into another function, add a few
small comments, and use a more general type in a cast<>.
In preparation for a future patch.
llvm-svn: 223834
It was missing from the VLD1/VST1 handling logic, even though the
corresponding instructions exist (same form as v2i64).
In preparation for a future patch.
llvm-svn: 223832
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532. Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.
I have a follow-up patch prepared for `clang`. If this breaks other
sub-projects, I apologize in advance :(. Help me compile it on Darwin
I'll try to fix it. FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.
This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.
Here's a quick guide for updating your code:
- `Metadata` is the root of a class hierarchy with three main classes:
`MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from
the `Value` class hierarchy. It is typeless -- i.e., instances do
*not* have a `Type`.
- `MDNode`'s operands are all `Metadata *` (instead of `Value *`).
- `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.
If you're referring solely to resolved `MDNode`s -- post graph
construction -- just use `MDNode*`.
- `MDNode` (and the rest of `Metadata`) have only limited support for
`replaceAllUsesWith()`.
As long as an `MDNode` is pointing at a forward declaration -- the
result of `MDNode::getTemporary()` -- it maintains a side map of its
uses and can RAUW itself. Once the forward declarations are fully
resolved RAUW support is dropped on the ground. This means that
uniquing collisions on changing operands cause nodes to become
"distinct". (This already happened fairly commonly, whenever an
operand went to null.)
If you're constructing complex (non self-reference) `MDNode` cycles,
you need to call `MDNode::resolveCycles()` on each node (or on a
top-level node that somehow references all of the nodes). Also,
don't do that. Metadata cycles (and the RAUW machinery needed to
construct them) are expensive.
- An `MDNode` can only refer to a `Constant` through a bridge called
`ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).
As a side effect, accessing an operand of an `MDNode` that is known
to be, e.g., `ConstantInt`, takes three steps: first, cast from
`Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
third, cast down to `ConstantInt`.
The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
metadata schema owners transition away from using `Constant`s when
the type isn't important (and they don't care about referring to
`GlobalValue`s).
In the meantime, I've added transitional API to the `mdconst`
namespace that matches semantics with the old code, in order to
avoid adding the error-prone three-step equivalent to every call
site. If your old code was:
MDNode *N = foo();
bar(isa <ConstantInt>(N->getOperand(0)));
baz(cast <ConstantInt>(N->getOperand(1)));
bak(cast_or_null <ConstantInt>(N->getOperand(2)));
bat(dyn_cast <ConstantInt>(N->getOperand(3)));
bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));
you can trivially match its semantics with:
MDNode *N = foo();
bar(mdconst::hasa <ConstantInt>(N->getOperand(0)));
baz(mdconst::extract <ConstantInt>(N->getOperand(1)));
bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2)));
bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3)));
bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));
and when you transition your metadata schema to `MDInt`:
MDNode *N = foo();
bar(isa <MDInt>(N->getOperand(0)));
baz(cast <MDInt>(N->getOperand(1)));
bak(cast_or_null <MDInt>(N->getOperand(2)));
bat(dyn_cast <MDInt>(N->getOperand(3)));
bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));
- A `CallInst` -- specifically, intrinsic instructions -- can refer to
metadata through a bridge called `MetadataAsValue`. This is a
subclass of `Value` where `getType()->isMetadataTy()`.
`MetadataAsValue` is the *only* class that can legally refer to a
`LocalAsMetadata`, which is a bridged form of non-`Constant` values
like `Argument` and `Instruction`. It can also refer to any other
`Metadata` subclass.
(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)
llvm-svn: 223802
Instructions of the form [ADD Rd, pc, #imm] are manually aliased
in processInstruction() to use ADR. To accomodate this, mod_imm handling
had to be tweaked a bit. Turns out it was the manual aliasing that must
be tweaked to accommodate mod_imms instead. More information about the
parsed instruction is available at the point where processInstruction()
is invoked, which makes it easier to detect a mod_imm at that point rather
than trying to detect a potential alias when a mod_imm is being prepped.
Added a test case and fixed some white spaces as well.
llvm-svn: 223772
The test file test/CodeGen/ARM/build-attributes.ll was missing several
floating-point build attribute tests. The intention of this commit is that for
each CPU / architecture currently tested, there are now tests that make sure
the following attributes are sufficiently checked,
* Tag_ABI_FP_rounding
* Tag_ABI_FP_denormal
* Tag_ABI_FP_exceptions
* Tag_ABI_FP_user_exceptions
* Tag_ABI_FP_number_model
Also in this commit, the -unsafe-fp-math flag has been augmented with the full
suite of flags Clang sends to LLVM when you pass -ffast-math to Clang. That is,
`-unsafe-fp-math' has been changed to `-enable-unsafe-fp-math -disable-fp-elim
-enable-no-infs-fp-math -enable-no-nans-fp-math -fp-contract=fast'
Change-Id: I35d766076bcbbf09021021c0a534bf8bf9a32dfc
llvm-svn: 223454
r223113 added support for ARM modified immediate assembly syntax. Which
assumes all immediate operands are prefixed with a '#'. This assumption
is wrong as per the ARMARM - which recommends that all '#' characters be
treated optional. The current patch fixes this regression and adds a test
case. A follow-up patch will expand the test coverage to other instructions.
llvm-svn: 223381
So there are a couple of issues with indirect calls on thumbv4t. First, the most
'obvious' instruction, 'blx' isn't available until v5t. And secondly, the
next-most-obvious sequence: 'mov lr, pc; bx rN' doesn't DTRT in thumb code
because the saved off pc has its thumb bit cleared, so when the callee returns
we end up in ARM mode.... yuck.
The solution is to 'bl' to a nearby landing pad with a 'bx rN' in it.
We could cut down on code size by sharing the landing pads between call sites
that are close enough, but for the moment let's do correctness first and look at
performance later.
Patch by: Iain Sandoe
http://reviews.llvm.org/D6519
llvm-svn: 223380
r223113 added support for ARM modified immediate assembly syntax. That patch
has broken support for immediate expressions, as in:
add r0, #(4 * 4)
It wasn't caught because we don't have any tests for this feature. This patch
fixes this regression and adds test cases.
llvm-svn: 223366
Use the MCAsmInfo instead of the DataLayout, and allow
specifying a custom prefix for labels specifically. HSAIL
requires that labels begin with @, but global symbols with &.
llvm-svn: 223323
LLVM understands a -enable-sign-dependent-rounding-fp-math codegen option. When
the user has specified this option, the Tag_ABI_FP_rounding attribute should be
emitted with value 1. This option currently does not appear to disable
transformations and optimizations that assume default floating point rounding
behavior, AFAICT, but the intention should be recorded in the build attributes,
regardless of what the compiler actually does with the intention.
Change-Id: If838578df3dc652b6f2796b8d152545674bcb30e
llvm-svn: 223218
Previously .cpu directive in ARM assembler didnt switch to the new CPU and
therefore acted as a nop. This implemented real action for .cpu and eg.
allows to assembler FreeBSD kernel with -integrated-as.
llvm-svn: 223147
Removing an unused function which is causing one of the build bots to fail.
This was introduced in the commit r223113. A proper cleanup of the so_imm
tblgen defintion (made redundant by the mod_imm definition) needs to happen
soon.
llvm-svn: 223115
Certain ARM instructions accept 32-bit immediate operands encoded as a 8-bit
integer value (0-255) and a 4-bit rotation (0-30, even). Current ARM assembly
syntax support in LLVM allows the decoded (32-bit) immediate to be specified
as a single immediate operand for such instructions:
mov r0, #4278190080
The ARMARM defines an extended assembly syntax allowing the encoding to be made
more explicit, as in:
mov r0, #255, #8 ; (same 32-bit value as above)
The behaviour of the two instructions can be different w.r.t flags, which is
documented under "Modified immediate constants" in ARMARM. This patch enables
support for this extended syntax at the MC layer.
llvm-svn: 223113
The default ARM floating-point mode does not support IEEE 754 mode exactly. Of
relevance to this patch is that input denormals are flushed to zero. The way in
which they're flushed to zero depends on the architecture,
* For VFPv2, it is implementation defined as to whether the sign of zero is
preserved.
* For VFPv3 and above, the sign of zero is always preserved when a denormal
is flushed to zero.
When FP support has been disabled, the strategy taken by this patch is to
assume the software support will mirror the behaviour of the hardware support
for the target *if it existed*. That is, for architectures which can only have
VFPv2, it is assumed the software will flush to positive zero. For later
architectures it is assumed the software will flush to zero preserving sign.
Change-Id: Icc5928633ba222a4ba3ca8c0df44a440445865fd
llvm-svn: 223110
Add checkDecodedInstruction for post-decode checking of instructions, to catch
the corner cases like HVC that don't fit into the general pattern. Needed to
check for an invalid condition field in instruction encoding despite HVC not
taking a predicate.
Patch by Matthew Wahab.
Change-Id: I48e28de981d7a9e43569594da3c45fb478b4f795
llvm-svn: 222992
The AAPCS treats small structs and homogeneous floating (or vector) aggregates
specially, and guarantees they either get passed as a contiguous block of
registers, or prevent any future use of those registers and get passed on the
stack.
This concept can fit quite neatly into LLVM's own type system, mapping an HFA
to [N x float] and so on, and small structs to [N x i64]. Doing so allows
front-ends to emit AAPCS compliant code without having to duplicate the
register counting logic.
llvm-svn: 222903
The string data for string-valued build attributes were being unconditionally
uppercased. There is no mention in the ARM ABI addenda about case conventions,
so it's technically implementation defined as to whether the data are
capitialised in some way or not. However, there are good reasons not to
captialise the data.
* It's less work.
* Some vendors may legitimately have case-sensitive checks for these
attributes which would fail on LLVM generated object files.
* There could be locale issues with uppercasing.
The original reasons for uppercasing appear to have stemmed from an
old codesourcery toolchain behaviour, see
http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/87133
This patch makes the object file emitted no longer captialise string
data, it encodes as seen in the assembly source.
Change-Id: Ibe20dd6e60d2773d57ff72a78470839033aa5538
llvm-svn: 222882
These recently all grew a unique_ptr<TargetLoweringObjectFile> member in
r221878. When anyone calls a virtual method of a class, clang-cl
requires all virtual methods to be semantically valid. This includes the
implicit virtual destructor, which triggers instantiation of the
unique_ptr destructor, which fails because the type being deleted is
incomplete.
This is just part of the ongoing saga of PR20337, which is affecting
Blink as well. Because the MSVC ABI doesn't have key functions, we end
up referencing the vtable and implicit destructor on any virtual call
through a class. We don't actually end up emitting the dtor, so it'd be
good if we could avoid this unneeded type completion work.
llvm-svn: 222480
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.
This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
llvm-svn: 222334
Having two ways to do this doesn't seem terribly helpful and
consistently using the insert version (which we already has) seems like
it'll make the code easier to understand to anyone working with standard
data structures. (I also updated many references to the Entry's
key and value to use first() and second instead of getKey{Data,Length,}
and get/setValue - for similar consistency)
Also removes the GetOrCreateValue functions so there's less surface area
to StringMap to fix/improve/change/accommodate move semantics, etc.
llvm-svn: 222319
This was motivated by a bug which caused code like this to be
miscompiled:
declare void @take_ptr(i8*)
define void @test() {
%addr1.32 = alloca i8
%addr2.32 = alloca i32, i32 1028
call void @take_ptr(i8* %addr1)
ret void
}
This was emitting the following assembly to get the value of %addr1:
add r0, sp, #1020
add r0, r0, #8
However, "add r0, r0, #8" is not a valid Thumb1 instruction, and this
could not be assembled. The generated object file contained this,
resulting in r0 holding SP+8 rather tha SP+1028:
add r0, sp, #1020
add r0, sp, #8
This function looked like it could have caused miscompilations for
other combinations of registers and offsets (though I don't think it is
currently called with these), and the heuristic it used did not match
the emitted code in all cases.
llvm-svn: 222125
We use to track quite a few "adjusted" offsets through the FrameLowering code
to account for changes in the prologue instructions as we went and allow the
emission of correct CFA annotations. However, we were missing a couple of cases
and the code was almost impenetrable.
It's easier to just add any stack-adjusting instruction to a list and emit them
together.
llvm-svn: 222057
When we folded the DPR alignment gap into a push, we weren't noting the extra
distance from the beginning of the push to the FP, and so FP ended up pointing
at an incorrect offset.
The .cfi_def_cfa_offset directives are still wrong in this case, but I think
that can be improved by refactoring.
llvm-svn: 222056
Normally entries can only move to a lower address, but when that wasn't viable,
the user's block was considered anyway. Unfortunately, it went via
createNewWater which wasn't designed to handle the case where there's already
an island after the block.
Unfortunately, the test we have is slow and fragile, and I couldn't reduce it
to anything sane even with the @llvm.arm.space intrinsic. The test change here
is recreating the previous one after the change.
rdar://problem/18545506
llvm-svn: 221905
We were using a naive heuristic to determine whether a basic block already had
an unconditional branch at the end. This mostly corresponded to reality
(assuming branches got optimised) because there's not much point in a branch to
the next block, but could go wrong.
llvm-svn: 221904
Creating tests for the ConstantIslands pass is very difficult, since it depends
on precise layout details. Having the ability to precisely inject a number of
bytes into the stream helps greatly.
llvm-svn: 221903
With this patch MCDisassembler::getInstruction takes an ArrayRef<uint8_t>
instead of a MemoryObject.
Even on X86 there is a maximum size an instruction can have. Given
that, it seems way simpler and more efficient to just pass an ArrayRef
to the disassembler instead of a MemoryObject and have it do a virtual
call every time it wants some extra bytes.
llvm-svn: 221751
This commit adds a new pass that can inject checks before indirect calls to
make sure that these calls target known locations. It supports three types of
checks and, at compile time, it can take the name of a custom function to call
when an indirect call check fails. The default failure function ignores the
error and continues.
This pass incidentally moves the function JumpInstrTables::transformType from
private to public and makes it static (with a new argument that specifies the
table type to use); this is so that the CFI code can transform function types
at call sites to determine which jump-instruction table to use for the check at
that site.
Also, this removes support for jumptables in ARM, pending further performance
analysis and discussion.
Review: http://reviews.llvm.org/D4167
llvm-svn: 221708
This fixes a few cases of:
* Wrong variable name style.
* Lines longer than 80 columns.
* Repeated names in comments.
* clang-format of the above.
This make the next patch a lot easier to read.
llvm-svn: 221615
Some ARM FPUs only have 16 double-precision registers, rather than the
normal 32. LLVM represents this with the D16 target feature. This is
currently used by CodeGen to avoid using high registers when they are
not available, but the assembler and disassembler do not.
I fix this in the assmebler and disassembler rather than the
InstrInfo.td files, as the latter would require a large number of
changes everywhere one of the floating-point instructions is referenced
in the backend. This solution is similar to the one used for
co-processor numbers and MSR masks.
llvm-svn: 221341
We currently try to push an even number of registers to preserve 8-byte
alignment during a function's prologue, but only when the stack alignment is
prcisely 8. Many of the reasons for doing it apply also when that alignment > 8
(the extra store is often free, and can save another stack adjustment, though
less frequently for 16-byte stack alignment).
llvm-svn: 221321
We were making an attempt to do this by adding an extra callee-saved GPR (so
that there was an even number in the list), but when that failed we went ahead
and pushed anyway.
This had a couple of potential issues:
+ The .cfi directives we emit misplaced dN because they were based on
PrologEpilogInserter's calculation.
+ Unaligned stores can be less efficient.
+ Unaligned stores can actually fault (likely only an issue in niche cases,
but possible).
This adds a final explicit stack adjustment if all other options fail, so that
the actual locations of the registers match up with where they should be.
llvm-svn: 221320
register class tGPRRegClass if the target is thumb1.
This commit fixes a crash that occurs during register allocation which was
triggered when a virtual register defined by an inline-asm instruction had to
be spilled.
rdar://problem/18740489
llvm-svn: 221178
This CPU definition is redundant. The Cortex-A9 is defined as
supporting multiprocessing extensions. Remove its definition and
update appropriate tests.
LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only
difference between the two CPU definitions in ARM.td is that
cortex-a9-mp contains the feature FeatureMP for multiprocessing
extensions.
This is redundant since the Cortex-A9 is defined as having
multiprocessing extensions in the TRMs. armcc also defines the
Cortex-A9 as having multiprocessing extensions by default.
Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7
llvm-svn: 221166
This removes calls to isMaterializable in the following cases:
* It was redundant with a call to isDeclaration now that isDeclaration returns
the correct answer for materializable functions.
* It was followed by a call to Materialize. Just call Materialize and check EC.
llvm-svn: 221050
It appears to ignore or find ambiguous MachineInstrBuilder's conversion
operators that allow conversion to MachineInstr* and
MachineBasicBlock::bundle_iterator.
As a workaround, add an explicit way to get the MachineInstr.
llvm-svn: 221017
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote any scalar operations to vector operations in the
way to make that possible.
** Context **
Some targets use different register files for both vector and scalar operations.
This means that transitioning from one domain to another may incur copy from one
register file to another. These copies are not coalescable and may be expensive.
For example, according to the scheduling model, on cortex-A8 a vector to GPR
move is 20 cycles.
** Motivating Example **
Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
%in1 = load <2 x i32>* %addr1, align 8
%extract = extractelement <2 x i32> %in1, i32 1
%out = or i32 %extract, 1
store i32 %out, i32* %dest, align 4
ret void
}
As it is, this IR generates the following assembly on armv7:
vldr d16, [r0] @vector load
vmov.32 r0, d16[1] @ cross-register-file copy: 20 cycles
orr r0, r0, #1 @ scalar bitwise or
str r0, [r1] @ scalar store
bx lr
Whereas we could generate much faster code:
vldr d16, [r0] @ vector load
vorr.i32 d16, #0x1 @ vector bitwise or
vst1.32 {d16[1]}, [r1:32] @ vector extract + store
bx lr
Half of the computation made in the vector is useless, but this allows to get
rid of the expensive cross-register-file copy.
** Proposed Solution **
To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends by a store and the
target is able to combine an extract with a store.
Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so, extracted as some point[1]. Moreover,
this is customary that targets feature stores that perform a vector extract (see
AArch64 and X86 for instance).
The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of details and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically in TargetTransformInfo, everything that is legal has a cost of 1,
whereas, even if a vector type is legal, usually a vector operation is slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counter balanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.
For now, the optimization is just enabled for ARM prior than v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.
[1] We can imagine targets that can combine an extractelement with other
instructions than just stores. If we want to go into that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.
Differential Revision: http://reviews.llvm.org/D5921
<rdar://problem/14170854>
llvm-svn: 220978
Currently, the ARM backend will select the VMAXNM and VMINNM for these C
expressions:
(a < b) ? a : b
(a > b) ? a : b
but not these expressions:
(a > b) ? b : a
(a < b) ? b : a
This patch allows all of these expressions to be matched.
llvm-svn: 220671
This updates check for double precision zero floating point constant to allow
use of instruction with immediate value rather than temporary register.
Currently "a == 0.0", where "a" is of "double" type generates:
vmov.i32 d16, #0x0
vcmpe.f64 d0, d16
With this change it becomes:
vcmpe.f64 d0, #0
Patch by Sergey Dmitrouk.
llvm-svn: 220486
Currently, the ARM disassembler will disassemble the Thumb2 memory hint
instructions (PLD, PLDW and PLI), even for targets which do not have
these instructions. This patch adds the required checks to the
disassmebler.
llvm-svn: 220472
The 32-bit variants of the NEON scalar<->GPR move instructions are
also available in VFPv2. The 8- and 16-bit variants do require NEON.
Note that the checks in the test file are all -DAG because they are
checking a mixture of stdout and stderr, and the ordering is not
guaranteed.
llvm-svn: 220288
The previous code had a few problems, motivating the choices here.
1. It could create instructions clobbering CPSR, but the incoming MachineInstr
didn't reflect this. A potential source of corruption. This is why the patch
has a new PseudoInst for before lowering.
2. Similarly, there was some code to handle the incoming instruction not being
ARMCC::AL, but this would have caused massive problems if it was actually
invoked when a complex offset needing more than one instruction was requested.
3. It wasn't designed to handle unaligned pointers (or offsets). These should
probably be minimised anyway, but the code needs to deal with them properly
regardless.
4. It had some rather dubious ad-hoc code to avoid calling
emitThumbRegPlusImmediate, a function which should be designed to do precisely
this job.
We seem to cover the common cases correctly now, and hopefully can enhance
emitThumbRegPlusImmediate to handle any extra optimisations we need to add in
future.
llvm-svn: 220236
The current instruction selection patterns for SMULW[BT] and SMLAW[BT]
are incorrect. These instructions multiply a 32-bit and a 16-bit value
(both signed) and return the top 32 bits of the 48-bit result. This
preserves the 16 bits of overflow, whereas the patterns they currently
match truncate the result to 16 bits then sign extend.
To select these instructions, we would need to match an ISD::SMUL_LOHI,
a sign extend, two shifts and an or. There is no way to match SMUL_LOHI
in an instruction pattern as it defines multiple values, so this would
have to be done in C++. I have raised
http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct
selection of these instructions.
This fixes http://llvm.org/bugs/show_bug.cgi?id=19396
llvm-svn: 220196
This function can, for some offsets from the SP, split one instruction
into two. Since it re-uses the original instruction as the first
instruction of the result, we need ensure its result register is not
marked as dead before we use it in the second instruction.
llvm-svn: 220194
The bug is in ARMConstantIslands::createNewWater where the upper bound of the
new water split point is computed:
// This could point off the end of the block if we've already got constant
// pool entries following this block; only the last one is in the water list.
// Back past any possible branches (allow for a conditional and a maximally
// long unconditional).
if (BaseInsertOffset + 8 >= UserBBI.postOffset()) {
BaseInsertOffset = UserBBI.postOffset() - UPad - 8;
DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset));
}
The split point is supposed to be somewhere between the machine instruction that
loads from the constant pool entry and the end of the basic block, before branch
instructions. The code above is fine if the basic block is large enough and
there are a sufficient number of instructions following the machine instruction.
However, if the machine instruction is near the end of the basic block,
BaseInsertOffset can point to the machine instruction or another instruction
that precedes it, and this can lead to convergence failure.
This commit fixes this bug by ensuring BaseInsertOffset is larger than the
offset of the instruction following the constant-loading instruction.
rdar://problem/18581150
llvm-svn: 220015
Early attempts to support AAPCS bare metal MachO targets based the decision on
the CPU being compiled for. This was not a particularly great idea and we've
got a better option now, but this check remained.
No functional change for any target we care about.
llvm-svn: 219767
Thumb1 has legitimate reasons for preferring 32-bit alignment of types
i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be
a multiple of 4. However, this is a trade-off betweem code size and RAM usage;
the DataLayout string is not the best place to represent it even if desired.
So this patch removes the extra Thumb requirements, hopefully making ARM and
Thumb completely compatible in this respect.
llvm-svn: 219734
There's no hard requirement on LLVM to align local variable to 32-bits, so the
Thumb1 frame handling needs to be able to deal with variables that are only
naturally aligned without falling over.
llvm-svn: 219733
Before, ARM and Thumb mode code had different preferred alignments, which could
lead to some rather unexpected results. There's justification for reducing it
from the default 64-bits (wasted space), but I don't think there is for going
below 32-bits.
There's no actual ABI change here, just to reassure people.
llvm-svn: 219719
On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member
variables private and clean up uses to go through the existing accessors.
NFC.
llvm-svn: 219573
This must be enforced for all v6M cores, not just the cortex-m0,
irregardless of the user-specified alignment.
Patch by Charlie Turner.
llvm-svn: 219300
These will make it easier to test further changes to the
code generation and optimization pipelines as those are
moved to subtargets initialized with target feature and
target cpu.
llvm-svn: 219106
That commit was introduced in order to help investigate a problem in ARM
codegen breaking from commit 202304 (Add a limit to the heuristic that register
allocates instructions in local order). Recent analisys indicated that the
problem no longer exists, so I'm reverting this change.
See PR18996.
llvm-svn: 218981
As with x86 and AArch64, certain situations can arise where we need to spill
CPSR in the middle of a calculation. These should be avoided where possible
(MRS/MSR is rather expensive), which ARM is actually better at than the other
two since it tries to Glue defs to uses, but as a last ditch effort, copying is
better than crashing.
rdar://problem/18011155
llvm-svn: 218789
Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions
when targeting ARMv8, but they are actually present on any target with
FP-ARMv8. Note that FP-ARMv8 is called FPv5 when is is part of an
M-profile core, but they have the same instructions so we model them
both as FPARMv8 in the ARM backend.
llvm-svn: 218763
The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and
FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be
modelled using the same target feature, and all double-precision
operations are already disabled by the fp-only-sp target features.
llvm-svn: 218747
This patch makes the ARM backend transform 3 operand instructions such as
'adds/subs' to the 2 operand version of the same instruction if the first
two register operands are the same.
Example: 'adds r0, r0, #1' will is transformed to 'adds r0, #1'.
Currently for some instructions such as 'adds' if you try to assemble
'adds r0, r0, #8' for thumb v6m the assembler would throw an error message
because the immediate cannot be encoded using 3 bits.
The backend should be smart enough to transform the instruction to
'adds r0, #8', which allows for larger immediate constants.
Patch by Ranjeet Singh.
llvm-svn: 218521
On ARM NEON, VAND with immediate (16/32 bits) is an alias to VBIC ~imm with
the same type size. Adding that logic to the parser, and generating VBIC
instructions from VAND asm files.
This patch also fixes the validation routines for NEON splat immediates which
were wrong.
Fixes PR20702.
llvm-svn: 218450
The Thumb2 BXJ instruction (Branch and Exchange Jazelle) is not
defined for v7M or v8A. It is defined for all other Thumb2-supporting
architectures (v6T2, v7A and v7R).
llvm-svn: 218445
If it's safe to clobber the condition flags, we can do a few extra things:
it's then possible to reset the base register writeback using a SUBS, so
we can try to merge even if the base register isn't dead after the merged
instruction.
This is effectively a (heavily bug-fixed) rewrite of r208992.
llvm-svn: 218386
v7M only allows the 16-bit encoding of the 'cps' (Change Processor
State) instruction, and does not have the 32-bit encoding which is
valid from v6T2 onwards.
llvm-svn: 218382
Summary:
The goal is to eventually remove all the code related to getInsertFencesForAtomic
in SelectionDAGBuilder as it is wrong (designed for ARM, not really portable, works
mostly by accident because the backends are overly conservative), and repeats the
same logic that goes in emitLeading/TrailingFence.
In this patch, I make AtomicExpandPass insert the fences as it knows better
where to put them. Because this requires getting the fences and not just
passing an IRBuilder around, I had to change the return type of
emitLeading/TrailingFence.
This code only triggers on ARM for now. Because it is earlier in the pipeline
than SelectionDAGBuilder, it triggers and lowers atomic accesses to atomic so
SelectionDAGBuilder does not add barriers anymore on ARM.
If this patch is accepted I plan to implement emitLeading/TrailingFence for all
backends that setInsertFencesForAtomic(true), which will allow both making them
less conservative and simplifying SelectionDAGBuilder once they are all using
this interface.
This should not cause any functionnal change so the existing tests are used
and not modified.
Test Plan: make check-all, benefits from existing tests of atomics on ARM
Reviewers: jfb, t.p.northover
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5179
llvm-svn: 218329
The fix is slightly different then x86 (see r216117) because the number of values
attached to a return can vary even for a single returned value (e.g., f64 yields
two returned values).
<rdar://problem/18352998>
llvm-svn: 218076
Summary:
This patch was originally in D5304 (I could not find a way to reopen that revision).
It was accepted, commited and broke the build bots because the overloading of
the constructor of ArrayRef for braced initializer lists is not supported by all
toolchains. I then reverted it, and propose this fixed version that uses a plain
C array instead in makeDMB (that array is then converted implicitly to an
ArrayRef, but that is not behind an ifdef). Could someone confirm me whether
initialization lists for plain C arrays are supported by every toolchain used
to build llvm ? Otherwise I can just initialize the array in the old way:
args[0] = ...; .. ; args[5] = ...;
Below is the description of the original patch:
```
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy if a cortex-M with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.
Thanks to luqmana for having noticed this bug.
```
Test Plan: Added more cases to atomic-load-store.ll + make check-all
Reviewers: jfb, t.p.northover, luqmana
Subscribers: llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D5386
llvm-svn: 218066
Certain directives are unsupported on Windows (some of which could/should be
supported). We would not diagnose the use but rather crash during the emission
as we try to access the Target Streamer. Add an assertion to prevent creating a
NULL reference (which is not permitted under C++) as well as a test to ensure
that we can diagnose the disabled directives.
llvm-svn: 218014
Rather than relying on support for a specific directive to determine if we are
targeting MachO, explicitly check the output format.
As an additional bonus, cleanup the caret diagnostic for the non-MachO case and
avoid the spurious error caused by not discarding the statement.
llvm-svn: 218012
It is breaking the build on the buildbots but works fine on my machine, I revert
while trying to understand what happens (it appears to depend on the compiler used
to build, I probably used a C++11 feature that is not perfectly supported by some
of the buildbots).
This reverts commit feb3176c4d006f99af8b40373abd56215a90e7cc.
llvm-svn: 217973
Summary:
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy if a cortex-M with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.
Thanks to luqmana for having noticed this bug.
Test Plan: Added more cases to atomic-load-store.ll + make check-all
Reviewers: jfb, t.p.northover, luqmana
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5304
llvm-svn: 217965
This required a new hook called hasLoadLinkedStoreConditional to know whether
to expand atomics to LL/SC (ARM, AArch64, in a future patch Power) or to
CmpXchg (X86).
Apart from that, the new code in AtomicExpandPass is mostly moved from
X86AtomicExpandPass. The main result of this patch is to get rid of that
pass, which had lots of code duplicated with AtomicExpandPass.
llvm-svn: 217928
ADDS/SUBS unless it's safe to clobber the condition flags.
If the merged instructions are in a range where the CPSR is live,
e.g. between a CMP -> Bcc, we can't safely materialize a new base
register.
This problem is quite rare, I couldn't come up with a test case and I've
never actually seen this happen in the tests I'm running - there is a
potential trigger for this in LNT/oggenc (spills being inserted between
a CMP/Bcc), but at the moment this isn't being merged. I'll try to
reduce that into a small test case once I've committed my upcoming patch
to make merging less conservative.
llvm-svn: 217881
objects. There were a few FIXMEs in ARMAsmBackend.cpp suggesting the class
definitions should be in a separate file. Starting with ARMAsmBackend, the
class definition has been put in a header file, and #includes reduced. Each
sub-type of ARMAsmBackend is now in its own header file.
Derived types have been painted with a different color of bike-shed:
s/DarwinARMAsmBackend/ARMAsmBackendDarwin/g
s/ARMWinCOFFAsmBackend/ARMAsmBackendWinCOFF/g
s/ELFARMAsmBackend/ARMAsmBackendELF/g
Finally, clang-format has been run across ARMAsmBackend.cpp
llvm-svn: 217866
Cross-class copies being expensive is actually a trait of the microarchitecture, but as I haven't yet seen an example of a microarchitecture where they're cheap it seems best to just enable this by default, covering the non-mcpu build case.
llvm-svn: 217674
"Unroll" is not the appropriate name for this variable. Clang already uses
the term "interleave" in pragmas and metadata for this.
Differential Revision: http://reviews.llvm.org/D5066
llvm-svn: 217528
The only Thumb-1 multi-store capable of using LR is the PUSH instruction, which
translates to STMDB, so we shouldn't convert STMIAs.
Patch by Sergey Dmitrouk.
llvm-svn: 217498
Summary:
Split shouldExpandAtomicInIR() into different versions for Stores/Loads/RMWs/CmpXchgs.
Makes runOnFunction cleaner (no more redundant checking/casting), and will help moving
the X86 backend to this pass.
This requires a way of easily detecting which instructions are atomic.
I followed the pattern of mayReadFromMemory, mayWriteOrReadMemory, etc.. in making
isAtomic() a method of Instruction implemented by a switch on the opcodes.
Test Plan: make check
Reviewers: jfb
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D5035
llvm-svn: 217080
Fixes two latent bugs:
- There was no fence inserted before expanded seq_cst load (unsound on Power)
- There was only a fence release before seq_cst stores (again unsound, in particular on Power)
It is not even clear if this is correct on ARM swift processors (where release fences are
DMB ishst instead of DMB ish). This behaviour is currently preserved on ARM Swift
as it is not clear whether it is incorrect. I would love to get documentation stating
whether it is correct or not.
These two bugs were not triggered because Power is not (yet) using this pass, and these
behaviours happen to be (mostly?) working on ARM
(although they completely butchered the semantics of the llvm IR).
See:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075821.html
for an example of the problems that can be caused by the second of these bugs.
I couldn't see a way of fixing these in a completely target-independent way without
adding lots of unnecessary fences on ARM, hence the target-dependent parts of this
patch.
This patch implements the new target-dependent parts only for ARM (the default
of not doing anything is enough for AArch64), other architectures will use this
infrastructure in later patches.
llvm-svn: 217076
This is the final round of renaming. This changes tblgen to emit lower-case
function names for FastEmitInst_* and FastEmit_*, and updates all its uses
in the source code.
Reviewed by Eric
llvm-svn: 217075
This removes static initializers from the backends which generate this data, and also makes this struct match the other Tablegen generated structs in behaviour
Reviewed by Andy Trick and Chandler C
llvm-svn: 216919
This patch implements a few changes related to the Thumb2 M-class MSR instruction:
* better handling of unpredictable encodings,
* recognition of the _g and _nzcvqg variants by the asm parser only if the DSP
extension is available, preferred output of MSR APSR moves with the _<bits>
suffix for v7-M.
Patch by Petr Pavlu.
llvm-svn: 216874