assembler should also accept a two arg form, as the docuemntation specifies that
the first (destination) register is optional.
This patch uses TwoOperandAliasConstraint to add the two argument form.
It also fixes an 80-column formatting problem in:
test/MC/ARM/neon-bitwise-encoding
<rdar://problem/12909419> Clang rejects ARM NEON assembly instructions
llvm-svn: 175221
The parser will now accept instructions with alignment specifiers written like
vld1.8 {d16}, [r0:64]
, while also still accepting the incorrect syntax
vld1.8 {d16}, [r0, :64]
llvm-svn: 175164
Lower reverse shuffles to a vrev64 and a vext instruction instead of the default
legalization of storing and loading to the stack. This is important because we
generate reverse shuffles in the loop vectorizer when we reverse store to an
array.
uint8_t Arr[N];
for (i = 0; i < N; ++i)
Arr[N - i - 1] = ...
radar://13171760
llvm-svn: 174929
function is successfully handled by fast-isel. That's because function
arguments are *always* handled by SDISel. Introduce FastLowerArguments to
allow each target to provide hook to handle formal argument lowering.
As a proof-of-concept, add ARMFastIsel::FastLowerArguments to handle
functions with 4 or fewer scalar integer (i8, i16, or i32) arguments. It
completely eliminates the need for SDISel for trivial functions.
rdar://13163905
llvm-svn: 174855
Adds a function to target transform info to query for the cost of address
computation. The cost model analysis pass now also queries this interface.
The code in LoopVectorize adds the cost of address computation as part of the
memory instruction cost calculation. Only there, we know whether the instruction
will be scalarized or not.
Increase the penality for inserting in to D registers on swift. This becomes
necessary because we now always assume that address computation has a cost and
three is a closer value to the architecture.
radar://13097204
llvm-svn: 174713
Use the validateTargetOperandClass() hook to match literal '#0' operands in
InstAlias definitions. Previously this required per-instruction C++ munging of the
operand list, but not is handled as a natural part of the matcher. Much better.
No additional tests are required, as the pre-existing tests for these instructions
exercise the new behaviour as being functionally equivalent to the old.
llvm-svn: 174488
Swift has a renaming dependency if we load into D subregisters. We don't have a
way of distinguishing between insertelement operations of values from loads and
other values. Therefore, we are pessimistic for now (The performance problem
showed up in example 14 of gcc-loops).
radar://13096933
llvm-svn: 174300
infrastructure on MCStreamer to test for whether there is an
MCELFStreamer object available.
This is just a cleanup on the AsmPrinter side of things, moving ad-hoc
tests of random APIs to a direct type query. But the AsmParser
completely broken. There were no tests, it just blindly cast its
streamer to an MCELFStreamer and started manipulating it.
I don't have a test case -- this actually failed on LLVM's own
regression test suite. Unfortunately the failure only appears when the
stars, compilers, and runtime align to misbehave when we read a pointer
to a formatted_raw_ostream as-if it were an MCAssembler. =/
UBSan would catch this immediately.
Many thanks to Matt for doing about 80% of the debugging work here in
GDB, Jim for helping to explain how exactly to fix this, and others for
putting up with the hair pulling that ensued during debugging it.
llvm-svn: 174118
isa<> and dyn_cast<>. In several places, code is already hacking around
the absence of this, and there seem to be several interfaces that might
be lifted and/or devirtualized using this.
This change was based on a discussion with Jim Grosbach about how best
to handle testing for specific MCStreamer subclasses. He said that this
was the correct end state, and everything else was too hacky so
I decided to just make it so.
No functionality should be changed here, this is just threading the kind
through all the constructors and setting up the classof overloads.
llvm-svn: 174113
This patch adds support for AArch64 (ARM's 64-bit architecture) to
LLVM in the "experimental" category. Currently, it won't be built
unless requested explicitly.
This initial commit should have support for:
+ Assembly of all scalar (i.e. non-NEON, non-Crypto) instructions
(except the late addition CRC instructions).
+ CodeGen features required for C++03 and C99.
+ Compilation for the "small" memory model: code+static data <
4GB.
+ Absolute and position-independent code.
+ GNU-style (i.e. "__thread") TLS.
+ Debugging information.
The principal omission, currently, is performance tuning.
This patch excludes the NEON support also reviewed due to an outbreak of
batshit insanity in our legal department. That will be committed soon bringing
the changes to precisely what has been approved.
Further reviews would be gratefully received.
llvm-svn: 174054
and update ELF header e_flags.
Currently gathering information such as symbol,
section and data is done by collecting it in an
MCAssembler object. From MCAssembler and MCAsmLayout
objects ELFObjectWriter::WriteObject() forms and
streams out the ELF object file.
This patch just adds a few members to the MCAssember
class to store and access the e_flag settings. It
allows for runtime additions to the e_flag by
assembler directives. The standalone assembler can
get to MCAssembler from getParser().getStreamer().getAssembler().
This patch is the generic infrastructure and will be
followed by patches for ARM and Mips for their target
specific use.
Contributer: Jack Carter
llvm-svn: 173882
Changing ARMBaseTargetMachine to return ARMTargetLowering intead of
the generic one (similar to x86 code).
Tests showing which instructions were added to cast when necessary
or cost zero when not. Downcast to 16 bits are not lowered in NEON,
so costs are not there yet.
llvm-svn: 173849
The ARM and Thumb variants of LDREXD and STREXD have different constraints and
take different operands. Previously the code expanding atomic operations didn't
take this into account and asserted in Thumb mode.
llvm-svn: 173780
conditions are met:
1. They share the same operand and are in the same BB.
2. Both outputs are used.
3. The target has a native instruction that maps to ISD::FSINCOS node or
the target provides a sincos library call.
Implemented the generic optimization in sdisel and enabled it for
Mac OSX. Also added an additional optimization for x86_64 Mac OSX by
using an alternative entry point __sincos_stret which returns the two
results in xmm0 / xmm1.
rdar://13087969
PR13204
llvm-svn: 173755
This was an experimental option, but needs to be defined
per-target. e.g. PPC A2 needs to aggressively hide latency.
I converted some in-order scheduling tests to A2. Hal is working on
more test cases.
llvm-svn: 171946
This is necessary not only for representing empty ranges, but for handling
multibyte characters in the input. (If the end pointer in a range refers to
a multibyte character, should it point to the beginning or the end of the
character in a char array?) Some of the code in the asm parsers was already
assuming this anyway.
llvm-svn: 171765
Absent a Contributor's License Agreement (CLA) with an LLVM legal entity and as
reviewed and agreed with Chris Lattner, add a patent license covering future
contributions from ARM until there is a CLA. This is to make explicit ARM's
grant of patent rights to recipients of LLVM containing ARM-contributed
material.
llvm-svn: 171721
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least on
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but tho
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
a helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits, this one is clearly big enough.
Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
utils/sort_includes.py script.
Most of these are updating the new R600 target and fixing up a few
regressions that have creeped in since the last time I sorted the
includes.
llvm-svn: 171362
directly.
This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.
llvm-svn: 171253
This affords us to use std::string's allocation routines and use the destructor
for the memory management. Switching to that also means that we can use
operator==(const std::string&, const char *) to perform the string comparison
rather than resorting to libc functionality (i.e. strcmp).
Patch by Saleem Abdulrasool!
Differential Revision: http://llvm-reviews.chandlerc.com/D230
llvm-svn: 171042
This function is often used to decorate dangling instructions, so a
context reference is required to allocate memory for the operands.
Also add a corresponding MachineInstrBuilder method.
llvm-svn: 170797
are more expensive than the non-flag setting variant. Teach thumb2 size
reduction pass to avoid generating them unless we are optimizing for size.
rdar://12892707
llvm-svn: 170728
MC disassembler clients (LLDB) are interested in querying if an
instruction may affect control flow other than by virtue of being
an explicit branch instruction. For example, instructions which
write directly to the PC on some architectures.
llvm-svn: 170610
Use the version that also takes an MF reference instead.
It would technically be possible to extract an MF reference from the MI
as MI->getParent()->getParent(), but that would not work for MIs that
are not inserted into any basic block.
Given the reasonably small number of places this constructor was used at
all, I preferred the compile time check to a run time assertion.
llvm-svn: 170588
((x & 0xff00) >> 8) << 2
to
(x >> 6) & 0x3fc
This is general goodness since it folds a left shift into the mask. However,
the trailing zeros in the mask prevents the ARM backend from using the bit
extraction instructions. And worse since the mask materialization may require
an addition instruction. This comes up fairly frequently when the result of
the bit twiddling is used as memory address. e.g.
= ptr[(x & 0xFF0000) >> 16]
We want to generate:
ubfx r3, r1, #16, #8
ldr.w r3, [r0, r3, lsl #2]
vs.
mov.w r9, #1020
and.w r2, r9, r1, lsr #14
ldr r2, [r0, r2]
Add a late ARM specific isel optimization to
ARMDAGToDAGISel::PreprocessISelDAG(). It folds the left shift to the
'base + offset' address computation; change the mask to one which doesn't have
trailing zeros and enable the use of ubfx.
Note the optimization has to be done late since it's target specific and we
don't want to change the DAG normalization. It's also fairly restrictive
as shifter operands are not always free. It's only done for lsh 1 / 2. It's
known to be free on some cpus and they are most common for address
computation.
This is a slight win for blowfish, rijndael, etc.
rdar://12870177
llvm-svn: 170581
To not over constrain the scheduler for ARM in thumb mode, some optimizations for code size reduction, specific to ARM thumb, are blocked when they add a dependency (like write after read dependency).
Disables this check when code size is the priority, i.e., code is compiled with -Oz.
llvm-svn: 170462
instruction.
This isn't strictly necessary at the moment because Thumb2SizeReduction
also copies all MI flags from the old instruction to the new. However, a
future patch will make that kind of direct flag tampering illegal.
llvm-svn: 170395
TargetLowering::getRegClassFor).
Some isSimple() guards were missing, or getSimpleVT() were hoisted too
far, resulting in asserts on valid LLVM assembly input.
llvm-svn: 170336
immediate generates the narrow version. Needed when doing round-trip
assemble/disassemble testing using the alternate syntax that specifies
'pc' directly.
llvm-svn: 170255
Accordingly, add helper funtions getSimpleValueType (in parallel to
getValueType) in SDValue, SDNode, and TargetLowering.
This is the first, in a series of patches.
This is the second attempt. In the first attempt (r169837), a few
getSimpleVT() were hoisted too far, detected by bootstrap failures.
llvm-svn: 170104
Add R_ARM_NONE and R_ARM_PREL31 relocation types
to MCExpr. Both of them will be used while
generating .ARM.extab and .ARM.exidx sections.
llvm-svn: 169965
mention the inline memcpy / memset expansion code is a mess?
This patch split the ZeroOrLdSrc argument into two: IsMemset and ZeroMemset.
The first indicates whether it is expanding a memset or a memcpy / memmove.
The later is whether the memset is a memset of zero. It's totally possible
(likely even) that targets may want to do different things for memcpy and
memset of zero.
llvm-svn: 169959
Also added more comments to explain why it is generally ok to return true.
- Rename getOptimalMemOpType argument IsZeroVal to ZeroOrLdSrc. It's meant to
be true for loaded source (memcpy) or zero constants (memset). The poor name
choice is probably some kind of legacy issue.
llvm-svn: 169954
ScalarTargetTransformInfo::getIntImmCost() instead. "Legal" is a poorly defined
term for something like integer immediate materialization. It is always possible
to materialize an integer immediate. Whether to use it for memcpy expansion is
more a "cost" conceern.
llvm-svn: 169929
Accordingly, add helper funtions getSimpleValueType (in parallel to
getValueType) in SDValue, SDNode, and TargetLowering.
This is the first, in a series of patches.
llvm-svn: 169837
This shouldn't affect codegen for -O0 compiles as tail call markers are not
emitted in unoptimized compiles. Testing with the external/internal nightly
test suite reveals no change in compile time performance. Testing with -O1,
-O2 and -O3 with fast-isel enabled did not cause any compile-time or
execution-time failures. All tests were performed on my x86 machine.
I'll monitor our arm testers to ensure no regressions occur there.
In an upcoming clang patch I will be marking the objc_autoreleaseReturnValue
and objc_retainAutoreleaseReturnValue as tail calls unconditionally. While
it's theoretically true that this is just an optimization, it's an
optimization that we very much want to happen even at -O0, or else ARC
applications become substantially harder to debug.
Part of rdar://12553082
llvm-svn: 169796
1. Teach it to use overlapping unaligned load / store to copy / set the trailing
bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies.
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
x86 and ARM.
3. When memcpy from a constant string, do *not* replace the load with a constant
if it's not possible to materialize an integer immediate with a single
instruction (required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned load / stores more aggressively if target hooks indicates they
are "fast".
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.
Also increase the threshold to something reasonable (8 for memset, 4 pairs
for memcpy).
This significantly improves Dhrystone, up to 50% on ARM iOS devices.
rdar://12760078
llvm-svn: 169791
Before this patch, when you objdump an LLVM-compiled file, objdump tried to
decode data-in-code sections as if they were code. This patch adds the missing
Mapping Symbols, as defined by "ELF for the ARM Architecture" (ARM IHI 0044D).
Patch based on work by Greg Fitzgerald.
llvm-svn: 169609
understand target implementation of any_extend / extload, just generate
zero_extend in place of any_extend for liveouts when the target knows the
zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz).
rdar://12771555
llvm-svn: 169536
The encoding of NOP in ARMAsmBackend.cpp is missing a trailing zero, which
causes the emission of a coprocessor instruction rather than "mov r0, r0"
as indicated in the comment. The test also checks for the wrong encoding.
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121203/157919.html
llvm-svn: 169420
This is for the lldb team so most of but not all of the values are
to be printed as hex with this option. Some small values like the
scale in an X86 address were requested to printed in decimal
without the leading 0x.
There may be some tweaks need to places that may still be in
decimal that they want in hex. Specially for arm. I made my best
guess. Any tweaks from here should be simple.
I also did the best I know now with help from the C++ gurus
creating the cleanest formatImm() utility function and containing
the changes. But if someone has a better idea to make something
cleaner I'm all ears and game for changing the implementation.
rdar://8109283
llvm-svn: 169393
textually as NativeClient. Also added a link to the native client project for
readers unfamiliar with it.
A Clang patch will follow shortly.
llvm-svn: 169291
missed in the first pass because the script didn't yet handle include
guards.
Note that the script is now able to handle all of these headers without
manual edits. =]
llvm-svn: 169224
This provides the same functionality as getRawAllocationOrder() for the
even/odd hints, but without the many constant register arrays.
llvm-svn: 169169
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
Codegen was failing with an assertion because of unexpected vector
operands when legalizing the selection DAG for a MUL instruction.
The asserting code was legalizing multiplies for vectors of size 128
bits. It uses a custom lowering to try and detect cases where it can
use a VMULL instruction instead of a VMOVL + VMUL. The code was
looking for input operands to the MUL that had been sign or zero
extended. If it found the extended operands it would drop the
sign/zero extension and use the original vector size as input to a
VMULL instruction.
The code assumed that the original input vector was 64 bits so that
after dropping the extension it would fit directly into a D register
and could be used as an operand of a VMULL instruction. The input
code that trigger the failure used a vector of <4 x i8> that was
sign extended to <4 x i32>. It was not safe to drop the sign
extension in this case because the original vector is only 32 bits
wide. The fix is to insert a sign extension for the vector to reach
the required 64 bit size. In this particular example, the vector would
need to be sign extented to a <4 x i16>.
llvm-svn: 169024
classes. The vast majority of the remaining issues are due to uses of
invalid registers, which are defined by getRegForValue(). Those will be
a little more challenging to cleanup.
rdar://12719844
llvm-svn: 168735
This patch replaces the hard coded GPR pair [R0, R1] of
Intrinsic:arm_ldrexd and [R2, R3] of Intrinsic:arm_strexd with
even/odd GPRPair reg class.
Similar to the lowering of atomic_64 operation.
llvm-svn: 168207
This patch changes the definition of negative from -0..-255 to -1..-255. I am changing this because of
a bug that we had in some of the patterns that assumed that "subs" of zero does not set the carry flag.
rdar://12028498
llvm-svn: 167963
This infrastructure is generally useful for any target that wants to
strongly prefer two instructions to be adjacent after scheduling.
A following checkin will add target-specific hooks with unit
tests. Then this feature will be enabled by default with misched.
llvm-svn: 167742
mov lr, pc
b.w _foo
The "mov" instruction doesn't set bit zero to one, it's putting incorrect
value in lr. It messes up backtraces.
rdar://12663632
llvm-svn: 167657
Improve ARM build attribute emission for architectures types.
This also changes the default architecture emitted for a generic CPU to "v7".
llvm-svn: 167574
registers. Previously, the register we being marked as implicitly defined, but
not killed. In some cases this would cause the register scavenger to spill a
dead register.
Also, use an empty register mask to simplify the logic and to reduce the memory
footprint.
rdar://12592448
llvm-svn: 167499
getIntPtrType support for multiple address spaces via a pointer type,
and also introduced a crasher bug in the constant folder reported in
PR14233.
These commits also contained several problems that should really be
addressed before they are re-committed. I have avoided reverting various
cleanups to the DataLayout APIs that are reasonable to have moving
forward in order to reduce the amount of churn, and minimize the number
of commits that were reverted. I've also manually updated merge
conflicts and manually arranged for the getIntPtrType function to stay
in DataLayout and to be defined in a plausible way after this revert.
Thanks to Duncan for working through this exact strategy with me, and
Nick Lewycky for tracking down the really annoying crasher this
triggered. (Test case to follow in its own commit.)
After discussing with Duncan extensively, and based on a note from
Micah, I'm going to continue to back out some more of the more
problematic patches in this series in order to ensure we go into the
LLVM 3.2 branch with a reasonable story here. I'll send a note to
llvmdev explaining what's going on and why.
Summary of reverted revisions:
r166634: Fix a compiler warning with an unused variable.
r166607: Add some cleanup to the DataLayout changes requested by
Chandler.
r166596: Revert "Back out r166591, not sure why this made it through
since I cancelled the command. Bleh, sorry about this!
r166591: Delete a directory that wasn't supposed to be checked in yet.
r166578: Add in support for getIntPtrType to get the pointer type based
on the address space.
llvm-svn: 167221
When the operand is a plain immediate rather than a label, print it
as [pc, #imm] like we do for the Thumb2 wide encoding variant.
rdar://12154503
llvm-svn: 166991
is 24 bits not 20 and the decoding needed to correctly handle converting the
J1 and J2 bits to their I1 and I2 values to reconstruct the displacement.
llvm-svn: 166982
Keep the integer_insertelement test case, the new coalescer can handle
this kind of lane insertion without help from pseudo-instructions.
llvm-svn: 166835