Use the zero register directly when possible to avoid an unnecessary register
copy and a wasted register at -O0. This also uses integer stores to store a
positive floating-point zero. This saves us from materializing the positive zero
in a register and then storing it.
llvm-svn: 216617
This teaches the AArch64 backend to deal with the operations required
to deal with the operations on v4f16 and v8f16 which are exposed by
NEON intrinsics, plus the add, sub, mul and div operations.
llvm-svn: 216555
When a shift with extension or an add with shift and extension cannot be folded
into the memory operation, then the address calculation has to be materialized
separately. While doing so the code forgot to consider a possible sign-/zero-
extension. This fix folds now also the sign-/zero-extension into the add or
shift instruction which is used to materialize the address.
This fixes rdar://problem/18141718.
llvm-svn: 216511
It seems on Darwin the illegal round-trip ::iterator -> MachineInstr* -> ::iterator breaks execution horribly when the iterator is not a real MachineInstr, like ::end().
llvm-svn: 216455
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)
llvm-svn: 216371
AtomicExpandLoadLinked is currently rather ARM-specific. This patch is the first of
a group that aim at making it more target-independent. See
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075873.html
for details
The command line option is "atomic-expand"
llvm-svn: 216231
This is mostly achieved by providing the correct register class manually,
because getRegClassFor always returns the GPR*AllRegClass for MVT::i32 and
MVT::i64.
Also cleanup the code to use the FastEmitInst_* method whenever possible. This
makes sure that the operands' register class is properly constrained. For all
the remaining cases this adds the missing constrainOperandRegClass calls for
each operand.
llvm-svn: 216225
The AdvSIMD pass may produce copies that are not coalescer-friendly. The
peephole optimizer knows how to fix that as demonstrated in the test case.
<rdar://problem/12702965>
llvm-svn: 216200
This fixes a bug I introduced in a previous commit (r216033). Sign-/Zero-
extension from i1 cannot be folded into the ADDS/SUBS instructions. Instead both
operands have to be sign-/zero-extended with separate instructions.
Related to <rdar://problem/17913111>.
llvm-svn: 216073
Use FMOVWSr/FMOVXDr instead of FMOVSr/FMOVDr, which have the proper register
class to be used with the zero register. This makes the MachineInstruction
verifier happy again.
This is related to <rdar://problem/18027157>.
llvm-svn: 216040
Factor out the ADDS/SUBS instruction emission code into helper functions and
make the helper functions more clever to support most of the different ADDS/SUBS
instructions the architecture support. This includes better immedediate support,
shift folding, and sign-/zero-extend folding.
This fixes <rdar://problem/17913111>.
llvm-svn: 216033
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.
Original commit message:
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.
For Example:
lsl x1, x1, #3 --> ldr x0, [x0, x1, lsl #3]
ldr x0, [x0, x1]
sxtw x1, w1
lsl x1, x1, #3 --> ldr x0, [x0, x1, sxtw #3]
ldr x0, [x0, x1]
llvm-svn: 216013
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.
Original commit message:
This change materializes now the value "0" from the zero register.
The zero register can be folded by several instruction, so no
materialization is need at all.
Fixes <rdar://problem/17924413>.
llvm-svn: 216009
This fixes a few BuildMI callsites where the result register was added by
using addReg, which is per default a use and therefore an operand register.
Also use the zero register as result register when emitting a compare
instruction (SUBS with unused result register).
llvm-svn: 215997
Summary:
Make use of isAtLeastRelease/Acquire in the ARM/AArch64 backends
These helper functions are introduced in D4844.
Depends D4844
Test Plan: make check-all passes
Reviewers: jfb
Subscribers: aemerson, llvm-commits, mcrosier, reames
Differential Revision: http://reviews.llvm.org/D4937
llvm-svn: 215902
Externally-defined functions with weak linkage should not be
tail-called on ARM or AArch64, as the AAELF spec requires normal calls
to undefined weak functions to be replaced with a NOP or jump to the
next instruction. The behaviour of branch instructions in this
situation (as used for tail calls) is implementation-defined, so we
cannot rely on the linker replacing the tail call with a return.
llvm-svn: 215890
ARM in particular is getting dangerously close to exceeding 32 bits worth of
possible subtarget features. When this happens, various parts of MC start to
fail inexplicably as masks get truncated to "unsigned".
Mostly just refactoring at present, and there's probably no way to test.
llvm-svn: 215887
The floating-point value positive zero (+0.0) is a valid immedate value
according to isFPImmLegal. As a result AArch64 FastISel went ahead and
used the immediate version of fmov to materialize the constant.
The problem is that the immediate version of fmov cannot encode an imediate for
postive zero. Instead a fmov from the zero register was supposed to be used in
this case.
This fix adds handling for this special case and uses fmov from the zero
register to materialize a positive zero (negative zeroes go to the constant
pool).
There is no test case for this, because this code is currently dead. It will be
enabled in a future commit and I will add a test case in a separate commit
after that.
This fixes <rdar://problem/18027157>.
llvm-svn: 215753
Note: This reapplies r215582 without any modifications. The refactoring wasn't
responsible for the buildbot failures.
Original commit message:
Cleanup and prepare constant materialization code for future commits.
llvm-svn: 215752
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."
llvm-svn: 215673
Certain functions such as objc_autoreleaseReturnValue have to be called as
tail-calls even at -O0. Since normal fast-isel doesn't emit calls as tail calls,
we have to fall back to SelectionDAG to select calls that are marked as tail.
<rdar://problem/17991614>
llvm-svn: 215600
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.
For Example:
lsl x1, x1, #3 --> ldr x0, [x0, x1, lsl #3]
ldr x0, [x0, x1]
sxtw x1, w1
lsl x1, x1, #3 --> ldr x0, [x0, x1, sxtw #3]
ldr x0, [x0, x1]
llvm-svn: 215597
This change materializes now the value "0" from the zero register.
The zero register can be folded by several instruction, so no
materialization is need at all.
Fixes <rdar://problem/17924413>.
llvm-svn: 215591
This is a cleaner solution to the problem described in r215431.
When instructions are combined a dangling DBG_VALUE is removed.
This resolves bug 20598.
llvm-svn: 215587
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)
Changes made by clang-tidy with minor tweaks.
llvm-svn: 215558
The combiner ignored DBG nodes when checking
the uses of a virtual register.
It combined a sequence like
%vreg1 = madd %vreg2, %vreg3,...
DBG_VALUE (%vreg1 ...)
%vreg4 = add %vreg1,...
to
%vreg4 = madd %vreg2, %vreg3
leaving behind a dangling DBG_VALUE with
a definition. This triggered an assertion
in the MachineTraceMetrics.cpp module.
llvm-svn: 215431
std::map invalidates the iterator to any element that gets deleted, which means
we can't increment it correctly afterwards. This was causing Darwin test
failures.
llvm-svn: 215233
For best-case performance on Cortex-A57, we should try to use a balanced mix of odd and even D-registers when performing a critical sequence of independent, non-quadword FP/ASIMD floating-point multiply or multiply-accumulate operations.
This pass attempts to detect situations where the register allocation may adversely affect this load balancing and to change the registers used so as to better utilize the CPU.
Ideally we'd just take each multiply or multiply-accumulate in turn and allocate it alternating even or odd registers. However, multiply-accumulates are most efficiently performed in the same functional unit as their accumulation operand. Therefore this pass tries to find maximal sequences ("Chains") of multiply-accumulates linked via their accumulation operand, and assign them all the same "color" (oddness/evenness).
This optimization affects S-register and D-register floating point multiplies and FMADD/FMAs, as well as vector (floating point only) muls and FMADD/FMA. Q register instructions (and 128-bit vector instructions) are not affected.
llvm-svn: 215199
This short-circuited our error reporting for incorrectly specified
target triples (you'd get AArch64 code instead).
Should fix PR20567.
llvm-svn: 215191
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reverts commits r215111, 215115, 215116, 215117, 215136.
llvm-svn: 215154
Re-commit of r214832,r21469 with a work-around that
avoids the previous problem with gcc build compilers
The work-around is to use SmallVector instead of ArrayRef
of basic blocks in preservesResourceLen()/MachineCombiner.cpp
llvm-svn: 215151
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.
Thanks to Lang Hames for making MCJIT a good replacement!
llvm-svn: 215111
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.
llvm-svn: 214988
For triple aarch64-linux-gnu we were incorrectly setting IRIX.
For triple aarch64 we are correctly setting SYSV.
Patch by Ana Pazos <apazos@codeaurora.org>.
llvm-svn: 214974
Specifically Cortex-A57. This probably applies to Cyclone too but I haven't enabled it for that as I can't test it.
This gives ~4% improvement on SPEC 174.vpr, and ~1% in 471.omnetpp.
llvm-svn: 214957
Instruction prefetch is not implemented for AArch64, it is incorrectly
translated into data prefetch instruction.
Differential Revision: http://reviews.llvm.org/D4777
llvm-svn: 214860
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.
llvm-svn: 214859
The original code would fail for unsupported value types like i1, i8, and i16.
This fix changes the code to only create a sub-register copy for i64 value types
and all other types (i1/i8/i16/i32) just use the source register without any
modifications.
getRegClassFor() is now guarded by the i64 value type check, that guarantees
that we always request a register for a valid value type.
llvm-svn: 214848
This implements basic argument lowering for AArch64 in FastISel. It only
handles a small subset of the C calling convention. It supports simple
arguments that can be passed in GPR and FPR registers.
This should cover most of the trivial cases without falling back to
SelectionDAG.
This fixes <rdar://problem/17890986>.
llvm-svn: 214846
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.
llvm-svn: 214838
sequence on AArch64
Re-commit of r214669 without changes to test cases
LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and
LLVM:: CodeGen/AArch64/dp-3source.ll
This resolves the reported compfails of the original commit.
llvm-svn: 214832
This fix changes the parameters #r and #s that are passed to the UBFM/SBFM
instruction to get the zero/sign-extension for free.
The original problem was that the shift left would use the 32-bit shift even for
i8/i16 value types, which could leave the upper bits set with "garbage" values.
The arithmetic shift right on the other side would use the wrong MSB as sign-bit
to determine what bits to shift into the value.
This fixes <rdar://problem/17907720>.
llvm-svn: 214788
scalar integer instruction pass.
This is a patch I had lying around from a few months ago. The pass is
currently disabled by default, so nothing to interesting.
llvm-svn: 214779
sequence - AArch64 target support
This patch turns off madd/msub generation in the DAGCombiner and generates
them in the MachineCombiner instead. It replaces the original code sequence
with the combined sequence when it is beneficial to do so.
When there is no machine model support it always generates the madd/msub
instruction. This is true also when the objective is to optimize for code
size: when the combined sequence is shorter is always chosen and does not
get evaluated.
When there is a machine model the combined instruction sequence
is evaluated for critical path and resource length using machine
trace metrics and the original code sequence is replaced when it is
determined to be faster.
rdar://16319955
llvm-svn: 214669
Add branch weights to branch instructions, so that the following passes can
optimize based on it (i.e. basic block ordering).
Fixes <rdar://problem/17887137>.
llvm-svn: 214537
The tbz/tbnz checks the sign bit to convert
op w1, w1, w10
cmp w1, #0
b.lt .LBB0_0
to
op w1, w1, w10
tbnz w1, #31, .LBB0_0
Differential Revision: http://reviews.llvm.org/D4440
llvm-svn: 214518
ADDS and SUBS cannot encode negative immediates or immediates larger than 12bit.
This fix checks if the immediate version can be used under this constraints and
if we can convert ADDS to SUBS or vice versa to support negative immediates.
Also update the test cases to test the immediate versions.
llvm-svn: 214470
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.
This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.
llvm-svn: 214449
Currently the large code model for MachO uses the GOT to make function calls.
Emit the required adrp and ldr instructions to load the address from the GOT.
Related to <rdar://problem/17733076>.
llvm-svn: 214381
UNDEF arguments are not ment to be touched - especially for the webkit_js
calling convention. This fix reproduces the already existing behavior of
SelectionDAG in FastISel.
llvm-svn: 214366
This improves the code generation for the XALU intrinsics when the
condition is feeding a select instruction.
This also updates and enables the XALU unit tests for FastISel.
This fixes <rdar://problem/17831117>.
llvm-svn: 214350
This improves the code generation for the XALU intrinsics when the
condition is feeding a branch instruction.
This is related to <rdar://problem/17831117>.
llvm-svn: 214349
This commit adds support for the {s|u}{add|sub|mul}.with.overflow intrinsics.
The unit tests for FastISel will be enabled in a later commit, once there is
also branch and select folding support.
This is related to <rdar://problem/17831117>.
llvm-svn: 214348
Currently the shift-immediate versions are not supported by tblgen and
hopefully this can be later removed, once the required support has been
added to tblgen.
llvm-svn: 214345
Rename to allowsMisalignedMemoryAccess.
On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.
llvm-svn: 214055