VMOVs are not, strictly speaking, cheap, but they are only as expensive as a
vector copy (VORR), so we should prefer rematerialization over splitting
whenever it applies.
rdar://problem/23754176
llvm-svn: 257545
ARMv8.2-A adds 16-bit floating point versions of all existing SIMD
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
Note that VFP without SIMD is not a valid combination for any version of
ARMv8-A, but I have ensured that these instructions all depend on both
FeatureNEON and FeatureFullFP16 for consistency.
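A quick illustration of the gating (the llvm-mc invocation is only a sketch;
the feature name is the one added in this patch):
  $ llvm-mc -triple armv8.2a-none-eabi -mattr=+neon,+fullfp16 -show-encoding
  vadd.f16 d0, d1, d2   @ accepted only when both features are enabled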
Differential Revision: http://reviews.llvm.org/D15039
llvm-svn: 255764
After much discussion, ending here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html
it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.
r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
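For reference, a minimal sketch of the kind of IR-level pattern a backend can
match to an absolute-difference instruction instead (one common shape; types
illustrative):
  %cmp = icmp sgt <4 x i32> %a, %b
  %ab  = sub <4 x i32> %a, %b
  %ba  = sub <4 x i32> %b, %a
  %abd = select <4 x i1> %cmp, <4 x i32> %ab, <4 x i32> %ba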
llvm-svn: 255387
Summary:
The mid-end was generating vector smin/smax/umin/umax nodes, but
we were using vbsl to generate the code. This adds the vmin/vmax
patterns and a test to check that we now generate vmin/vmax
instructions.
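A minimal example of IR that now selects to vmin (sketch):
  %c = icmp slt <4 x i32> %a, %b
  %r = select <4 x i1> %c, <4 x i32> %a, <4 x i32> %b
  ; now: vmin.s32 q0, q0, q1 (previously a vcgt/vbsl sequence)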
Reviewers: rengolin, jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D12105
llvm-svn: 245439
This overrides the default to more closely resemble the hand-crafted matching logic in ISelLowering. Since there is no VFP equivalent of vmin or vmax, it makes sense to use them when they are available, even though VFP ops should be preferred in general.
This should be NFC.
llvm-svn: 244915
Lower Intrinsic::arm_neon_vmins/vmaxs to fminnan/fmaxnan and match that instead. This is important because SDAG will soon be able to select FMINNAN itself, so we need a unified lowering path for intrinsics and SDAG.
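Sketch of the effect (intrinsic signature as in the NEON intrinsics; the
selected instruction shown is illustrative):
  %r = call <4 x float> @llvm.arm.neon.vmins.v4f32(<4 x float> %a, <4 x float> %b)
  ; now lowered through the fminnan node and selected to: vmin.f32 q0, q0, q1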
NFCI.
llvm-svn: 244593
Lower the intrinsic to a FMINNUM/FMAXNUM node and select that instead. This is important because soon SDAG will be able to select FMINNUM/FMAXNUM itself, so we need an integrated lowering path between SDAG and intrinsics.
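Illustrative sketch (assuming the vminnm intrinsic is the one in question):
  %r = call <4 x float> @llvm.arm.neon.vminnm.v4f32(<4 x float> %a, <4 x float> %b)
  ; lowered to an FMINNUM node and selected to: vminnm.f32 q0, q0, q1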
NFCI.
llvm-svn: 244592
[DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
I can't guarantee that all locations are correct, but in a lot of cases the
choice is obvious, so most of them should be. At least all tests pass.
The tests for these changes do not cover everything; instead they just check
SDNodes, ARM and AArch64, where it is easy to get incorrect locations on
constants.
This is not a complete fix, as FastISel contains a workaround for wrong debug
locations which drops locations from instructions when processing constants,
and there isn't currently a way to use debug locations from constants there,
as llvm::Constant doesn't cache them (yet). That is a somewhat different
issue, though, and not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235989
(Original commit of the change above; the message is identical.)
llvm-svn: 235977
Anton tried this 5 years ago, but it was reverted due to extra VMOVs
being emitted. This can be easily fixed with a liberal application
of patterns matching loads/stores and extractelts.
llvm-svn: 232958
The 32-bit variants of the NEON scalar<->GPR move instructions are
also available in VFPv2. The 8- and 16-bit variants do require NEON.
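For example (illustrative):
  vmov    r0, s0       @ 32-bit scalar<->GPR move: available with VFPv2
  vmov.u8 r0, d0[1]    @ 8-bit lane move: requires NEON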
Note that the checks in the test file are all -DAG because they are
checking a mixture of stdout and stderr, and the ordering is not
guaranteed.
llvm-svn: 220288
On ARM NEON, VAND with an immediate (16/32 bits) is an alias for VBIC with the
inverted immediate (~imm) at the same type size. This adds that logic to the
parser, generating VBIC instructions from VAND assembly.
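For example (illustrative):
  vand.i32 d0, #0xffffff00   @ accepted, and emitted as: vbic.i32 d0, #0x000000ff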
This patch also fixes the validation routines for NEON splat immediates,
which were wrong.
Fixes PR20702.
llvm-svn: 218450
The commit after this changes { } and 0bxx literals to be of type bits<n> rather than int. This means we need to write exactly the right number of bits, and cannot rely on the values being silently zero-extended for us.
llvm-svn: 215082
It's bad enough that I have to look up 5 different levels of TableGen class
definitions to work out what bits go where in a simple NEON instruction anyway,
without having to keep track of umpteen unused parameters.
llvm-svn: 207420
Added support for the byte replication feature, for GAS compatibility.
E.g. the instructions below:
"vmov.i32 d0, 0xffffffff"
"vmvn.i32 d0, 0xabababab"
"vmov.i32 d0, 0xabababab"
"vmov.i16 d0, 0xabab"
are not encodable as written, but we can now deal with such cases.
For the first one we emit:
"vmov.i8 d0, 0xff"
For the second one ("vmvn") we emit:
"vmov.i8 d0, 0x54"
For the last two instructions we emit:
"vmov.i8 d0, 0xab"
P.S.: In ARMAsmParser.cpp I have also fixed a few nearby style issues in the
old code, just to keep method bodies in harmony with themselves.
llvm-svn: 207080
This checks the alignments specified on vld/vst instructions and reports
errors for alignments that are not supported.
While this is a large diff and a big test case, the changes are very
straightforward. It just had to touch pretty much all vld/vst
instructions, changing the addrmode to one of the newly added ones
that does the proper checking for the specific instruction.
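For example (diagnostic wording illustrative):
  vld1.32 {d0}, [r0:64]    @ OK: 64-bit alignment is valid for this form
  vld1.32 {d0}, [r0:128]   @ error: alignment not supported for this instruction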
FYI, re-committing this with a tweak so MemoryOp's default
constructor is trivial and will work with MSVC 2012. Thanks
to Reid Kleckner and Jim Grosbach for help with the tweak.
rdar://11312406
llvm-svn: 205986
It doesn't build with MSVC 2012, because MSVC doesn't allow union
members that have non-trivial default constructors. This change added
'SMLoc AlignmentLoc' to MemoryOp, which made MemoryOp's default ctor
non-trivial.
This reverts commit r205930.
llvm-svn: 205944
(Original commit, reverted above due to MSVC 2012 and re-committed with a
tweak as r205986; the message is otherwise identical.)
rdar://11312406
llvm-svn: 205930
The Cyclone CPU is similar to swift for most LLVM purposes, but does have two
preferred instructions for zeroing a VFP register. This teaches LLVM about
them.
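A sketch of the idiom (assuming the integer-move-of-zero forms are the
preferred ones):
  vmov.i32 d0, #0
  vmov.i32 q0, #0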
llvm-svn: 205309
Similarly to the vshrn instructions, these are simple zext/sext + shl
operations. Using normal LLVM IR should allow for better code, and more sharing
with the AArch64 backend.
llvm-svn: 201093
vshrn is just the combination of a right shift and a truncate (and the limits
on the immediate value actually mean the signedness of the shift doesn't
matter). Using that representation allows us to get rid of an ARM-specific
intrinsic, share more code with AArch64 and hopefully get better code out of
the mid-end optimisers.
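For example (sketch):
  %shr    = lshr <8 x i16> %x, <i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5, i16 5>
  %narrow = trunc <8 x i16> %shr to <8 x i8>
  ; selects to: vshrn.i16 d0, q0, #5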
llvm-svn: 201085
There was an extremely confusing proliferation of LLVM intrinsics to implement
the vacge & vacgt instructions. This combines them all into two polymorphic
intrinsics, shared across both backends.
llvm-svn: 200768
Some of the SHA instructions take a scalar i32 as one argument (largely because
they work on 160-bit hash fragments). This wasn't reflected in the IR
previously, with ARM and AArch64 choosing different types (<4 x i32> and <1 x
i32> respectively) which was ugly.
This makes all the affected intrinsics take a uniform "i32", allowing them to
become non-polymorphic at the same time.
llvm-svn: 200706
The fused multiply instructions were added in VFPv4 but are still NEON
instructions; in particular, they shouldn't be available on a Cortex-M4, no
matter how floaty it is.
llvm-svn: 193342
If an alias inherits directly from InstAlias then it doesn't get any default
"Requires" values, so llvm-mc will allow it even on architectures that don't
support the underlying instruction.
This tidies up the obvious VFP and NEON cases I found.
llvm-svn: 193340
This reverts commit r189648.
Fixes for the previously failing clang-side arm_neon_intrinsics test
cases will be checked in separately.
llvm-svn: 189841
In addition to recognizing when the multiply's second argument is
coming from an explicit VDUPLANE, also look for a plain scalar
f32 reference and reference it via the corresponding vector
lane.
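For example (illustrative):
  vmul.f32 d0, d1, d2[0]   @ scalar referenced directly via its lane,
                           @ rather than vdup.32 d3, d2[0] + vmul.f32 d0, d1, d3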
rdar://14870054
llvm-svn: 189619
The vqdmlal and vqdmlsl instructions are really just a fused pair consisting of
a vqdmull.sN followed by a vqadd.sN or vqsub.sN. This adds patterns to LLVM so
that we can switch Clang's CodeGen over to generating these instead of the
special vqdmlal intrinsics.
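A sketch of the pair being matched (types illustrative):
  %mull = call <4 x i32> @llvm.arm.neon.vqdmull.v4i32(<4 x i16> %b, <4 x i16> %c)
  %acc  = call <4 x i32> @llvm.arm.neon.vqadds.v4i32(<4 x i32> %a, <4 x i32> %mull)
  ; matched to: vqdmlal.s16 q0, d2, d3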
llvm-svn: 189480
These instructions aren't particularly complicated and it's well worth having
patterns for some reasonably useful LLVM IR that will match them. Soon we
should be able to switch Clang over to producing this natural version.
llvm-svn: 189335
The instructions that convert between floating-point and fixed-point
representations take an immediate operand for the number of fractional bits of
the fixed-point value. The ARM ARM specifies that when that number of bits is
zero, the assembler should encode the floating-point/integer conversion
instructions instead.
This patch adds the necessary instruction aliases to achieve this behaviour.
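For example (illustrative):
  vcvt.s32.f32 s0, s0, #0   @ zero fractional bits: assembled as the plain
                            @ conversion, vcvt.s32.f32 s0, s0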
llvm-svn: 189009
After Ulrich's r180677 (thanks!) TableGen is intelligent enough to
handle tied constraints involving complex operands properly, so
virtually all of the ARM custom converters are now unnecessary.
llvm-svn: 186810