Commit Graph

341 Commits

Author SHA1 Message Date
Matthew Simpson 4355e404d5 [AArch64] Add DAG combine for extract extend pattern
This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) ->
(extract_vector_elt v, i). The combine enables us to better match some SMOV
patterns.
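
A minimal sketch of such a target combine; the helper name and the (reduced)
legality checks are illustrative, not the committed code:

  // Illustrative only: rewrite (any_extend (extract_vector_elt v, i)) as
  // (extract_vector_elt v, i) at the wider result type. The high bits of
  // an any_extend are undefined, so widening the extract directly is fine.
  static SDValue performAnyExtExtractCombine(SDNode *N, SelectionDAG &DAG) {
    if (N->getOpcode() != ISD::ANY_EXTEND)
      return SDValue();
    SDValue Extract = N->getOperand(0);
    if (Extract.getOpcode() != ISD::EXTRACT_VECTOR_ELT)
      return SDValue();
    EVT VT = N->getValueType(0);
    return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SDLoc(N), VT,
                       Extract.getOperand(0), Extract.getOperand(1));
  }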

Differential Revision: http://reviews.llvm.org/D15515

llvm-svn: 255895
2015-12-17 14:30:55 +00:00
Manman Ren cbe4f9417d CXX_FAST_TLS calling convention: performance improvement for AArch64.
The access function has a short entry and a short exit; the initialization
block is only run the first time. To improve performance, we want to
have a short frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by the register allocator, which will try to spill and
reload them in the initialization block.

We add CSRsViaCopy; it will be explicitly handled during lowering.

1> We first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given machine function and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> We call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at the beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at the beginning of the exit blocks.
3> We also need to make sure the explicit copies will not be eliminated (the
   relevant hooks are sketched after this list).
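
A hedged sketch of the TargetLowering hooks this flow relies on;
initializeSplitCSR and insertCopiesSplitCSR are named above, while
supportSplitCSR and the exact signatures are assumptions:

  // Sketch of the hook declarations (TargetLowering):
  virtual bool supportSplitCSR(MachineFunction *MF) const;   // per-function opt-in
  virtual void initializeSplitCSR(MachineBasicBlock *Entry) const;
  virtual void insertCopiesSplitCSR(
      MachineBasicBlock *Entry,
      const SmallVectorImpl<MachineBasicBlock *> &Exits) const;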

The target independent portion was committed as r255353.
rdar://problem/23557469

Differential Revision: http://reviews.llvm.org/D15341

llvm-svn: 255821
2015-12-16 21:04:19 +00:00
Geoff Berry 8f5acb1bd1 Remove dead function AArch64TargetLowering::getFunctionAlignment. NFC.
Reviewers: t.p.northover, jmolloy, mcrosier

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D15458

llvm-svn: 255509
2015-12-14 17:01:10 +00:00
Hal Finkel cd8664c3c2 Revert r248483, r242546, r242545, and r242409 - absdiff intrinsics
After much discussion, ending here:

  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html

it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.

r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation

llvm-svn: 255387
2015-12-11 23:11:52 +00:00
Pirama Arumuga Nainar 1317d5f311 Fix fptosi, fptoui from f16 vectors to i8, i16 vectors
Summary:
Convert f16 vectors to corresponding f32 vectors before doing the
conversion to int.

Add tests for v4f16, v8f16.

Reviewers: ab, jmolloy

Subscribers: llvm-commits, srhines

Differential Revision: http://reviews.llvm.org/D14936

llvm-svn: 255263
2015-12-10 17:16:49 +00:00
Ahmed Bougacha 97564c3a1b [AArch64][ARM] Don't base interleaved op legality on type alloc size.
Otherwise, we think that most types that look like they'd fit in a
legal vector type are legal (so, basically, *any* vector type with a
size between 33 and 128 bits, I think, since we use pow2 alignment;
e.g., v2i25, v3f32, ...).

DataLayout::getTypeAllocSize rounds up based on alignment.
When checking for target intrinsic legality, that's not what we want:
if rounding makes a difference, the type isn't legal, and the
target intrinsics shouldn't be used, as they are always assumed legal.

One could make the argument that alloc size is ultimately the most
relevant here, since we're dealing with LD/ST intrinsics. That's only
true if we did legalize them though; that's a problem for another day.

Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
Type::getSizeInBits can't be used because that'd gratuitously break
pointer vector support.

Some of these uses are currently fine, because we only hit them when
the type is already known legal (e.g., r114454). Update them for
consistency. It's faster to avoid the rounding anyway!
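
A hedged illustration of the resulting check (simplified; the helper name is
invented):

  // ldN/stN intrinsics require a vector of exactly 64 or 128 bits. Use the
  // unpadded bit size rather than the alloc size, so that e.g. <2 x i25>
  // (50 bits, alloc-rounded up to 64) is correctly rejected.
  static bool isLegalInterleavedVectorType(VectorType *VecTy,
                                           const DataLayout &DL) {
    uint64_t Bits = DL.getTypeSizeInBits(VecTy);
    return Bits == 64 || Bits == 128;
  }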

llvm-svn: 255089
2015-12-09 01:19:50 +00:00
Chad Rosier f3491496dc [AArch64] Expand vector SDIVREM/UDIVREM operations.
http://reviews.llvm.org/D15214
Patch by Ana Pazos <apazos@codeaurora.org>!

llvm-svn: 254773
2015-12-04 21:38:44 +00:00
Tim Northover f3be9d5c0b AArch64: fix 128-bit shifts
We mustn't introduce a shift of exactly 64 bits for any inputs, since that's an
UNDEF value (and worse, it's not what you want with the natural AArch64
implementation).

The generated code is pretty horrific, but I couldn't come up with an obviously
better alternative (if the amount is constant EXTR could help). Turns out
128-bit shifts are just nasty.
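
As a standalone illustration of why the expansion is awkward, here is one way
to build a 128-bit left shift out of 64-bit halves without ever shifting a
64-bit value by exactly 64 (a sketch, not the committed lowering):

  #include <cstdint>

  struct U128 { uint64_t Lo, Hi; };

  U128 shl128(U128 V, unsigned Amt) {   // requires 0 <= Amt < 128
    if (Amt == 0)
      return V;
    U128 R;
    if (Amt < 64) {
      R.Lo = V.Lo << Amt;
      R.Hi = (V.Hi << Amt) | (V.Lo >> (64 - Amt));  // both shifts are < 64
    } else {
      R.Lo = 0;
      R.Hi = V.Lo << (Amt - 64);                    // 0 <= Amt - 64 < 64
    }
    return R;
  }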

rdar://22491037

llvm-svn: 254475
2015-12-02 00:33:54 +00:00
Artyom Skrobov 314ee04268 Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC)
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.
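
Typical usage after the change looks roughly like this (isNullConstant is one
of the exposed helpers; the surrounding fold is illustrative):

  // Before: an open-coded dyn_cast<ConstantSDNode> plus a value check at
  // every call site. After: the shared helper from SelectionDAGNodes.h.
  if (isNullConstant(Op.getOperand(1)))
    return Op.getOperand(0);   // e.g. fold (add x, 0) -> x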

Reviewers: MatzeB, andreadb, tstellarAMD

Subscribers: arsenm, jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D14945

llvm-svn: 254085
2015-11-25 19:41:11 +00:00
Ahmed Bougacha 88ddeae8bd [AArch64] Promote f16 SELECT_CC CC operands when op is legal.
SELECT_CC has the nasty property of having operands with unrelated
types. So if you do something like:

  f32 = select_cc f16, f16, f32, f32, cc

You'd only look for the action for <select_cc, f32>, but never f16.
If the types are all legal, but the op isn't (as for f16 on AArch64,
or for f128 on x86_64/AArch64?), then you get into trouble.
For f128, we have softenSetCCOperands to handle this case.

Similarly, for f16, we can directly promote the CC operands.
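
A hedged sketch of the promotion inside the SELECT_CC lowering (variable names
assumed):

  // The f16 compare operands are legal types, but the f16 compare itself is
  // not; extend them to f32 and let the usual f32 lowering take over.
  if (LHS.getValueType() == MVT::f16) {
    LHS = DAG.getNode(ISD::FP_EXTEND, DL, MVT::f32, LHS);
    RHS = DAG.getNode(ISD::FP_EXTEND, DL, MVT::f32, RHS);
  }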

llvm-svn: 253344
2015-11-17 16:45:40 +00:00
Tim Northover 339c83e27f AArch64: add experimental support for address tagging.
AArch64 has the ability to use the top 8-bits of an "address" for extra
information, with the memory subsystem automatically masking them off for loads
and stores. When that's happening, we can sometimes skip masks on memory
operations in the compiler.

However, this requires the host OS and support stack to preserve those bits so
it can't be enabled everywhere. In principle iOS 8.0 and above take the
required precautions, but we'll put it under a flag for now.
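
For illustration, under top-byte-ignore the explicit untagging mask below is
redundant, which is what lets the compiler skip it (standalone sketch; the tag
occupies the architectural top 8 bits):

  #include <cstdint>

  int loadThroughTaggedPointer(uint64_t TaggedPtr) {
    // Bits 63:56 hold the tag; the memory subsystem ignores them, so this
    // mask can be dropped when address tagging support is enabled.
    uint64_t Addr = TaggedPtr & 0x00FFFFFFFFFFFFFFULL;
    return *reinterpret_cast<int *>(Addr);
  }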

llvm-svn: 252573
2015-11-10 00:44:23 +00:00
Charlie Turner 7b7b06f737 [AArch64] Handle extract_subvector(..., 0) in ISel.
Summary:
Lowering this pattern early to an `EXTRACT_SUBREG` was making it impossible to match larger patterns in tblgen that use `extract_subvector(..., 0)` as part of their input pattern.

It seems likely that there is a better way of specifying this pattern over all relevant register value types, but I didn't manage to find it.

Reviewers: t.p.northover, jmolloy

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14207

llvm-svn: 252464
2015-11-09 12:45:11 +00:00
Joseph Tremoulet f748c8937e [WinEH] Update exception pointer registers
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.

Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.

Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.
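
The resulting hook shape is roughly the following (a sketch; the exact
signatures are assumptions based on the message):

  // TargetLowering (sketch): the personality function decides which register
  // carries the exception pointer, e.g. RDX rather than RAX for the CLR.
  virtual unsigned
  getExceptionPointerRegister(const Constant *PersonalityFn) const;

  virtual unsigned
  getExceptionSelectorRegister(const Constant *PersonalityFn) const;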


Reviewers: pgavlin, majnemer, rnk

Subscribers: jyknight, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D14344

llvm-svn: 252383
2015-11-07 01:11:31 +00:00
Charlie Turner 458e79b814 [ARM] Expand ROTL and ROTR of vector value types
Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection.

Reviewers: rengolin, t.p.northover

Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14082

llvm-svn: 251401
2015-10-27 10:25:20 +00:00
Evgeniy Stepanov d1aad26589 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
__builtin_thread_pointer as aeabi_read_tp, which is not available
on Android.

The previous iteration of this change was reverted in r250461. This
version leaves the generic, compiler-rt based implementation in
SafeStack.cpp instead of moving it to TargetLoweringBase in order to
allow testing without a TargetMachine.
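
Conceptually, the fast path computes the slot address as thread pointer plus a
fixed offset; a standalone sketch (the offset constant here is a placeholder,
not the real bionic value):

  #include <cstddef>

  // Placeholder offset; the real value is the bionic TLS slot reserved for
  // the unsafe stack pointer.
  constexpr ptrdiff_t kUnsafeStackPtrOffset = 0;

  void **unsafeStackPtrAddr() {
    char *TP = static_cast<char *>(__builtin_thread_pointer());
    return reinterpret_cast<void **>(TP + kUnsafeStackPtrOffset);
  }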

llvm-svn: 251324
2015-10-26 18:28:25 +00:00
Craig Topper 8fe40e0ed5 Change makeLibCall to take an ArrayRef<SDValue> instead of pointer and size. This removes the need to pass a hardcoded size in many places. NFC
llvm-svn: 251032
2015-10-22 17:05:00 +00:00
Charlie Turner 434d4599d4 [AArch64] Implement vector splitting on UADDV.
Summary: Fixes PR25056.

Reviewers: mcrosier, junbuml, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13466

llvm-svn: 250520
2015-10-16 15:38:25 +00:00
Evgeniy Stepanov 9addbc9fc1 Revert "[safestack] Fast access to the unsafe stack pointer on AArch64/Android."
Breaks the hexagon buildbot.

llvm-svn: 250461
2015-10-15 21:26:49 +00:00
Evgeniy Stepanov 142947e9f0 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
__builtin_thread_pointer as aeabi_read_tp, which is not available
on Android.

llvm-svn: 250456
2015-10-15 20:50:16 +00:00
Duncan P. N. Exon Smith d3b9df02b3 AArch64: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250216
2015-10-13 20:02:15 +00:00
Jun Bum Lim 0aace13d18 Improve ISel across lane float min/max reduction
In vectorized float min/max reduction code, the final "reduce" step
is sub-optimal. In AArch64, this change will combine:

  svn0 = vector_shuffle t0, undef<2,3,u,u>
  fmin = fminnum t0,svn0
  svn1 = vector_shuffle fmin, undef<1,u,u,u>
  cc = setcc fmin, svn1, ole
  n0 = extract_vector_elt cc, #0
  n1 = extract_vector_elt fmin, #0
  n2 = extract_vector_elt fmin, #1
  result = select n0, n1, n2
into :
  result = llvm.aarch64.neon.fminnmv t0

This change extends r247575.

llvm-svn: 249834
2015-10-09 14:11:25 +00:00
Chad Rosier 7c6ac2b8f9 [AArch64] Fold a floating-point divide by power of two into fp conversion.
Part of http://reviews.llvm.org/D13442

llvm-svn: 249579
2015-10-07 17:51:37 +00:00
Chad Rosier fa30c9b436 [AArch64] Fold a floating-point multiply by power of two into fp conversion.
Part of http://reviews.llvm.org/D13442

llvm-svn: 249576
2015-10-07 17:39:18 +00:00
Jeroen Ketema aebca09543 [ARM][AArch64] Only lower to interleaved load/store if the target has NEON
Without an additional check for NEON, the compiler crashes during
legalization of NEON ldN/stN.

Differential Revision: http://reviews.llvm.org/D13508

llvm-svn: 249550
2015-10-07 14:53:29 +00:00
Chad Rosier 1f385618c0 [ARM] Typo. NFC.
llvm-svn: 249153
2015-10-02 16:42:59 +00:00
Sanjay Patel bbbf9a1a34 merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711)
This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).

The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner 
to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling
the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned
accesses up in performSTORECombine() because they are slow.

This patch attempts to fix the problem in AArch64's allowsMisalignedMemoryAccesses() while preserving
existing (perhaps questionable) lowering behavior.

The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.
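
For reference, the TLI hook in question has roughly this shape (sketch; the
AArch64 heuristics themselves are in the patch):

  // Report whether an unaligned access of type VT is allowed at all and,
  // through *Fast, whether it is actually fast. The DAG combiner only
  // merges stores into a wider store when the target reports it as fast.
  bool allowsMisalignedMemoryAccesses(EVT VT, unsigned AddrSpace,
                                      unsigned Align,
                                      bool *Fast) const override;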

Differential Revision: http://reviews.llvm.org/D12635
 

llvm-svn: 248622
2015-09-25 21:49:48 +00:00
Ahmed Bougacha 07a844d758 [AArch64] Emit clrex in the expanded cmpxchg fail block.
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.

Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.

Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".

Differential Revision: http://reviews.llvm.org/D13033

llvm-svn: 248291
2015-09-22 17:21:44 +00:00
Jun Bum Lim 34b9bd0435 Improve ISel using across lane min/max reduction
In vectorized integer min/max reduction code, the final "reduce" step
is sub-optimal. In AArch64, this change will combine:
  %svn0 = vector_shuffle %0, undef<2,3,u,u>
  %smax0 = smax %0, svn0
  %svn3 = vector_shuffle %smax0, undef<1,u,u,u>
  %sc = setcc %smax0, %svn3, gt
  %n0 = extract_vector_elt %sc, #0
  %n1 = extract_vector_elt %smax0, #0
  %n2 = extract_vector_elt %smax0, #1
  %result = select %n0, %n1, %n2
becomes :
  %1 = smaxv %0
  %result = extract_vector_elt %1, 0

This change extends r246790.

llvm-svn: 247575
2015-09-14 16:19:52 +00:00
Ahmed Bougacha 5246867384 [CodeGen] Refactor TLI/AtomicExpand interface to make LLSC explicit.
We used to have this magic "hasLoadLinkedStoreConditional()" callback,
which really meant two things:
- expand cmpxchg (to ll/sc).
- expand atomic loads using ll/sc (rather than cmpxchg).

Remove it and, instead, introduce explicit callbacks (sketched below):
- bool shouldExpandAtomicCmpXchgInIR(inst)
- AtomicExpansionKind shouldExpandAtomicLoadInIR(inst)
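
An illustrative pair of overrides for an LL/SC target (the 128-bit policy is
an assumption; only the callback names come from the list above):

  bool shouldExpandAtomicCmpXchgInIR(AtomicCmpXchgInst *AI) const override {
    return true;   // expand cmpxchg to an ll/sc loop in IR
  }

  TargetLoweringBase::AtomicExpansionKind
  shouldExpandAtomicLoadInIR(LoadInst *LI) const override {
    // Only oversized loads need the ll/sc expansion, for example.
    return LI->getType()->getPrimitiveSizeInBits() == 128
               ? AtomicExpansionKind::LLSC
               : AtomicExpansionKind::None;
  }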

Differential Revision: http://reviews.llvm.org/D12557

llvm-svn: 247429
2015-09-11 17:08:28 +00:00
Ahmed Bougacha 9d677131c4 [CodeGen] Rename AtomicRMWExpansionKind to AtomicExpansionKind.
This lets us generalize its usage to the other atomic instructions.

llvm-svn: 247428
2015-09-11 17:08:17 +00:00
Jun Bum Lim 4d3c5986f2 Remove white space (test commit)
llvm-svn: 247021
2015-09-08 16:11:22 +00:00
Chad Rosier 6c36eff1d6 [AArch64] Improve ISel using across lane addition reduction.
In vectorized add reduction code, the final "reduce" step is sub-optimal.
This change will combine:

ext  v1.16b, v0.16b, v0.16b, #8
add  v0.4s, v1.4s, v0.4s
dup  v1.4s, v0.s[1]
add  v0.4s, v1.4s, v0.4s

into

addv s0, v0.4s

PR21371
http://reviews.llvm.org/D12325
Patch by Jun Bum Lim <junbuml@codeaurora.org>!

llvm-svn: 246790
2015-09-03 18:13:57 +00:00
Ahmed Bougacha b0ff6437cb [AArch64] Lower READCYCLECOUNTER using MRS PMCCNTR_EL0.
This matches the ARM behavior. In both cases, the register is part
of the optional Performance Monitors extension, so, add the feature,
and enable it for the A-class processors we support.

Differential Revision: http://reviews.llvm.org/D12425

llvm-svn: 246555
2015-09-01 16:23:45 +00:00
Matthias Braun 0acbd08f3c AArch64: Fix loads to lower NEON vector lanes using GPR registers
The ISelLowering code turned the insertion of the element for the
lowest lane of a BUILD_VECTOR into an INSERT_SUBREG; this prohibited
the patterns for SCALAR_TO_VECTOR(Load) from matching later. Restrict this
to cases without a load argument.

Reported in rdar://22223823

Differential Revision: http://reviews.llvm.org/D12467

llvm-svn: 246462
2015-08-31 18:25:15 +00:00
Silviu Baranga db1ddb32ce [AArch64] Unify the integer min/max vector selection patterns with the intrinsic ones
Summary:
This change lowers the aarch64 integer vector min/max intrinsic nodes to
generic min/max nodes and replaces the intrinsic selection patterns with
the generic ones.

There should already be testing in place for this, so no further tests
were added.

Reviewers: jmolloy

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12276

llvm-svn: 246030
2015-08-26 11:11:14 +00:00
Matthias Braun 46e5639806 AArch64: Fix cmp;ccmp ordering
When producing conditional compare sequences for or operations we need
to negate the operands and the finally tested flags. Negating the finally
tested flags amounts to a logical negation of all previously emitted
expressions. A case was missing where we have to order OR expressions so
they get emitted first.

This fixes http://llvm.org/PR24459

llvm-svn: 245641
2015-08-20 23:33:34 +00:00
Matthias Braun 266204b7dc AArch64: Do not create CCMP on multiple users.
Creating CMP;CCMP sequences from and/or trees does not gain us anything if
the and/or tree is materialized to a GP register anyway. While most of
the code already checked for hasOneUse() there was one important case
missing.

llvm-svn: 245640
2015-08-20 23:33:31 +00:00
James Molloy 88edc8243d Remove hand-rolled matching for fmin and fmax.
SDAGBuilder now does this all for us.

llvm-svn: 245198
2015-08-17 07:13:20 +00:00
James Molloy 63be198712 [AArch64] FMINNAN/FMAXNAN on f16 is not legal.
Spotted by Ahmed - in r244594 I inadvertently marked f16 min/max as legal.

I've reverted it here, and marked min/max on scalar f16's as promote. I've also added a testcase. The test just checks that the compiler doesn't fall over - it doesn't create fmin nodes for f16 yet.

llvm-svn: 245035
2015-08-14 09:08:50 +00:00
Ahmed Bougacha 2a97b1bcf8 [AArch64] Also custom-lower mismatched vector/f16 FCOPYSIGN.
We can lower them using our cool tricks if we fpext/fptrunc the second
input, like we do for f32/f64.

Follow-up to r243924, r243926, and r244858.

llvm-svn: 244860
2015-08-13 01:13:56 +00:00
Alex Lorenz e40c8a2b26 PseudoSourceValue: Replace global manager with a manager in a machine function.
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.

This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.

This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.

This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.

Reviewers: Akira Hatanaka
llvm-svn: 244693
2015-08-11 23:09:45 +00:00
James Molloy b7b2a1e9b4 [AArch64] Match fminnum/fmaxnum for vector fminnm/fmaxnm instead of an intrinsic.
Lower Intrinsic::aarch64_neon_fmin/fmax to fminnum/fmaxnum and match that instead. Minimal functional change:

  - Extra tests added because coverage of scalar fminnm/fmaxnm instructions was nonexistent.
  - f16 test updated because, now that we actually generate scalar fminnm/fmaxnm, we no longer need to bail out to a libcall!

llvm-svn: 244595
2015-08-11 12:06:37 +00:00
James Molloy edf38f0cb0 [AArch64] Replace the custom AArch64ISD::FMIN/MAX nodes with ISD::FMINNAN/MAXNAN
NFCI. This just removes custom ISDNodes that are no longer needed.

llvm-svn: 244594
2015-08-11 12:06:33 +00:00
Benjamin Kramer df005cbe19 Fix some comment typos.
llvm-svn: 244402
2015-08-08 18:27:36 +00:00
Sanjay Patel 924879ad2c wrap OptSize and MinSize attributes for easier and consistent access (NFCI)
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).

Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.

This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.
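
A sketch of the wrappers being described (method names are assumptions based
on the message):

  // In llvm::Function (sketch):
  bool optForMinSize() const { return hasFnAttribute(Attribute::MinSize); }
  bool optForSize() const {
    // -Oz implies -Os: treat MinSize as also optimizing for size.
    return hasFnAttribute(Attribute::OptimizeForSize) || optForMinSize();
  }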

Differential Revision: http://reviews.llvm.org/D11734

llvm-svn: 243994
2015-08-04 15:49:57 +00:00
Ahmed Bougacha e0e12db8c8 [AArch64] Add isel support for f16 indexed LD/ST.
llvm-svn: 243935
2015-08-04 01:29:38 +00:00
Ahmed Bougacha 0cbe2efcd6 [AArch64] Remove unnecessary "break". NFC.
llvm-svn: 243931
2015-08-04 00:49:08 +00:00
Ahmed Bougacha 239d635d3d [AArch64] Use SDValue bool operator. NFC.
llvm-svn: 243930
2015-08-04 00:48:02 +00:00
Ahmed Bougacha b0ae36f0d1 [AArch64] Vector FCOPYSIGN supports Custom-lowering: mark it as such.
There's a bunch of code in LowerFCOPYSIGN that does smart lowering, and
is actually already vector-aware; let's use it instead of scalarizing!

The only interesting change is that for v2f32, we previously always used
v4i32 as the integer vector type.
Use v2i32 instead, and mark FCOPYSIGN as Custom.
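
The marking itself is a one-liner in the target lowering constructor
(illustrative):

  // Route v2f32 FCOPYSIGN through the existing vector-aware custom lowering
  // instead of letting it be scalarized.
  setOperationAction(ISD::FCOPYSIGN, MVT::v2f32, Custom);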

llvm-svn: 243926
2015-08-04 00:42:34 +00:00
Pete Cooper 7be8f8f018 Convert some AArch64 code to foreach loops. NFC.
Also converted a cast<> to a dyn_cast<> while I was working on the same
line of code.

llvm-svn: 243894
2015-08-03 19:04:32 +00:00