Commit Graph

27 Commits

Author SHA1 Message Date
Lucas Prates f56550cf7f [ARM] Enabling range checks on Neon intrinsics' lane arguments
Summary:
Range checks were not properly performed in the lane arguments of Neon
intrinsics implemented based on splat operations. Calls to those
intrinsics where translated to `__builtin__shufflevector` calls directly
by the pre-processor through the arm_neon.h macros, missing the chance
for the proper range checks.

This patch enables the range check by introducing an auxiliary splat
instruction in arm_neon.td, delaying the translation to shufflevector
calls to CGBuiltin.cpp in clang after the checks were performed.

Reviewers: jmolloy, t.p.northover, rsmith, olista01, ostannard

Reviewed By: ostannard

Subscribers: ostannard, dnsampaio, danielkiss, kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D74619
2020-03-19 12:07:23 +00:00
Lucas Prates 7bf23563f4 Revert "[ARM] Setting missing isLaneQ attribute on Neon Intrisics definitions"
This reverts commit 62ab15ffa3.

Multiple commits were unintentionally squashed into this one. Reverting
so each of them can be pushed properly.
2020-03-19 12:01:13 +00:00
Lucas Prates 62ab15ffa3 [ARM] Setting missing isLaneQ attribute on Neon Intrisics definitions
Summary:
Some of the `*_laneq` intrinsics defined in arm_neon.td were missing the
setting of the `isLaneQ` attribute. This patch sets the attribute on the
related definitions, as they will be required to properly perform range
checks on their lane arguments.

Reviewers: jmolloy, t.p.northover, rsmith, olista01, dnsampaio

Reviewed By: dnsampaio

Subscribers: dnsampaio, kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D74616
2020-03-19 11:52:41 +00:00
Sanne Wouda 2939fc13c8 [AArch64] Add IR intrinsics for sq(r)dmulh_lane(q)
Summary:
Currently, sqdmulh_lane and friends from the ACLE (implemented in arm_neon.h),
are represented in LLVM IR as a (by vector) sqdmulh and a vector of (repeated)
indices, like so:

   %shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
   %vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)

When %v's values are known, the shufflevector is optimized away and we are no
longer able to select the lane variant of sqdmulh in the backend.

This defeats a (hand-coded) optimization that packs several constants into a
single vector and uses the lane intrinsics to reduce register pressure and
trade-off materialising several constants for a single vector load from the
constant pool, like so:

   int16x8_t v = {2,3,4,5,6,7,8,9};
   a = vqdmulh_laneq_s16(a, v, 0);
   b = vqdmulh_laneq_s16(b, v, 1);
   c = vqdmulh_laneq_s16(c, v, 2);
   d = vqdmulh_laneq_s16(d, v, 3);
   [...]

In one microbenchmark from libjpeg-turbo this accounts for a 2.5% to 4%
performance difference.

We could teach the compiler to recover the lane variants, but this would likely
require its own pass.  (Alternatively, "volatile" could be used on the constants
vector, but this is a bit ugly.)

This patch instead implements the following LLVM IR intrinsics for AArch64 to
maintain the original structure through IR optmization and into instruction
selection:
- sqdmulh_lane
- sqdmulh_laneq
- sqrdmulh_lane
- sqrdmulh_laneq.

These 'lane' variants need an additional register class.  The second argument
must be in the lower half of the 64-bit NEON register file, but only when
operating on i16 elements.

Note that the existing patterns for shufflevector and sqdmulh into sqdmulh_lane
(etc.) remain, so code that does not rely on NEON intrinsics to generate these
instructions is not affected.

This patch also changes clang to emit these IR intrinsics for the corresponding
NEON intrinsics (AArch64 only).

Reviewers: SjoerdMeijer, dmgreen, t.p.northover, rovka, rengolin, efriedma

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, jdoerfert, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71469
2020-01-29 13:25:23 +00:00
Sanne Wouda cbc45e4e75 Regenerate aarch64-neon-2velem.c CHECK lines 2020-01-29 13:03:27 +00:00
Tim Northover 59f063b89c NeonEmitter: remove special 'a' type modifier.
'a' used to implement a splat in C++ code in NeonEmitter.cpp, but this
can be done directly from .td expansions now (and most ops already did).
So removing it simplifies the overall code.

https://reviews.llvm.org/D69716
2019-11-06 10:23:36 +00:00
Cameron McInally 20b8ed2c2b [IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator
Reapply r374240 with fix for Ocaml test, namely Bindings/OCaml/core.ml.

Differential Revision: https://reviews.llvm.org/D61675

llvm-svn: 374782
2019-10-14 15:35:01 +00:00
Dmitri Gribenko eaf6dd482b Revert "[IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator"
This reverts commit r374240. It broke OCaml tests:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19014

llvm-svn: 374354
2019-10-10 14:13:54 +00:00
Cameron McInally 47363a148f [IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator
Also update Clang to call Builder.CreateFNeg(...) for UnaryMinus.

Differential Revision: https://reviews.llvm.org/D61675

llvm-svn: 374240
2019-10-09 21:52:15 +00:00
Eli Friedman 4c4df44186 [ARM] Fix arm_neon.h with -flax-vector-conversions=none
Really, we were already 99% of the way there; just needed a couple minor
fixes that affected 64-bit-only builtins.  Based on D61717.

Note that the change to builtin_str changes the type of a few
__builtin_neon_* intrinsics that had the "wrong" type.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43341

Differential Revision: https://reviews.llvm.org/D68683

llvm-svn: 374191
2019-10-09 17:57:59 +00:00
Ivan A. Kosarev 2f326d453f [NEON] Support vfma_n and vfms_n intrinsics
Differential Revision: https://reviews.llvm.org/D45483

llvm-svn: 329814
2018-04-11 14:43:11 +00:00
Mehdi Amini 6aa9e9b41a IRGen: Add optnone attribute on function during O0
Amongst other, this will help LTO to correctly handle/honor files
compiled with O0, helping debugging failures.
It also seems in line with how we handle other options, like how
-fnoinline adds the appropriate attribute as well.

Differential Revision: https://reviews.llvm.org/D28404

llvm-svn: 304127
2017-05-29 05:38:20 +00:00
Matt Arsenault 7c4c1cb2f5 Fix tests after speculatable intrinsics patch
These were relying on the attribute group numbering

llvm-svn: 301996
2017-05-03 03:04:40 +00:00
David Majnemer 12b9e76b62 Update for LLVM changes
InstSimplify has gained the ability to remove needless bitcasts which
perturbed some clang codegen tests.

llvm-svn: 276728
2016-07-26 05:52:37 +00:00
Ahmed Bougacha 1d9de10130 [ARM NEON] Define vfms_f32 on ARM, and all vfms using vfma.
r259537 added vfma/vfms to armv7, but the builtin was only lowered
on the AArch64 side. Instead of supporting it on ARM, get rid of it.

The vfms builtin lowered to:
  %nb = fsub float -0.0, %b
  %r = @llvm.fma.f32(%a, %nb, %c)

Instead, define the operation in terms of vfma, and swap the
multiplicands. It now lowers to:
  %na = fsub float -0.0, %a
  %r = @llvm.fma.f32(%na, %b, %c)

This matches the instruction more closely, and lets current LLVM
generate the "natural" operand ordering:
  fmls.2s v0, v1, v2
instead of the crooked (but equivalent):
  fmls.2s v0, v2, v1
Except for theses changes, assembly is identical.

LLVM accepts both commutations, and the LLVM tests in:
  test/CodeGen/AArch64/arm64-fmadd.ll
  test/CodeGen/AArch64/fp-dp3.ll
  test/CodeGen/AArch64/neon-fma.ll
  test/CodeGen/ARM/fusedMAC.ll
already check either the new one only, or both.

Also verified against the test-suite unittests.

llvm-svn: 266807
2016-04-19 19:44:45 +00:00
Tim Northover 67181e3c3a ARM & AArch64: fix IR-converted tests.
My script was converting %a0 to [[A]]0 if it had seen %a defined before %a0.
Oops.

llvm-svn: 263056
2016-03-09 20:06:10 +00:00
Tim Northover 58672974a9 ARM & AArch64: convert asm tests to LLVM IR and restrict optimizations.
This is mostly a one-time autoconversion of tests that checked assembly after
"-Owhatever" compiles to only run "opt -mem2reg" and check the assembly. This
should make them much more stable to changes in LLVM so they won't break on
unrelated changes.

"opt -mem2reg" is a compromise designed to increase the readability of tests
that check dataflow, while minimizing dependency on LLVM. Hopefully mem2reg is
stable enough that no surpises will come along.

Should address http://llvm.org/PR26815.

llvm-svn: 263048
2016-03-09 18:54:42 +00:00
Tim Northover 831d728f9a AArch64: re-enable tests that were looking for a non-existent backend.
In the final phase of the merge, I managed to disable a bunch of Clang
tests accidentally. Fortunately none of them seem to have broken in
the interim.

llvm-svn: 211149
2014-06-18 08:37:28 +00:00
Tim Northover 25e8a6754e AArch64/ARM64: update Clang after AArch64 removal.
A few (mostly CodeGen) parts of Clang were tightly coupled to the
AArch64 backend. Now that it's gone, they will not even compile.

I've also deduplicated RUN lines in many of the AArch64 tests. This
might improve "make check-all" time noticably: some of those NEON
tests were monsters.

llvm-svn: 209578
2014-05-24 12:51:25 +00:00
James Molloy 75f5f9e629 [ARM64] Allow the disabling of NEON and crypto instructions. Update tests to pass -target-feature +neon.
llvm-svn: 206394
2014-04-16 15:33:48 +00:00
Tim Northover a2ee433c8d ARM64: initial clang support commit.
This adds Clang support for the ARM64 backend. There are definitely
still some rough edges, so please bring up any issues you see with
this patch.

As with the LLVM commit though, we think it'll be more useful for
merging with AArch64 from within the tree.

llvm-svn: 205100
2014-03-29 15:09:45 +00:00
Tim Northover 926a235fea AArch64: convert NEON tests to use CHECK-LABEL.
llvm-svn: 202703
2014-03-03 11:34:36 +00:00
Jiangning Liu 38799b1471 Add some missing test cases for ACLE intrinsics of AArch64 NEON.
llvm-svn: 197994
2013-12-25 01:23:43 +00:00
Jiangning Liu cc1da2c938 Add some missing AArch64 Neon intrinsics like vmull_high_n_s16 and friends.
llvm-svn: 196189
2013-12-03 01:28:55 +00:00
Jiangning Liu ee3e08799c Fix the AArch64 NEON bug exposed by checking constant integer argument range of ACLE intrinsics.
llvm-svn: 195844
2013-11-27 14:02:55 +00:00
Ana Pazos 6f2a47a9e5 Implemented aarch64 Neon scalar vmulx_lane intrinsics
Implemented aarch64 Neon scalar vfma_lane intrinsics
Implemented aarch64 Neon scalar vfms_lane intrinsics

Implemented legacy vmul_n_f64, vmul_lane_f64, vmul_laneq_f64
intrinsics (v1f64 parameter type) using Neon scalar instructions.

Implemented legacy vfma_lane_f64, vfms_lane_f64,
vfma_laneq_f64, vfms_laneq_f64 intrinsics (v1f64 parameter type)
using Neon scalar instructions.

llvm-svn: 194889
2013-11-15 23:33:31 +00:00
Jiangning Liu 4617e9dc85 Implement aarch64 neon instruction set AdvSIMD (3V elem).
llvm-svn: 191945
2013-10-04 09:21:17 +00:00