llvm-project/llvm/lib
Sanne Wouda 2939fc13c8 [AArch64] Add IR intrinsics for sq(r)dmulh_lane(q)
Summary:
Currently, sqdmulh_lane and friends from the ACLE (implemented in arm_neon.h),
are represented in LLVM IR as a (by vector) sqdmulh and a vector of (repeated)
indices, like so:

   %shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
   %vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)

When %v's values are known, the shufflevector is optimized away and we are no
longer able to select the lane variant of sqdmulh in the backend.

This defeats a (hand-coded) optimization that packs several constants into a
single vector and uses the lane intrinsics to reduce register pressure and
trade-off materialising several constants for a single vector load from the
constant pool, like so:

   int16x8_t v = {2,3,4,5,6,7,8,9};
   a = vqdmulh_laneq_s16(a, v, 0);
   b = vqdmulh_laneq_s16(b, v, 1);
   c = vqdmulh_laneq_s16(c, v, 2);
   d = vqdmulh_laneq_s16(d, v, 3);
   [...]

In one microbenchmark from libjpeg-turbo this accounts for a 2.5% to 4%
performance difference.

We could teach the compiler to recover the lane variants, but this would likely
require its own pass.  (Alternatively, "volatile" could be used on the constants
vector, but this is a bit ugly.)

This patch instead implements the following LLVM IR intrinsics for AArch64 to
maintain the original structure through IR optmization and into instruction
selection:
- sqdmulh_lane
- sqdmulh_laneq
- sqrdmulh_lane
- sqrdmulh_laneq.

These 'lane' variants need an additional register class.  The second argument
must be in the lower half of the 64-bit NEON register file, but only when
operating on i16 elements.

Note that the existing patterns for shufflevector and sqdmulh into sqdmulh_lane
(etc.) remain, so code that does not rely on NEON intrinsics to generate these
instructions is not affected.

This patch also changes clang to emit these IR intrinsics for the corresponding
NEON intrinsics (AArch64 only).

Reviewers: SjoerdMeijer, dmgreen, t.p.northover, rovka, rengolin, efriedma

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, jdoerfert, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71469
2020-01-29 13:25:23 +00:00
..
Analysis [AliasAnalysis] Add missing FMRB_* enums. 2020-01-28 15:47:08 -08:00
AsmParser Make llvm::StringRef to std::string conversions explicit. 2020-01-28 23:25:25 +01:00
BinaryFormat DWARFDebugLine.cpp: Format unknown line number standard opcodes 2020-01-15 10:45:50 -05:00
Bitcode Make llvm::StringRef to std::string conversions explicit. 2020-01-28 23:25:25 +01:00
Bitstream [Bitstream] Delete skipAbbreviatedField which duplicates readAbbreviatedField 2019-12-25 18:55:02 -08:00
CodeGen [ARM64] Debug info for structure argument missing DW_AT_location 2020-01-29 10:56:23 +01:00
DWARFLinker Make llvm::StringRef to std::string conversions explicit. 2020-01-28 23:25:25 +01:00
DebugInfo [DebugInfo] Make most debug line prologue errors non-fatal to parsing 2020-01-29 10:23:41 +00:00
Demangle Revert "Add some missing includes to MicrosoftDemangle.cpp (PR44217)" 2019-12-04 11:10:07 -08:00
ExecutionEngine Another round of GCC5 fixes. 2020-01-29 02:09:24 +01:00
Frontend [OpenMP] Use the OpenMPIRBuilder for `omp parallel` 2019-12-30 13:57:13 -06:00
FuzzMutate Make llvm::StringRef to std::string conversions explicit. 2020-01-28 23:25:25 +01:00
Fuzzer
IR Make llvm::StringRef to std::string conversions explicit. 2020-01-28 23:25:25 +01:00
IRReader [cmake] Explicitly mark libraries defined in lib/ as "Component Libraries" 2019-11-21 10:48:08 -08:00
LTO Make llvm::StringRef to std::string conversions explicit. 2020-01-28 23:25:25 +01:00
LineEditor Make llvm::StringRef to std::string conversions explicit. 2020-01-28 23:25:25 +01:00
Linker [NFC] Fixes -Wrange-loop-analysis warnings 2020-01-01 20:01:37 +01:00
MC Make llvm::StringRef to std::string conversions explicit. 2020-01-28 23:25:25 +01:00
MCA Make llvm::StringRef to std::string conversions explicit. 2020-01-28 23:25:25 +01:00
Object [NFC] Fix unused variable warning. 2020-01-28 17:19:23 -08:00
ObjectYAML [Hexagon] Add support for Hexagon v67t microarchitecture (tiny core) 2020-01-21 11:35:10 -06:00
Option Make llvm::StringRef to std::string conversions explicit. 2020-01-28 23:25:25 +01:00
Passes Make llvm::StringRef to std::string conversions explicit. 2020-01-28 23:25:25 +01:00
ProfileData Another round of GCC5 fixes. 2020-01-29 02:09:24 +01:00
Remarks Make llvm::StringRef to std::string conversions explicit. 2020-01-28 23:25:25 +01:00
Support Address implicit conversions detected by g++ 5 only. 2020-01-29 01:01:09 +01:00
TableGen A bunch more implicit string conversions that my Clang didn't detect. 2020-01-29 00:30:16 +01:00
Target [AArch64] Add IR intrinsics for sq(r)dmulh_lane(q) 2020-01-29 13:25:23 +00:00
Testing
TextAPI Another round of GCC5 fixes. 2020-01-29 02:09:24 +01:00
ToolDrivers Make llvm::StringRef to std::string conversions explicit. 2020-01-28 23:25:25 +01:00
Transforms [Attributor][Fix] Initialize unused but loaded variable 2020-01-28 23:52:16 -06:00
WindowsManifest Revert "Temporarily revert "build: avoid hardcoding the libxml2 library name"" 2019-12-03 09:27:14 -08:00
XRay Make llvm::StringRef to std::string conversions explicit. 2020-01-28 23:25:25 +01:00
CMakeLists.txt [Dsymutil][Debuginfo][NFC] Reland: Refactor dsymutil to separate DWARF optimizing part. #2. 2020-01-08 14:15:31 +03:00
LLVMBuild.txt [Dsymutil][Debuginfo][NFC] Reland: Refactor dsymutil to separate DWARF optimizing part. #2. 2020-01-08 14:15:31 +03:00