Summary:
Currently, sqdmulh_lane and friends from the ACLE (implemented in arm_neon.h)
are represented in LLVM IR as a (by-vector) sqdmulh applied to a shufflevector
that splats the selected lane, like so:
%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)
When %v's values are known, the shufflevector is optimized away and we are no
longer able to select the lane variant of sqdmulh in the backend.
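For illustration, a minimal reproduction of the problematic case (hypothetical source, not taken from the patch's tests):
#include <arm_neon.h>

int16x4_t scale(int16x4_t a) {
    // All lanes of v are known at compile time, so the shufflevector that
    // splats lane 3 is constant-folded and the lane form can no longer be
    // selected in the backend.
    const int16x4_t v = {1, 2, 3, 4};
    return vqdmulh_lane_s16(a, v, 3);
}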
This defeats a (hand-coded) optimization that packs several constants into a
single vector and uses the lane intrinsics to reduce register pressure, trading
the materialisation of several separate constants for a single vector load from
the constant pool, like so:
int16x8_t v = {2,3,4,5,6,7,8,9};
a = vqdmulh_laneq_s16(a, v, 0);
b = vqdmulh_laneq_s16(b, v, 1);
c = vqdmulh_laneq_s16(c, v, 2);
d = vqdmulh_laneq_s16(d, v, 3);
[...]
In one microbenchmark from libjpeg-turbo this accounts for a 2.5% to 4%
performance difference.
We could teach the compiler to recover the lane variants, but this would likely
require its own pass. (Alternatively, "volatile" could be used on the constants
vector, but this is a bit ugly.)
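A rough sketch of that volatile workaround (hypothetical code, not part of this patch); the volatile read keeps the vector load and the lane structure visible to the backend, at the cost of an unconditional load and uglier source:
#include <arm_neon.h>

void scale_pair(int16x4_t *a, int16x4_t *b) {
    // The volatile qualifier stops the front end from folding the known
    // lane values, so the splat + sqdmulh structure survives into ISel.
    static const volatile int16x8_t k = {2, 3, 4, 5, 6, 7, 8, 9};
    int16x8_t v = k;
    *a = vqdmulh_laneq_s16(*a, v, 0);
    *b = vqdmulh_laneq_s16(*b, v, 1);
}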
This patch instead implements the following LLVM IR intrinsics for AArch64 to
maintain the original structure through IR optimization and into instruction
selection:
- sqdmulh_lane
- sqdmulh_laneq
- sqrdmulh_lane
- sqrdmulh_laneq
These 'lane' variants need an additional register class: when operating on i16
elements, the second argument must live in the lower half of the NEON register
file (V0-V15), because the by-element encoding only has four bits for that
register number.
Note that the existing patterns that match shufflevector plus sqdmulh into
sqdmulh_lane (etc.) remain, so code that does not rely on NEON intrinsics to
generate these instructions is not affected.
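For example (a hypothetical case, not from the patch's tests), source that builds the splat explicitly with vdup_laneq rather than the lane intrinsic still produces a shufflevector plus the plain sqdmulh in IR, still matches the old patterns, and still selects the indexed instruction:
#include <arm_neon.h>

int16x4_t scale_lane3(int16x4_t a, int16x8_t v) {
    // vdup_laneq_s16 is emitted as a shufflevector; the existing pattern
    // matches it together with the by-vector sqdmulh intrinsic.
    return vqdmulh_s16(a, vdup_laneq_s16(v, 3));
}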
This patch also changes clang to emit these IR intrinsics for the corresponding
NEON intrinsics (AArch64 only).
Reviewers: SjoerdMeijer, dmgreen, t.p.northover, rovka, rengolin, efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, hiraditya, jdoerfert, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71469
This is another potential regression exposed by D63815.
Here we peek through a bitcast to find an extract subvector and
scale the splat offset based on that:
splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC'
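A hedged reconstruction of source that produces this shape (hypothetical; the actual regression test may differ):
#include <arm_neon.h>

int16x4_t splat_high_lane(int32x4_t x) {
    // bitcast (extract x, 2): take the high 64-bit half of x and
    // reinterpret it as four i16 lanes.
    int16x4_t hi = vreinterpret_s16_s32(vget_high_s32(x));
    // Splat lane 1 of that half; with the scaled offset this is lane 5
    // of the <8 x i16> view of x, i.e. a single dup from the q register.
    return vdup_lane_s16(hi, 1);
}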
Differential Revision: https://reviews.llvm.org/D71672
Replace interleaved store instructions with equivalent, more efficient
instructions, based on a latency cost model.
https://reviews.llvm.org/D38196
llvm-svn: 320123
Avoid generating indexed vector instructions for Exynos, where they are less
efficient. This applies to fmla/fmls/fmul/fmulx. For example, the instruction
fmla v0.4s, v1.4s, v2.s[1]
is less efficient than the instructions
dup v2.4s, v2.s[1]
fmla v0.4s, v1.4s, v2.4s
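At the source level this typically comes from the by-element intrinsics or from a splat feeding a fused multiply-add; a rough illustration (hypothetical source, assuming an Exynos -mcpu target):
#include <arm_neon.h>

float32x4_t mla_lane1(float32x4_t acc, float32x4_t a, float32x4_t v) {
    // Generic cores select fmla acc.4s, a.4s, v.s[1]; for Exynos the
    // backend now materialises the dup and uses the by-vector fmla.
    return vfmaq_laneq_f32(acc, a, v, 1);
}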
Patch written by Abderrazek Zaafrani.
Differential Revision: https://reviews.llvm.org/D21571
llvm-svn: 283663
We canonicalize V64 vectors to V128 through insert_subvector: the other
FMLA/FMLS/FMUL/FMULX patterns match that already, but this one doesn't,
so we'd fail to match fmls and generate fneg+fmla instead.
The vector equivalents are already tested and functional.
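One plausible source of the failing case (a hypothetical illustration; the commit's own tests may differ) is a fused multiply-subtract by element whose 64-bit lane operand is what gets canonicalized through insert_subvector:
#include <arm_neon.h>

float32_t fms_lane1(float32_t acc, float32_t a, float32x2_t v) {
    // acc - a * v[1]; without the extra pattern this could come out as
    // fneg + indexed fmla rather than a single indexed fmls.
    return vfmss_lane_f32(acc, a, v, 1);
}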
llvm-svn: 245107
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.
"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple, those from AArch64 use an aarch64
triple. Both should be equivalent though.
This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.
llvm-svn: 209577