llvm-project/llvm/test/Transforms/SLPVectorizer/AArch64/64-bit-vector.ll

; RUN: opt -S -slp-vectorizer -mtriple=aarch64--linux-gnu -mcpu=generic < %s | FileCheck %s
; RUN: opt -S -slp-vectorizer -mtriple=aarch64-apple-ios -mcpu=cyclone < %s | FileCheck %s
; Currently disabled for a few subtargets (e.g. Kryo):
; RUN: opt -S -slp-vectorizer -mtriple=aarch64--linux-gnu -mcpu=kryo < %s | FileCheck --check-prefix=NO_SLP %s
; RUN: opt -S -slp-vectorizer -mtriple=aarch64--linux-gnu -mcpu=generic -slp-min-reg-size=128 < %s | FileCheck --check-prefix=NO_SLP %s

define void @f(float* %r, float* %w) {
  %r0 = getelementptr inbounds float, float* %r, i64 0
  %r1 = getelementptr inbounds float, float* %r, i64 1
  %f0 = load float, float* %r0
  %f1 = load float, float* %r1
  %add0 = fadd float %f0, %f0
; CHECK:  fadd <2 x float>
; NO_SLP: fadd float
; NO_SLP: fadd float
  %add1 = fadd float %f1, %f1
  %w0 = getelementptr inbounds float, float* %w, i64 0
  %w1 = getelementptr inbounds float, float* %w, i64 1
  store float %add0, float* %w0
  store float %add1, float* %w1
  ret void
}
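The RUN lines above check two outcomes for the same pair of adjacent `float` operations: vectorized to `<2 x float>` when a 64-bit minimum register width is allowed, and left scalar when the minimum stays at 128 bits (Kryo, or `-slp-min-reg-size=128`). A minimal sketch of the arithmetic behind that difference follows. It is a simplified model, not the SLP vectorizer's actual code; `pickVectorFactor` and its parameters are hypothetical names introduced only for illustration.

```cpp
// Simplified model (an assumption, not LLVM's implementation): try register
// widths from maxRegBits down to minRegBits, halving each time, and return
// the first vector factor that is at least 2 and does not exceed the number
// of adjacent scalars available. Returns 0 when no width fits, meaning the
// scalars are left unvectorized.
unsigned pickVectorFactor(unsigned numScalars, unsigned elemBits,
                          unsigned minRegBits, unsigned maxRegBits = 128) {
  if (elemBits == 0 || minRegBits == 0)
    return 0; // Guard against division by zero and a non-terminating loop.
  for (unsigned regBits = maxRegBits; regBits >= minRegBits; regBits /= 2) {
    unsigned vf = regBits / elemBits; // Lanes that fit in this register.
    if (vf >= 2 && vf <= numScalars)
      return vf;
  }
  return 0;
}
```

Under this model, two 32-bit floats with a 64-bit minimum give a factor of 2 (the `CHECK` case), while the same two floats with a 128-bit minimum give 0 because only a 4-lane register is considered (the `NO_SLP` case).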
[SLP] Enable 64-bit wide vectorization on AArch64

ARM Neon has native support for half-sized vector registers (64 bits). This
is beneficial for example for 2D and 3D graphics. This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.

* Performance Analysis

This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.

The results are with -O3 and PGO. A negative percentage is an improvement.
The testsuite was run with a sample size of 4.

** SPEC

* CFP2006/482.sphinx3 -3.34%

  A pretty hot loop is SLP vectorized resulting in nice instruction
  reduction. This used to be a +22% regression before rL299482.

* CFP2000/177.mesa -3.34%

* CINT2000/256.bzip2 +6.97%

  My current plan is to extend the fix in rL299482 to i16 which brings the
  regression down to +2.5%. There are also other problems with the codegen
  in this loop so there is further room for improvement.

** LLVM testsuite

* SingleSource/Benchmarks/Misc/ReedSolomon -10.75%

  There are multiple small SLP vectorizations outside the hot code. It's a
  bit surprising that it adds up to 10%. Some of this may be code-layout
  noise.

* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%

  The opt-viewer screenshot can be seen at F3218284. We start at a colder
  store but the tree leads us into the hottest loop.

* MultiSource/Applications/lambda-0.1.3/lambda -2.68%

* MultiSource/Benchmarks/Bullet/bullet -2.18%

  This is using 3D vectors.

* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%

  Noise, binary is unchanged.

* MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%

  There is an additional SLP in the cold code. The test runs for ~1sec and
  prints out over 2000 lines. This is most likely noise.

* MultiSource/Applications/aha/aha +1.63%

* MultiSource/Applications/JM/lencod/lencod +1.41%

* SingleSource/Benchmarks/Misc/richards_benchmark +1.15%

Differential Revision: https://reviews.llvm.org/D31965

llvm-svn: 303116
2017-05-16 05:15:01 +08:00