forked from OSchip/llvm-project
e29686e5c1
ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. *** Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. ** SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. * MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116 |
||
---|---|---|
.. | ||
AsmParser | ||
Disassembler | ||
InstPrinter | ||
MCTargetDesc | ||
TargetInfo | ||
Utils | ||
AArch64.h | ||
AArch64.td | ||
AArch64A53Fix835769.cpp | ||
AArch64A57FPLoadBalancing.cpp | ||
AArch64AdvSIMDScalarPass.cpp | ||
AArch64AsmPrinter.cpp | ||
AArch64CallLowering.cpp | ||
AArch64CallLowering.h | ||
AArch64CallingConvention.h | ||
AArch64CallingConvention.td | ||
AArch64CleanupLocalDynamicTLSPass.cpp | ||
AArch64CollectLOH.cpp | ||
AArch64ConditionOptimizer.cpp | ||
AArch64ConditionalCompares.cpp | ||
AArch64DeadRegisterDefinitionsPass.cpp | ||
AArch64ExpandPseudoInsts.cpp | ||
AArch64FastISel.cpp | ||
AArch64FrameLowering.cpp | ||
AArch64FrameLowering.h | ||
AArch64GenRegisterBankInfo.def | ||
AArch64ISelDAGToDAG.cpp | ||
AArch64ISelLowering.cpp | ||
AArch64ISelLowering.h | ||
AArch64InstrAtomics.td | ||
AArch64InstrFormats.td | ||
AArch64InstrInfo.cpp | ||
AArch64InstrInfo.h | ||
AArch64InstrInfo.td | ||
AArch64InstructionSelector.cpp | ||
AArch64LegalizerInfo.cpp | ||
AArch64LegalizerInfo.h | ||
AArch64LoadStoreOptimizer.cpp | ||
AArch64MCInstLower.cpp | ||
AArch64MCInstLower.h | ||
AArch64MachineFunctionInfo.h | ||
AArch64MacroFusion.cpp | ||
AArch64MacroFusion.h | ||
AArch64PBQPRegAlloc.cpp | ||
AArch64PBQPRegAlloc.h | ||
AArch64PerfectShuffle.h | ||
AArch64PromoteConstant.cpp | ||
AArch64RedundantCopyElimination.cpp | ||
AArch64RegisterBankInfo.cpp | ||
AArch64RegisterBankInfo.h | ||
AArch64RegisterBanks.td | ||
AArch64RegisterInfo.cpp | ||
AArch64RegisterInfo.h | ||
AArch64RegisterInfo.td | ||
AArch64SchedA53.td | ||
AArch64SchedA57.td | ||
AArch64SchedA57WriteRes.td | ||
AArch64SchedCyclone.td | ||
AArch64SchedFalkor.td | ||
AArch64SchedFalkorDetails.td | ||
AArch64SchedFalkorWriteRes.td | ||
AArch64SchedKryo.td | ||
AArch64SchedKryoDetails.td | ||
AArch64SchedM1.td | ||
AArch64SchedThunderX.td | ||
AArch64SchedThunderX2T99.td | ||
AArch64Schedule.td | ||
AArch64SelectionDAGInfo.cpp | ||
AArch64SelectionDAGInfo.h | ||
AArch64StorePairSuppress.cpp | ||
AArch64Subtarget.cpp | ||
AArch64Subtarget.h | ||
AArch64SystemOperands.td | ||
AArch64TargetMachine.cpp | ||
AArch64TargetMachine.h | ||
AArch64TargetObjectFile.cpp | ||
AArch64TargetObjectFile.h | ||
AArch64TargetTransformInfo.cpp | ||
AArch64TargetTransformInfo.h | ||
AArch64VectorByElementOpt.cpp | ||
CMakeLists.txt | ||
LLVMBuild.txt |