llvm-project/llvm/lib/Target/AArch64
Adam Nemet e29686e5c1 [SLP] Enable 64-bit wide vectorization on AArch64
ARM Neon has native support for half-sized vector registers (64 bits).  This
is beneficial, for example, for 2D and 3D graphics.  This patch adds a TTI
hook that allows lowering MinVecRegSize from 128 in the SLP Vectorizer.
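
As a rough illustration of the mechanism described above, the sketch below
models a target hook that reports the minimum vector register width, with a
generic default of 128 bits and an AArch64-like override of 64 bits.  All
names here (TargetInfoSketch, AArch64InfoSketch, pickMinVecRegSize, lanesFor)
are hypothetical and are not the actual LLVM interfaces; the rule that an
explicit user setting takes precedence over the target's answer is also an
assumption made for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for a target cost-model interface.
// This is NOT the real LLVM TargetTransformInfo class.
struct TargetInfoSketch {
  virtual ~TargetInfoSketch() = default;
  // Generic default: only full 128-bit vectors are considered.
  virtual unsigned minVectorRegisterBitWidth() const { return 128; }
};

// Hypothetical AArch64-like target: Neon also has 64-bit vector registers.
struct AArch64InfoSketch : TargetInfoSketch {
  unsigned minVectorRegisterBitWidth() const override { return 64; }
};

// Pick the minimum register size the vectorizer should consider.
// Assumption: a non-zero UserOverride (e.g. from a command-line flag)
// wins; otherwise the target is asked.
unsigned pickMinVecRegSize(const TargetInfoSketch &TI,
                           unsigned UserOverride = 0) {
  return UserOverride != 0 ? UserOverride : TI.minVectorRegisterBitWidth();
}

// Number of elements of ElemBits that fit in a register of RegBits.
unsigned lanesFor(unsigned RegBits, unsigned ElemBits) {
  assert(ElemBits != 0 && RegBits % ElemBits == 0);
  return RegBits / ElemBits;
}
```

With a 64-bit minimum, a pair of 32-bit floats (lanesFor(64, 32) == 2) becomes
a vectorization candidate, whereas a 128-bit minimum would require four.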

*** Performance Analysis

This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.

The results are with -O3 and PGO.  A negative percentage is an improvement.
The testsuite was run with a sample size of 4.

** SPEC

* CFP2006/482.sphinx3  -3.34%

A fairly hot loop is SLP-vectorized, resulting in a nice reduction in
instruction count.
This used to be a +22% regression before rL299482.

* CFP2000/177.mesa     -3.34%
* CINT2000/256.bzip2   +6.97%

My current plan is to extend the fix in rL299482 to i16, which brings the
regression down to +2.5%.  There are also other problems with the codegen in
this loop, so there is further room for improvement.

** LLVM testsuite

* SingleSource/Benchmarks/Misc/ReedSolomon               -10.75%

There are multiple small SLP vectorizations outside the hot code.  It's a bit
surprising that it adds up to 10%.  Some of this may be code-layout noise.

* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%

The opt-viewer screenshot can be seen at F3218284.  We start at a colder store
but the tree leads us into the hottest loop.

* MultiSource/Applications/lambda-0.1.3/lambda            -2.68%
* MultiSource/Benchmarks/Bullet/bullet                    -2.18%

This is using 3D vectors.

* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%

Noise, binary is unchanged.

* MultiSource/Benchmarks/Ptrdist/anagram/anagram          +4.90%

There is an additional SLP vectorization in the cold code.  The test runs for
~1 sec and prints out over 2000 lines.  This is most likely noise.

* MultiSource/Applications/aha/aha                        +1.63%
* MultiSource/Applications/JM/lencod/lencod               +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark         +1.15%

Differential Revision: https://reviews.llvm.org/D31965

llvm-svn: 303116
2017-05-15 21:15:01 +00:00
AsmParser AArch64: diagnose unrecognized features in .cpu directive. 2017-05-15 19:42:15 +00:00
Disassembler [AArch64, Lanai] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2017-01-06 00:30:53 +00:00
InstPrinter AArch64: lower "fence singlethread" to a pure compiler barrier. 2017-04-20 21:57:45 +00:00
MCTargetDesc [AArch64] Fix a comment to match the code. NFC. 2017-05-10 10:51:32 +00:00
TargetInfo Move the global variables representing each Target behind accessor function 2016-10-09 23:00:34 +00:00
Utils [AArch64AsmParser] rewrite of function parseSysAlias 2017-03-03 08:12:47 +00:00
AArch64.h [AArch64] Remove AArch64AddressTypePromotion pass 2017-05-05 16:05:41 +00:00
AArch64.td [AArch64] Enable FeatureFuseAES on Cortex-A72. 2017-05-15 15:15:22 +00:00
AArch64A53Fix835769.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
AArch64A57FPLoadBalancing.cpp LiveRegUnits: Add accumulateBackward() function 2017-01-21 02:21:04 +00:00
AArch64AdvSIMDScalarPass.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
AArch64AsmPrinter.cpp [AArch64] ILP32 Backend Relocation Support 2017-05-02 22:01:48 +00:00
AArch64CallLowering.cpp Add extra operand to CALLSEQ_START to keep frame part set up previously 2017-05-09 13:35:13 +00:00
AArch64CallLowering.h [GlobalISel] Use the correct calling conv for calls 2017-03-20 14:40:18 +00:00
AArch64CallingConvention.h
AArch64CallingConvention.td SwiftCC: swifterror register cannot be as the base register 2017-02-09 01:52:17 +00:00
AArch64CleanupLocalDynamicTLSPass.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
AArch64CollectLOH.cpp AArch64CollectLOH: Rewrite as block-local analysis. 2017-01-06 19:22:01 +00:00
AArch64ConditionOptimizer.cpp [CodeGen] Rename MachineInstrBuilder::addOperand. NFC 2017-01-13 09:58:52 +00:00
AArch64ConditionalCompares.cpp [CodeGen] Rename MachineInstrBuilder::addOperand. NFC 2017-01-13 09:58:52 +00:00
AArch64DeadRegisterDefinitionsPass.cpp AArch64: Use DeadRegisterDefinitionsPass before regalloc. 2016-11-16 03:38:27 +00:00
AArch64ExpandPseudoInsts.cpp AArch64: lower "fence singlethread" to a pure compiler barrier. 2017-04-20 21:57:45 +00:00
AArch64FastISel.cpp Add extra operand to CALLSEQ_START to keep frame part set up previously 2017-05-09 13:35:13 +00:00
AArch64FrameLowering.cpp Move size and alignment information of regclass to TargetRegisterInfo 2017-04-24 18:55:33 +00:00
AArch64FrameLowering.h
AArch64GenRegisterBankInfo.def GlobalISel: fall back gracefully when we can't map an operand's size. 2017-02-06 21:57:06 +00:00
AArch64ISelDAGToDAG.cpp [SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits 2017-04-28 05:31:46 +00:00
AArch64ISelLowering.cpp Revert r302678 "[AArch64] Enable use of reduction intrinsics." 2017-05-15 20:59:32 +00:00
AArch64ISelLowering.h Revert r302678 "[AArch64] Enable use of reduction intrinsics." 2017-05-15 20:59:32 +00:00
AArch64InstrAtomics.td AArch64: lower "fence singlethread" to a pure compiler barrier. 2017-04-20 21:57:45 +00:00
AArch64InstrFormats.td [globalisel][tablegen] Revise API for ComplexPattern operands to improve flexibility. 2017-04-22 15:11:04 +00:00
AArch64InstrInfo.cpp [AArch64][MachineCombine] Fold FNMUL+FSUB -> FNMADD. 2017-05-11 20:07:24 +00:00
AArch64InstrInfo.h Re-commit r301040 "X86: Don't emit zero-byte functions on Windows" 2017-04-21 21:48:41 +00:00
AArch64InstrInfo.td Add extra operand to CALLSEQ_START to keep frame part set up previously 2017-05-09 13:35:13 +00:00
AArch64InstructionSelector.cpp [globalisel][tablegen] Compute available feature bits correctly. 2017-04-29 17:30:09 +00:00
AArch64LegalizerInfo.cpp GlobalISel: constrain G_INSERT to inserting just one value per instruction. 2017-03-03 23:05:47 +00:00
AArch64LegalizerInfo.h GlobalISel: legalize va_arg on AArch64. 2017-02-15 23:22:50 +00:00
AArch64LoadStoreOptimizer.cpp [AArch64] Use alias analysis in the load/store optimization pass. 2017-03-17 14:19:55 +00:00
AArch64MCInstLower.cpp Remove TargetTriple from AArch64MCInstLower as it's used in few places 2016-10-01 01:50:25 +00:00
AArch64MCInstLower.h
AArch64MachineFunctionInfo.h [AArch64, Lanai] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2017-01-06 00:30:53 +00:00
AArch64MacroFusion.cpp [AArch64] Simplify MacroFusion 2017-04-11 19:13:11 +00:00
AArch64MacroFusion.h [CodeGen] Move MacroFusion to the target 2017-02-01 02:54:34 +00:00
AArch64PBQPRegAlloc.cpp
AArch64PBQPRegAlloc.h
AArch64PerfectShuffle.h
AArch64PromoteConstant.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
AArch64RedundantCopyElimination.cpp [AArch64][Redundant Copy Elim] Add support for CMN and shifted imm. 2017-03-06 21:20:00 +00:00
AArch64RegisterBankInfo.cpp [AArch64][RegisterBankInfo] Change the default mapping of fp stores. 2017-05-10 15:19:41 +00:00
AArch64RegisterBankInfo.h [RegisterBankInfo] Uniquely allocate instruction mapping. 2017-05-05 22:48:22 +00:00
AArch64RegisterBanks.td Re-commit: [globalisel] Tablegen-erate current Register Bank Information 2017-01-19 11:15:55 +00:00
AArch64RegisterInfo.cpp AArch64RegisterInfo: Simplify getReservedReg(); NFC 2017-02-02 02:23:25 +00:00
AArch64RegisterInfo.h AArch64: Enable post-ra liveness updates 2016-12-16 23:55:43 +00:00
AArch64RegisterInfo.td [AArch64] Corrected spill size for DDD register class. NFCI 2016-10-21 09:53:42 +00:00
AArch64SchedA53.td [MachineScheduler] Reference the correct header. 2017-03-26 21:27:21 +00:00
AArch64SchedA57.td [AArch64] Add new subtarget feature to fuse AES crypto operations 2017-02-01 02:54:39 +00:00
AArch64SchedA57WriteRes.td [AArch64] Cortex-A57 FDIV/FSQRT scheduling fix (W-unit) 2016-12-23 12:51:41 +00:00
AArch64SchedCyclone.td
AArch64SchedFalkor.td [AArch64][Falkor] Fix number of microops for WriteSTIdx missed in r300892. 2017-04-21 13:37:01 +00:00
AArch64SchedFalkorDetails.td [AArch64][Falkor] Fix sched details for FMOV 2017-05-15 18:50:22 +00:00
AArch64SchedFalkorWriteRes.td [AArch64][Falkor] Fix sched details for FMOV 2017-05-15 18:50:22 +00:00
AArch64SchedKryo.td
AArch64SchedKryoDetails.td [AArch64] Refine Kryo Machine Model 2017-01-26 20:10:41 +00:00
AArch64SchedM1.td [AArch64] Add new subtarget feature to fuse AES crypto operations 2017-02-01 02:54:39 +00:00
AArch64SchedThunderX.td [AArch64] Vulcan is now ThunderXT99 2017-03-07 19:42:40 +00:00
AArch64SchedThunderX2T99.td [AArch64] Vulcan is now ThunderXT99 2017-03-07 19:42:40 +00:00
AArch64Schedule.td
AArch64SelectionDAGInfo.cpp [AArch64] Drive-by cleanup, make this code shorter. NFCI. 2017-03-22 23:37:58 +00:00
AArch64SelectionDAGInfo.h
AArch64StorePairSuppress.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
AArch64Subtarget.cpp [SLP] Enable 64-bit wide vectorization on AArch64 2017-05-15 21:15:01 +00:00
AArch64Subtarget.h [SLP] Enable 64-bit wide vectorization on AArch64 2017-05-15 21:15:01 +00:00
AArch64SystemOperands.td AArch64InstPrinter: rewrite of printSysAlias 2017-02-27 14:45:34 +00:00
AArch64TargetMachine.cpp [AArch64] Remove AArch64AddressTypePromotion pass 2017-05-05 16:05:41 +00:00
AArch64TargetMachine.h [globalisel][tablegen] Move <Target>InstructionSelector declarations to anonymous namespaces 2017-04-06 09:49:34 +00:00
AArch64TargetObjectFile.cpp CodeGen: simplify TargetMachine::getSymbol interface. NFC. 2016-11-22 16:17:20 +00:00
AArch64TargetObjectFile.h Move the Mangler from the AsmPrinter down to TLOF and clean up the 2016-09-16 07:33:15 +00:00
AArch64TargetTransformInfo.cpp Revert r302678 "[AArch64] Enable use of reduction intrinsics." 2017-05-15 20:59:32 +00:00
AArch64TargetTransformInfo.h [SLP] Enable 64-bit wide vectorization on AArch64 2017-05-15 21:15:01 +00:00
AArch64VectorByElementOpt.cpp [AArch64] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2017-01-25 00:29:26 +00:00
CMakeLists.txt [AArch64] Remove AArch64AddressTypePromotion pass 2017-05-05 16:05:41 +00:00
LLVMBuild.txt