llvm-project


Adam Nemet e29686e5c1 [SLP] Enable 64-bit wide vectorization on AArch64

ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial, for example, for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI hook in the SLP Vectorizer.

* Performance Analysis

This change was motivated by some internal benchmarks, but it is also beneficial on SPEC and the LLVM testsuite.

The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4.

** SPEC

* CFP2006/482.sphinx3 -3.34%
  A pretty hot loop is SLP vectorized, resulting in a nice instruction reduction. This used to be a +22% regression before rL299482.
* CFP2000/177.mesa -3.34%
* CINT2000/256.bzip2 +6.97%
  My current plan is to extend the fix in rL299482 to i16, which brings the regression down to +2.5%. There are also other problems with the codegen in this loop, so there is further room for improvement.

** LLVM testsuite

* SingleSource/Benchmarks/Misc/ReedSolomon -10.75%
  There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that they add up to 10%. Some of this may be code-layout noise.
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%
  The opt-viewer screenshot can be seen at F3218284. We start at a colder store, but the tree leads us into the hottest loop.
* MultiSource/Applications/lambda-0.1.3/lambda -2.68%
* MultiSource/Benchmarks/Bullet/bullet -2.18%
  This is using 3D vectors.
* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%
  Noise; the binary is unchanged.
* MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%
  There is an additional SLP vectorization in the cold code. The test runs for ~1 sec and prints out over 2000 lines. This is most likely noise.
* MultiSource/Applications/aha/aha +1.63%
* MultiSource/Applications/JM/lencod/lencod +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark +1.15%

Differential Revision: https://reviews.llvm.org/D31965

llvm-svn: 303116		2017-05-15 21:15:01 +00:00
AsmParser	AArch64: diagnose unrecognized features in .cpu directive.	2017-05-15 19:42:15 +00:00
Disassembler	[AArch64, Lanai] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).	2017-01-06 00:30:53 +00:00
InstPrinter	AArch64: lower "fence singlethread" to a pure compiler barrier.	2017-04-20 21:57:45 +00:00
MCTargetDesc	[AArch64] Fix a comment to match the code. NFC.	2017-05-10 10:51:32 +00:00
TargetInfo	Move the global variables representing each Target behind accessor function	2016-10-09 23:00:34 +00:00
Utils	[AArch64AsmParser] rewrite of function parseSysAlias	2017-03-03 08:12:47 +00:00
AArch64.h	[AArch64] Remove AArch64AddressTypePromotion pass	2017-05-05 16:05:41 +00:00
AArch64.td	[AArch64] Enable FeatureFuseAES on Cortex-A72.	2017-05-15 15:15:22 +00:00
AArch64A53Fix835769.cpp	Use StringRef in Pass/PassManager APIs (NFC)	2016-10-01 02:56:57 +00:00
AArch64A57FPLoadBalancing.cpp	LiveRegUnits: Add accumulateBackward() function	2017-01-21 02:21:04 +00:00
AArch64AdvSIMDScalarPass.cpp	Use StringRef in Pass/PassManager APIs (NFC)	2016-10-01 02:56:57 +00:00
AArch64AsmPrinter.cpp	[AArch64] ILP32 Backend Relocation Support	2017-05-02 22:01:48 +00:00
AArch64CallLowering.cpp	Add extra operand to CALLSEQ_START to keep frame part set up previously	2017-05-09 13:35:13 +00:00
AArch64CallLowering.h	[GlobalISel] Use the correct calling conv for calls	2017-03-20 14:40:18 +00:00
AArch64CallingConvention.h	…
AArch64CallingConvention.td	SwiftCC: swifterror register cannot be as the base register	2017-02-09 01:52:17 +00:00
AArch64CleanupLocalDynamicTLSPass.cpp	Use StringRef in Pass/PassManager APIs (NFC)	2016-10-01 02:56:57 +00:00
AArch64CollectLOH.cpp	AArch64CollectLOH: Rewrite as block-local analysis.	2017-01-06 19:22:01 +00:00
AArch64ConditionOptimizer.cpp	[CodeGen] Rename MachineInstrBuilder::addOperand. NFC	2017-01-13 09:58:52 +00:00
AArch64ConditionalCompares.cpp	[CodeGen] Rename MachineInstrBuilder::addOperand. NFC	2017-01-13 09:58:52 +00:00
AArch64DeadRegisterDefinitionsPass.cpp	AArch64: Use DeadRegisterDefinitionsPass before regalloc.	2016-11-16 03:38:27 +00:00
AArch64ExpandPseudoInsts.cpp	AArch64: lower "fence singlethread" to a pure compiler barrier.	2017-04-20 21:57:45 +00:00
AArch64FastISel.cpp	Add extra operand to CALLSEQ_START to keep frame part set up previously	2017-05-09 13:35:13 +00:00
AArch64FrameLowering.cpp	Move size and alignment information of regclass to TargetRegisterInfo	2017-04-24 18:55:33 +00:00
AArch64FrameLowering.h	…
AArch64GenRegisterBankInfo.def	GlobalISel: fall back gracefully when we can't map an operand's size.	2017-02-06 21:57:06 +00:00
AArch64ISelDAGToDAG.cpp	[SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits	2017-04-28 05:31:46 +00:00
AArch64ISelLowering.cpp	Revert r302678 "[AArch64] Enable use of reduction intrinsics."	2017-05-15 20:59:32 +00:00
AArch64ISelLowering.h	Revert r302678 "[AArch64] Enable use of reduction intrinsics."	2017-05-15 20:59:32 +00:00
AArch64InstrAtomics.td	AArch64: lower "fence singlethread" to a pure compiler barrier.	2017-04-20 21:57:45 +00:00
AArch64InstrFormats.td	[globalisel][tablegen] Revise API for ComplexPattern operands to improve flexibility.	2017-04-22 15:11:04 +00:00
AArch64InstrInfo.cpp	[AArch64][MachineCombine] Fold FNMUL+FSUB -> FNMADD.	2017-05-11 20:07:24 +00:00
AArch64InstrInfo.h	Re-commit r301040 "X86: Don't emit zero-byte functions on Windows"	2017-04-21 21:48:41 +00:00
AArch64InstrInfo.td	Add extra operand to CALLSEQ_START to keep frame part set up previously	2017-05-09 13:35:13 +00:00
AArch64InstructionSelector.cpp	[globalisel][tablegen] Compute available feature bits correctly.	2017-04-29 17:30:09 +00:00
AArch64LegalizerInfo.cpp	GlobalISel: constrain G_INSERT to inserting just one value per instruction.	2017-03-03 23:05:47 +00:00
AArch64LegalizerInfo.h	GlobalISel: legalize va_arg on AArch64.	2017-02-15 23:22:50 +00:00
AArch64LoadStoreOptimizer.cpp	[AArch64] Use alias analysis in the load/store optimization pass.	2017-03-17 14:19:55 +00:00
AArch64MCInstLower.cpp	Remove TargetTriple from AArch64MCInstLower as it's used in few places	2016-10-01 01:50:25 +00:00
AArch64MCInstLower.h	…
AArch64MachineFunctionInfo.h	[AArch64, Lanai] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).	2017-01-06 00:30:53 +00:00
AArch64MacroFusion.cpp	[AArch64] Simplify MacroFusion	2017-04-11 19:13:11 +00:00
AArch64MacroFusion.h	[CodeGen] Move MacroFusion to the target	2017-02-01 02:54:34 +00:00
AArch64PBQPRegAlloc.cpp	…
AArch64PBQPRegAlloc.h	…
AArch64PerfectShuffle.h	…
AArch64PromoteConstant.cpp	Use StringRef in Pass/PassManager APIs (NFC)	2016-10-01 02:56:57 +00:00
AArch64RedundantCopyElimination.cpp	[AArch64][Redundant Copy Elim] Add support for CMN and shifted imm.	2017-03-06 21:20:00 +00:00
AArch64RegisterBankInfo.cpp	[AArch64][RegisterBankInfo] Change the default mapping of fp stores.	2017-05-10 15:19:41 +00:00
AArch64RegisterBankInfo.h	[RegisterBankInfo] Uniquely allocate instruction mapping.	2017-05-05 22:48:22 +00:00
AArch64RegisterBanks.td	Re-commit: [globalisel] Tablegen-erate current Register Bank Information	2017-01-19 11:15:55 +00:00
AArch64RegisterInfo.cpp	AArch64RegisterInfo: Simplify getReservedReg(); NFC	2017-02-02 02:23:25 +00:00
AArch64RegisterInfo.h	AArch64: Enable post-ra liveness updates	2016-12-16 23:55:43 +00:00
AArch64RegisterInfo.td	[AArch64] Corrected spill size for DDD register class. NFCI	2016-10-21 09:53:42 +00:00
AArch64SchedA53.td	[MachineScheduler] Reference the correct header.	2017-03-26 21:27:21 +00:00
AArch64SchedA57.td	[AArch64] Add new subtarget feature to fuse AES crypto operations	2017-02-01 02:54:39 +00:00
AArch64SchedA57WriteRes.td	[AArch64] Cortex-A57 FDIV/FSQRT scheduling fix (W-unit)	2016-12-23 12:51:41 +00:00
AArch64SchedCyclone.td	…
AArch64SchedFalkor.td	[AArch64][Falkor] Fix number of microops for WriteSTIdx missed in r300892.	2017-04-21 13:37:01 +00:00
AArch64SchedFalkorDetails.td	[AArch64][Falkor] Fix sched details for FMOV	2017-05-15 18:50:22 +00:00
AArch64SchedFalkorWriteRes.td	[AArch64][Falkor] Fix sched details for FMOV	2017-05-15 18:50:22 +00:00
AArch64SchedKryo.td	…
AArch64SchedKryoDetails.td	[AArch64] Refine Kryo Machine Model	2017-01-26 20:10:41 +00:00
AArch64SchedM1.td	[AArch64] Add new subtarget feature to fuse AES crypto operations	2017-02-01 02:54:39 +00:00
AArch64SchedThunderX.td	[AArch64] Vulcan is now ThunderXT99	2017-03-07 19:42:40 +00:00
AArch64SchedThunderX2T99.td	[AArch64] Vulcan is now ThunderXT99	2017-03-07 19:42:40 +00:00
AArch64Schedule.td	…
AArch64SelectionDAGInfo.cpp	[AArch64] Drive-by cleanup, make this code shorter. NFCI.	2017-03-22 23:37:58 +00:00
AArch64SelectionDAGInfo.h	…
AArch64StorePairSuppress.cpp	Use StringRef in Pass/PassManager APIs (NFC)	2016-10-01 02:56:57 +00:00
AArch64Subtarget.cpp	[SLP] Enable 64-bit wide vectorization on AArch64	2017-05-15 21:15:01 +00:00
AArch64Subtarget.h	[SLP] Enable 64-bit wide vectorization on AArch64	2017-05-15 21:15:01 +00:00
AArch64SystemOperands.td	AArch64InstPrinter: rewrite of printSysAlias	2017-02-27 14:45:34 +00:00
AArch64TargetMachine.cpp	[AArch64] Remove AArch64AddressTypePromotion pass	2017-05-05 16:05:41 +00:00
AArch64TargetMachine.h	[globalisel][tablegen] Move <Target>InstructionSelector declarations to anonymous namespaces	2017-04-06 09:49:34 +00:00
AArch64TargetObjectFile.cpp	CodeGen: simplify TargetMachine::getSymbol interface. NFC.	2016-11-22 16:17:20 +00:00
AArch64TargetObjectFile.h	Move the Mangler from the AsmPrinter down to TLOF and clean up the	2016-09-16 07:33:15 +00:00
AArch64TargetTransformInfo.cpp	Revert r302678 "[AArch64] Enable use of reduction intrinsics."	2017-05-15 20:59:32 +00:00
AArch64TargetTransformInfo.h	[SLP] Enable 64-bit wide vectorization on AArch64	2017-05-15 21:15:01 +00:00
AArch64VectorByElementOpt.cpp	[AArch64] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).	2017-01-25 00:29:26 +00:00
CMakeLists.txt	[AArch64] Remove AArch64AddressTypePromotion pass	2017-05-05 16:05:41 +00:00
LLVMBuild.txt	…