llvm-project

Commit Graph

Author	SHA1	Message	Date
Daniel Sanders	a1b2db7919	[globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to per-function predicates. Summary: This causes them to be re-computed more often than necessary but resolves objections that were raised post-commit on r301750. Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls Reviewed By: qcolombet Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32861 llvm-svn: 303418	2017-05-19 11:08:33 +00:00
Adam Nemet	e29686e5c1	[SLP] Enable 64-bit wide vectorization on AArch64 ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. * Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. * MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116	2017-05-15 21:15:01 +00:00
Matthew Simpson	78fd46b230	[AArch64] Consider widening instructions in cost calculations The AArch64 instruction set has a few "widening" instructions (e.g., uaddl, saddl, uaddw, etc.) that take one or more doubleword operands and produce quadword results. The operands are automatically sign- or zero-extended as appropriate. However, in LLVM IR, these extends are explicit. This patch updates TTI to consider these widening instructions as single operations whose cost is attached to the arithmetic instruction. It marks extends that are part of a widening operation "free" and applies a sub-target specified overhead (zero by default) to the arithmetic instructions. Differential Revision: https://reviews.llvm.org/D32706 llvm-svn: 302582	2017-05-09 20:18:12 +00:00
Daniel Sanders	e9fdba39e0	[globalisel][tablegen] Compute available feature bits correctly. Summary: Predicate<> now has a field to indicate how often it must be recomputed. Currently, there are two frequencies, per-module (RecomputePerFunction==0) and per-function (RecomputePerFunction==1). Per-function predicates are currently recomputed more frequently than necessary since the only predicate in this category is cheap to test. Per-module predicates are now computed in getSubtargetImpl() while per-function predicates are computed in selectImpl(). Tablegen now manages the PredicateBitset internally. It should only be necessary to add the required includes. Also fixed a problem revealed by the test case where constrainSelectedInstRegOperands() would attempt to tie operands that BuildMI had already tied. Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: rovka Subscribers: kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32491 llvm-svn: 301750	2017-04-29 17:30:09 +00:00
Tim Northover	879a0b2e1b	AArch64: support nonlazybind It's almost certainly not a good idea to actually use it in most cases (there's a pretty large code size overhead on AArch64), but we can't do those experiments until it's supported. llvm-svn: 300462	2017-04-17 17:27:56 +00:00
Petr Hosek	9eb0a1e09b	[AArch64][Fuchsia] Allow -mcmodel=kernel for --target=aarch64-fuchsia This mode is just like -mcmodel=small except that it moves the thread pointer from TPIDR_EL0 to TPIDR_EL1. Patch by Roland McGrath. Differential Revision: https://reviews.llvm.org/D31624 llvm-svn: 299462	2017-04-04 19:51:53 +00:00
Balaram Makam	2aba753e84	[AArch64] Add new subtarget feature to fold LSL into address mode. Summary: This feature enables folding of logical shift operations of up to 3 places into addressing mode on Kryo and Falkor that have a fastpath LSL. Reviewers: mcrosier, rengolin, t.p.northover Subscribers: junbuml, gberry, llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D31113 llvm-svn: 299240	2017-03-31 18:16:53 +00:00
Sanne Wouda	d4658ee634	[AArch64] [Assembler] option to disable negative immediate conversions Summary: Similar to the ARM target in https://reviews.llvm.org/rL298380, this patch adds identical infrastructure for disabling negative immediate conversions, and converts the existing aliases to the new infrastucture. Reviewers: rengolin, javed.absar, olista01, SjoerdMeijer, samparker Reviewed By: samparker Subscribers: samparker, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D31243 llvm-svn: 298908	2017-03-28 10:02:56 +00:00
Davide Italiano	a2c4e4b929	[Target] Remove some code probably copy/pasted from another backend. llvm-svn: 298825	2017-03-26 21:45:04 +00:00
Joel Jones	2852088126	[AArch64] Vulcan is now ThunderXT99 Broadcom Vulcan is now Cavium ThunderX2T99. LLVM Bugzilla: http://bugs.llvm.org/show_bug.cgi?id=32113 Minor fixes for the alignments of loops and functions for ThunderX T81/T83/T88 (better performance). Patch was tested with SpecCPU2006. Patch by Stefan Teleman Differential Revision: https://reviews.llvm.org/D30510 llvm-svn: 297190	2017-03-07 19:42:40 +00:00
Petr Hosek	a7d5916308	[Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stack The Fuchsia ABI defines slots from the thread pointer where the stack-guard value for stack-protector, and the unsafe stack pointer for safe-stack, are stored. This parallels the Android ABI support. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D30237 llvm-svn: 296081	2017-02-24 03:10:10 +00:00
Joel Jones	ab0f3b43e3	[AArch64] Add Cavium ThunderX support This set of patches adds support for Cavium ThunderX ARM64 processors: * ThunderX * ThunderX T81 * ThunderX T83 * ThunderX T88 Patch by Stefan Teleman Differential Revision: https://reviews.llvm.org/D28891 llvm-svn: 295475	2017-02-17 18:34:24 +00:00
Evandro Menezes	455382ea22	[AArch64] Add new target feature to fuse literal generation This feature enables the fusion of such operations on Cortex A57, as recommended in its Software Optimisation Guide, sections 4.14 and 4.15. Differential revision: https://reviews.llvm.org/D28698 llvm-svn: 293739	2017-02-01 02:54:42 +00:00
Evandro Menezes	b21fb29c26	[AArch64] Add new subtarget feature to fuse AES crypto operations This feature enables the fusion of such operations on Cortex A57, as recommended in its Software Optimisation Guide, section 4.13, and on Exynos M1. Differential revision: https://reviews.llvm.org/D28491 llvm-svn: 293738	2017-02-01 02:54:39 +00:00
Evandro Menezes	7784cacd91	[AArch64] Rename 'no-quad-ldst-pairs' to 'slow-paired-128' In order to follow the pattern of the existing 'slow-misaligned-128store' option, rename the option 'no-quad-ldst-pairs' to 'slow-paired-128'. llvm-svn: 292954	2017-01-24 17:34:31 +00:00
Chad Rosier	58fb5f5e58	[AArch64] Falkor supports Rounding Double Multiply Add/Subtract instructions. Falkor only partially implements the ARMv8.1a extensions, so this patch refactors the support for the SQRDML[A\|S]H instruction into a separate feature. Differential Revision: https://reviews.llvm.org/D28681 llvm-svn: 292142	2017-01-16 16:28:43 +00:00
Joel Jones	75818bc8f7	[AArch64] Refactor LSE support as feature separate from V8.1a support. Summary: This is preparation for ThunderX processors that have Large System Extension (LSE) atomic instructions, but not the other instructions introduced by V8.1a. This will mimic changes to GCC as described here: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00388.html LSE instructions are: LD/ST<op>, CAS*, SWP Reviewers: t.p.northover, echristo, jmolloy, rengolin Subscribers: aemerson, mehdi_amini Differential Revision: https://reviews.llvm.org/D26621 llvm-svn: 288279	2016-11-30 22:25:24 +00:00
Dean Michael Berris	3234d3a4bd	[XRay] Support AArch64 in LLVM This patch adds XRay support in LLVM for AArch64 targets. This patch is one of a series: Clang: https://reviews.llvm.org/D26415 compiler-rt: https://reviews.llvm.org/D26413 Author: rSerge Reviewers: rengolin, dberris Subscribers: amehsan, aemerson, llvm-commits, iid_iunknown Differential Revision: https://reviews.llvm.org/D26412 llvm-svn: 287209	2016-11-17 05:15:37 +00:00
Chad Rosier	201fc1ed26	[AArch64] Add support for Qualcomm's Falkor CPU. Differential Revision: https://reviews.llvm.org/D26673 llvm-svn: 287036	2016-11-15 21:34:12 +00:00
Chad Rosier	10c7aaaee9	[AArch64] Enable merging of adjacent zero stores for all subtargets. This optimization merges adjacent zero stores into a wider store. e.g., strh wzr, [x0] strh wzr, [x0, #2] ; becomes str wzr, [x0] e.g., str wzr, [x0] str wzr, [x0, #4] ; becomes str xzr, [x0] Previously, this was only enabled for Kryo and Cortex-A57. Differential Revision: https://reviews.llvm.org/D26396 llvm-svn: 286592	2016-11-11 14:10:12 +00:00
Chad Rosier	d6daac4746	[AArch64] Removed the narrow load merging code in the ld/st optimizer. This feature has been disabled for some time now, so remove cruft. Differential Revision: https://reviews.llvm.org/D26248 llvm-svn: 286110	2016-11-07 15:27:22 +00:00
Evandro Menezes	eff2bd9d4f	[AArch64] Optionally use the Newton series for reciprocal estimation Add support for estimating the square root or its reciprocal and division or reciprocal using the combiner generic Newton series. Differential revision: https://reviews.llvm.org/D25291 llvm-svn: 284986	2016-10-24 16:14:58 +00:00
Tim Northover	69fa84a6e9	GlobalISel: rename legalizer components to match others. The previous names were both misleading (the MachineLegalizer actually contained the info tables) and inconsistent with the selector & translator (in having a "Machine") prefix. This should make everything sensible again. The only functional change is the name of a couple of command-line options. llvm-svn: 284287	2016-10-14 22:18:18 +00:00
Matthias Braun	46a5238682	AArch64: Macrofusion: Split features, add missing combinations. AArch64InstrInfo::shouldScheduleAdjacent() determines whether two instruction can benefit from macroop fusion on apple CPUs. The list turned out to be incomplete: - the "rr" variants of the instructions were missing - even the "rs" variants can have shift value == 0 and behave like the "rr" variants This also splits the MacropFusion target feature into ArithmeticBccFusion and ArithmeticCbzFusion. Differential Revision: https://reviews.llvm.org/D25142 llvm-svn: 283243	2016-10-04 19:28:21 +00:00
Matthias Braun	a827ed8891	AArch64Subtarget: Remove unused CPUString field llvm-svn: 283142	2016-10-03 20:17:02 +00:00
Evandro Menezes	e45de8a5ec	Add support to optionally limit the size of jump tables. Many high-performance processors have a dedicated branch predictor for indirect branches, commonly used with jump tables. As sophisticated as such branch predictors are, they tend to have well defined limits beyond which their effectiveness is hampered or even nullified. One such limit is the number of possible destinations for a given indirect branches that such branch predictors can handle. This patch considers a limit that a target may set to the number of destination addresses in a jump table. Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar <aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>. Differential revision: https://reviews.llvm.org/D21940 llvm-svn: 282412	2016-09-26 15:32:33 +00:00
Evandro Menezes	9b5d89513b	Revert part of "AArch64: Do not test for CPUs, use SubtargetFeatures" This reverts part of commit 119e358d9635c8d1f3e7aee67e3ea3b8a62f8db6 by removing FeatureUseRSqrt et al per request by Eric Christopher <echristo@gmail.com> (v. http://bit.ly/2cmz6kW). llvm-svn: 282001	2016-09-20 19:02:09 +00:00
Ahmed Bougacha	6756a2c953	[GlobalISel] Introduce an instruction selector. And implement it for AArch64, supporting x/w ADD/OR. Differential Revision: https://reviews.llvm.org/D22373 llvm-svn: 276875	2016-07-27 14:31:55 +00:00
Ahmed Bougacha	5e402eec7b	[AArch64] Mark various *Info classes as 'final'. NFC. llvm-svn: 276874	2016-07-27 14:31:46 +00:00
Tim Northover	33b07d6725	GlobalISel: implement legalization pass, with just one transformation. This adds the actual MachineLegalizeHelper to do the work and a trivial pass wrapper that legalizes all instructions in a MachineFunction. Currently the only transformation supported is splitting up a vector G_ADD into one acting on smaller vectors. llvm-svn: 276461	2016-07-22 20:03:43 +00:00
Duncan P. N. Exon Smith	632987296f	Target: Remove unused arguments from overrideSchedPolicy, NFC TargetSubtargetInfo::overrideSchedPolicy takes two MachineInstr* arguments (begin and end) that invite implicit conversions from MachineInstrBundleIterator. One option would be to change their type to an iterator, but since they don't seem to have been used since the API was added in 2010, I'm deleting the dead code. llvm-svn: 274304	2016-07-01 00:23:27 +00:00
Silviu Baranga	aee40fc61c	[AArch64] Restore codegen for AArch64 Cortex-A72/A73 after NFCI Summary: Code generation for Cortex-A72/Cortex-A73 was accidentally changed by r271555, which was a NFCI. The isCortexA57() predicate was not true for Cortex-A72/Cortex-A73 before r271555 (since it was checking the CPU string). Because Cortex-A72/Cortex-A73 inherit all features from Cortex-A57, all decisions previously guarded by isCortexA57() are now taken. This change restores the behaviour before r271555 by adding separate ProcA72/ProcA73, which have the required features to preserve code generation. Reviewers: kristof.beyls, aadg, mcrosier, rengolin Subscribers: mcrosier, llvm-commits, aemerson, t.p.northover, MatzeB, rengolin Differential Revision: http://reviews.llvm.org/D21182 llvm-svn: 273277	2016-06-21 15:53:54 +00:00
Pankaj Gode	0aab2e398a	[AARCH64] Add support for Broadcom Vulcan Adding core tuning support for new Broadcom Vulcan core (ARMv8.1A). Differential Revision: http://reviews.llvm.org/D21500 llvm-svn: 273148	2016-06-20 11:13:31 +00:00
Evandro Menezes	a3a0a60cff	[AArch64] Add preferred alignments for Exynos M1 Differential Revision: http://reviews.llvm.org/D21203 llvm-svn: 272400	2016-06-10 16:00:18 +00:00
Sjoerd Meijer	d906bf1369	RAS extensions are part of ARMv8.2-A. This change enables them by introducing a new instruction to ARM and AArch64 targets and several system registers. Patch by: Roger Ferrer Ibanez and Oliver Stannard Differential Revision: http://reviews.llvm.org/D20282 llvm-svn: 271670	2016-06-03 14:03:27 +00:00
Matthias Braun	651cff42c4	AArch64: Do not test for CPUs, use SubtargetFeatures Testing for specific CPUs has a number of problems, better use subtarget features: - When some tweak is added for a specific CPU it is often desirable for the next version of that CPU as well, yet we often forget to add it. - It is hard to keep track of checks scattered around the target code; Declaring all target specifics together with the CPU in the tablegen file is a clear representation. - Subtarget features can be tweaked from the command line. To discourage people from using CPU checks in the future I removed the isCortexXX(), isCyclone(), ... functions. I added an getProcFamily() function for exceptional circumstances but made it clear in the comment that usage is discouraged. Reformat feature list in AArch64.td to have 1 feature per line in alphabetical order to simplify merging and sorting for out of tree tweaks. No functional change intended. Differential Revision: http://reviews.llvm.org/D20762 llvm-svn: 271555	2016-06-02 18:03:53 +00:00
Matthias Braun	27b6692fe2	AArch64Subtarget: Use default member initializers llvm-svn: 271057	2016-05-27 22:14:09 +00:00
Tom Stellard	cef0fe4245	[GlobalISel] Move GISelAccessor class into public headers Reviewers: qcolombet Subscribers: joker.eph, vkalintiris, llvm-commits Differential Revision: http://reviews.llvm.org/D19120 llvm-svn: 266348	2016-04-14 17:45:38 +00:00
Quentin Colombet	c17f744001	[AArch64] Teach the subtarget how to get to the RegisterBankInfo. Rework the access to GlobalISel APIs to contain how much of the APIs we need to access for the final executable to build when GlobalISel is not built. This prevents massive usage of ifdefs in various places. Now, all the GlobalISel ifdefs will be happing only in AArch64TargetMachine.cpp. llvm-svn: 265567	2016-04-06 17:26:03 +00:00
Quentin Colombet	ba2a01645b	[GlobalISel] Re-apply r260922-260923 with MSVC-friendly code. Original message: Get rid of the ifdefs in TargetLowering. Introduce a new API used only by GlobalISel: CallLowering. This API will contain target hooks dedicated to call lowering. llvm-svn: 260998	2016-02-16 19:26:02 +00:00
Aaron Ballman	fc64ef1a15	Reverting r260922-260923; they cause link failures with MSVC. http://lab.llvm.org:8011/builders/lldb-x86-windows-msvc2015/builds/15436/steps/build/logs/stdio http://bb.pgr.jp/builders/msbuild-llvmclang-x64-msc18-DA/builds/961/steps/build_llvm/logs/stdio llvm-svn: 260972	2016-02-16 15:29:06 +00:00
Quentin Colombet	1ce38545fb	[GlobalISel] Get rid of the ifdefs in TargetLowering. Introduce a new API used only by GlobalISel: CallLowering. This API will contain target hooks dedicated to call lowering. llvm-svn: 260922	2016-02-16 00:57:44 +00:00
Chad Rosier	026f15e687	[AArch64] Enable post-RA MI scheduler for Kryo. This should have landed in r260686. llvm-svn: 260739	2016-02-12 21:27:33 +00:00
Chad Rosier	cd2be7f084	[AArch64] Add support for Qualcomm Kryo CPU. Machine model description by Dave Estes <cestes@codeaurora.org>. llvm-svn: 260686	2016-02-12 15:51:51 +00:00
MinSeong Kim	a7385ebf78	[AArch64] Add support for Samsung Exynos-M1 Adds core tuning support for new Samsung Exynos-M1 core (ARMv8-A). Differential Revision: http://reviews.llvm.org/D15663 llvm-svn: 256828	2016-01-05 12:51:59 +00:00
Chad Rosier	d016574df8	[AArch64] Enable PostRAScheduler for AArch64 generic build. Disable post-ra scheduler for perturbed tests to appease the bots and to preserve the history of the tests. http://reviews.llvm.org/D15652 llvm-svn: 256158	2015-12-21 14:43:45 +00:00
Rafael Espindola	9e1cae510f	Revert "[AArch64] Enable PostRAScheduler for AArch64 generic build" This reverts commit r255896. It broke the tests. llvm-svn: 255899	2015-12-17 15:12:26 +00:00
MinSeong Kim	d05e9fd194	[AArch64] Enable PostRAScheduler for AArch64 generic build This patch enables PostRAScheduler specifically for AArch64 generic build, which is beneficial from the performance perspective. Speedups up to 2 to 7% for some benchmarks on A57 and A53 are observed. Also benchmarks from LLVM test-suite did not regress. Differential Revision: http://reviews.llvm.org/D15557 llvm-svn: 255896	2015-12-17 14:51:22 +00:00
Christof Douma	8b5dc2c94e	[AArch64]: Add support for Cortex-A35 Adds support for the new Cortex-A35 ARMv8-A core. llvm-svn: 254503	2015-12-02 11:53:44 +00:00
Oliver Stannard	a34e47066e	[AArch64] Add ARMv8.2-A Statistical Profiling Extension The Statistical Profiling Extension is an optional extension to ARMv8.2-A. Since it is an optional extension, I have added the FeatureSPE subtarget feature to control it. The assembler-visible parts of this extension are the new "psb csync" instruction, which is equivalent to "hint #17", and a number of system registers. Differential Revision: http://reviews.llvm.org/D15021 llvm-svn: 254401	2015-12-01 10:48:51 +00:00

1 2 3

105 Commits