llvm-project

Commit Graph

Author	SHA1	Message	Date
David Blaikie	3f833edc7c	Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering This header includes CodeGen headers, and is not, itself, included by any Target headers, so move it into CodeGen to match the layering of its implementation. llvm-svn: 317647	2017-11-08 01:01:31 +00:00
Jessica Paquette	13593843f6	[MachineOutliner] Disable outlining from LinkOnceODRs by default Say you have two identical linkonceodr functions, one in M1 and one in M2. Say that the outliner outlines A,B,C from one function, and D,E,F from another function (where letters are instructions). Now those functions are not identical, and cannot be deduped. Locally to M1 and M2, these outlining choices would be good-- to the whole program, however, this might not be true! To mitigate this, this commit makes it so that the outliner sees linkonceodr functions as unsafe to outline from. It also adds a flag, -enable-linkonceodr-outlining, which allows the user to specify that they want to outline from such functions when they know what they're doing. Changing this handles most code size regressions in the test suite caused by competing with linker dedupe. It also doesn't have a huge impact on the code size improvements from the outliner. There are 6 tests that regress > 5% from outlining WITH linkonceodrs to outlining WITHOUT linkonceodrs. Overall, most tests either improve or are not impacted. Not outlined vs outlined without linkonceodrs: https://hastebin.com/raw/qeguxavuda Not outlined vs outlined with linkonceodrs: https://hastebin.com/raw/edepoqoqic Outlined with linkonceodrs vs outlined without linkonceodrs: https://hastebin.com/raw/awiqifiheb Numbers generated using compare.py with -m size.__text. Tests run for AArch64 with -Oz -mllvm -enable-machine-outliner -mno-red-zone. llvm-svn: 315136	2017-10-07 00:16:34 +00:00
Jessica Paquette	4cf187b5b4	[MachineOutliner] AArch64: Avoid saving + restoring LR if possible This commit allows the outliner to avoid saving and restoring the link register on AArch64 when it is dead within an entire class of candidates. This introduces changes to the way the outliner interfaces with the target. For example, the target now interfaces with the outliner using a MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and getOutliningFrameOverhead. This also improves several comments on the outliner's cost model. https://reviews.llvm.org/D36721 llvm-svn: 314341	2017-09-27 20:47:39 +00:00
Stanislav Mekhanoshin	7fe9a5d9b4	Allow target to decide when to cluster loads/stores in misched MachineScheduler when clustering loads or stores checks if base pointers point to the same memory. This check is done through comparison of base registers of two memory instructions. This works fine when instructions have separate offset operand. If they require a full calculated pointer such instructions can never be clustered according to such logic. Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it. Differential Revision: https://reviews.llvm.org/D37698 llvm-svn: 313208	2017-09-13 22:20:47 +00:00
Evandro Menezes	509516d200	[AArch64] Adjust the cost model for Exynos M1 and M2 Add new predicate to more accurately model the cost of arithmetic and logical operations shifted left. Differential revision: https://reviews.llvm.org/D37151 llvm-svn: 311943	2017-08-28 22:51:32 +00:00
Jessica Paquette	d87f54493d	[MachineOutliner] NFC: Change IsTailCall to a call class + frame class This commit - Removes IsTailCall and replaces it with a target-defined unsigned - Refactors getOutliningCallOverhead and getOutliningFrameOverhead so that they don't use IsTailCall - Adds a call class + frame class classification to OutlinedFunction and Candidate respectively This accomplishes a couple things. Firstly, we don't need the notion of tail call in the general outlining algorithm. Secondly, we now can have different "outlining classes" for each candidate within a set of candidates. This will make it easy to add new ways to outline sequences for certain targets and dynamically choose an appropriate cost model for a sequence depending on the context that that sequence lives in. Ultimately, this should get us closer to being able to do something like, say avoid saving the link register when outlining AArch64 instructions. llvm-svn: 309475	2017-07-29 02:55:46 +00:00
Jessica Paquette	809d708b8a	[MachineOutliner] NFC: Split up getOutliningBenefit This is some more cleanup in preparation for some actual functional changes. This splits getOutliningBenefit into two cost functions: getOutliningCallOverhead and getOutliningFrameOverhead. These functions return the number of instructions that would be required to call a specific function and the number of instructions that would be required to construct a frame for a specific funtion. The actual outlining benefit logic is moved into the outliner, which calls these functions. The goal of refactoring getOutliningBenefit is to: - Get us closer to getting rid of the IsTailCall flag - Further split up "target-specific" things and "general algorithm" things llvm-svn: 309356	2017-07-28 03:21:58 +00:00
Adrian Prantl	8f4b353ee1	Remove unused function from AArch64 backend (NFC) llvm-svn: 309336	2017-07-27 23:52:06 +00:00
Hiroshi Inoue	a9ee279e70	fix typos in comments; NFC llvm-svn: 308126	2017-07-16 07:48:48 +00:00
Geoff Berry	b1e8714af9	[AArch64][Falkor] Avoid HW prefetcher tag collisions (step 1) Summary: This patch is the first step in reducing HW prefetcher instruction tag collisions in inner loops for Falkor. It adds a pass that annotates IR loads with metadata to indicate that they are known to be strided loads, and adds a target lowering hook that translates this metadata to a target-specific MachineMemOperand flag. A follow on change will use this MachineMemOperand flag to re-write instructions to reduce tag collisions. Reviewers: mcrosier, t.p.northover Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34963 llvm-svn: 308059	2017-07-14 21:44:12 +00:00
Geoff Berry	6748abe24d	[MIR] Add support for printing and parsing target MMO flags Summary: Add target hooks for printing and parsing target MMO flags. Targets may override getSerializableMachineMemOperandTargetFlags() to return a mapping from string to flag value for target MMO values that should be serialized/parsed in MIR output. Add implementation of this hook for AArch64 SuppressPair MMO flag. Reviewers: bogner, hfinkel, qcolombet, MatzeB Subscribers: mcrosier, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D34962 llvm-svn: 307877	2017-07-13 02:28:54 +00:00
Joel Jones	7466ccfc59	Doxygen formatting. NFCI llvm-svn: 307597	2017-07-10 22:11:50 +00:00
Chad Rosier	6db9ff64a8	[AArch64] Prefer Bcc to CBZ/CBNZ/TBZ/TBNZ when NZCV flags can be set for "free". This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a conditional branch (Bcc), when the NZCV flags can be set for "free". This is preferred on targets that have more flexibility when scheduling Bcc instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are equal). This can reduce register pressure and is also the default behavior for GCC. A few examples: add w8, w0, w1 -> cmn w0, w1 ; CMN is an alias of ADDS. cbz w8, .LBB_2 -> b.eq .LBB0_2 ; single def/use of w8 removed. add w8, w0, w1 -> adds w8, w0, w1 ; w8 has multiple uses. cbz w8, .LBB1_2 -> b.eq .LBB1_2 sub w8, w0, w1 -> subs w8, w0, w1 ; w8 has multiple uses. tbz w8, #31, .LBB6_2 -> b.ge .LBB6_2 In looking at all current sub-target machine descriptions, this transformation appears to be either positive or neutral. Differential Revision: https://reviews.llvm.org/D34220. llvm-svn: 306144	2017-06-23 19:20:12 +00:00
Geoff Berry	d6ac96f953	[AArch64][Falkor] Refine sched details for LSLfast/ASRfast. llvm-svn: 303682	2017-05-23 19:57:45 +00:00
Hans Wennborg	9b9a5358dd	Re-commit r301040 "X86: Don't emit zero-byte functions on Windows" In addition to the original commit, tighten the condition for when to pad empty functions to COFF Windows. This avoids running into problems when targeting e.g. Win32 AMDGPU, which caused test failures when this was committed initially. llvm-svn: 301047	2017-04-21 21:48:41 +00:00
Hans Wennborg	04593000d8	Revert r301040 "X86: Don't emit zero-byte functions on Windows" This broke almost all bots. Reverting while fixing. llvm-svn: 301041	2017-04-21 21:10:37 +00:00
Hans Wennborg	cb3e810714	X86: Don't emit zero-byte functions on Windows Empty functions can lead to duplicate entries in the Guard CF Function Table of a binary due to multiple functions sharing the same RVA, causing the kernel to refuse to load that binary. We had a terrific bug due to this in Chromium. It turns out we were already doing this for Mach-O in certain situations. This patch expands the code for that in AsmPrinter::EmitFunctionBody() and renames TargetInstrInfo::getNoopForMachoTarget() to simply getNoop() since it seems it was used for not just Mach-O anyway. Differential Revision: https://reviews.llvm.org/D32330 llvm-svn: 301040	2017-04-21 20:58:12 +00:00
Balaram Makam	b4419f9d30	[AArch64] Refine Falkor Machine Model - Part 3 This concludes the refinements to Falkor Machine Model. It includes SchedPredicates for immediate zero and LSL Fast. Forwarding logic is also modeled for vector multiply and accumulate only. llvm-svn: 299810	2017-04-08 03:30:15 +00:00
Jessica Paquette	ea8cc09be0	[Outliner] Add outliner for AArch64 This commit adds the necessary target hooks for outlining in AArch64. It also refactors the switch statement used in `getMemOpBaseRegImmOfsWidth` into a more general function, `getMemOpInfo`. This allows the outliner to share that code without copying and pasting it. The AArch64 outliner can be run using -mllvm -enable-machine-outliner, as with the X86-64 outliner. The test for this pass verifies that the outliner does, in fact outline functions, fixes up the stack accesses properly, and can correctly generate a tail call. In the future, this test should be replaced with a MIR test, so that we can properly test immediate offset overflows in fixed-up instructions. llvm-svn: 298162	2017-03-17 22:26:55 +00:00
Matthias Braun	e959544517	TargetInstrInfo: Provide default implementation of isTailCall(). In fact this default implementation should be the only implementation, keep it virtual for now to accomodate targets that don't model flags correctly. Differential Revision: https://reviews.llvm.org/D30747 llvm-svn: 297980	2017-03-16 20:02:30 +00:00
Evandro Menezes	94edf02923	[CodeGen] Move MacroFusion to the target This patch moves the class for scheduling adjacent instructions, MacroFusion, to the target. In AArch64, it also expands the fusion to all instructions pairs in a scheduling block, beyond just among the predecessors of the branch at the end. Differential revision: https://reviews.llvm.org/D28489 llvm-svn: 293737	2017-02-01 02:54:34 +00:00
Serge Rogatch	bc2d34394d	[XRay][AArch64] More staging for tail call support in XRay on AArch64 - in LLVM Summary: This patch prepares more for tail call support in XRay. Until the logging part supports tail calls, this is just staging, so it seems LLVM part is mostly ready with this patch. Related: https://reviews.llvm.org/D28948 (compiler-rt) Reviewers: dberris, rengolin Reviewed By: dberris Subscribers: llvm-commits, iid_iunknown, aemerson Differential Revision: https://reviews.llvm.org/D28947 llvm-svn: 293080	2017-01-25 20:21:49 +00:00
Geoff Berry	d46b6e8096	[AArch64] Fold some filled/spilled subreg COPYs Summary: Extend AArch64 foldMemoryOperandImpl() to handle folding spills of subreg COPYs with read-undef defs like: %vreg0:sub_32<def,read-undef> = COPY %WZR; GPR64:%vreg0 by widening the spilled physical source reg and generating: STRXui %XZR <fi#0> as well as folding fills of similar COPYs like: %vreg0:sub_32<def,read-undef> = COPY %vreg1; GPR64:%vreg0, GPR32:%vreg1 by generating: %vreg0:sub_32<def,read-undef> = LDRWui <fi#0> Reviewers: MatzeB, qcolombet Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D27425 llvm-svn: 291180	2017-01-05 21:51:42 +00:00
Matthias Braun	115efcd3d1	MachineScheduler: Export function to construct "default" scheduler. This makes the createGenericSchedLive() function that constructs the default scheduler available for the public API. This should help when you want to get a scheduler and the default list of DAG mutations. This also shrinks the list of default DAG mutations: {Load\|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer added by default. Targets can easily add them if they need them. It also makes it easier for targets to add alternative/custom macrofusion or clustering mutations while staying with the default createGenericSchedLive(). It also saves the callback back and forth in TargetInstrInfo::enableClusterLoads()/enableClusterStores(). Differential Revision: https://reviews.llvm.org/D26986 llvm-svn: 288057	2016-11-28 20:11:54 +00:00
Matt Arsenault	0a3ea89e85	AArch64: Move remaining target specific BranchRelaxation bits to TII llvm-svn: 283458	2016-10-06 15:38:09 +00:00
Matt Arsenault	1b9fc8ed65	Finish renaming remaining analyzeBranch functions llvm-svn: 281535	2016-09-14 20:43:16 +00:00
Matt Arsenault	e8e0f5cac6	Make analyzeBranch family of instruction names consistent analyzeBranch was renamed to use lowercase first, rename the related set to match. llvm-svn: 281506	2016-09-14 17:24:15 +00:00
Matt Arsenault	a2b036e88b	AArch64: Use TTI branch functions in branch relaxation The main change is to return the code size from InsertBranch/RemoveBranch. Patch mostly by Tim Northover llvm-svn: 281505	2016-09-14 17:23:48 +00:00
Geoff Berry	22dfbc5637	[AArch64] Re-factor code shared by AArch64LoadStoreOpt and AArch64InstrInfo. This re-factoring could cause the following slight changes in generated code, though none were observed during testing: - MachineScheduler could decide not to cluster some loads/stores if there are other load/stores with non-pairable opcodes that have the same base register and offset as a pairable set of load/stores. One case of different MachineScheduler pairing did show up in my testing, but it wasn't due to this issue, but due BaseMemOpClusterMutation::clusterNeighboringMemOps() being unstable w.r.t. the order it considers memory operations. See PR28942. - The ImplicitNullChecks optimization could be done for more load/store opcodes. This optimization isn't done for C/C++ code, so it didn't show up in my testing. Reviewers: mcrosier, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D23365 llvm-svn: 278515	2016-08-12 15:26:00 +00:00
Matt Arsenault	e8da145493	AArch64: BranchRelaxtion cleanups Move some logic into TII. llvm-svn: 277430	2016-08-02 08:06:17 +00:00
Sjoerd Meijer	0eb96ed0de	TargetInstrInfo: add virtual function getInstSizeInBytes This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of subclasses already implement. Differential Revision: https://reviews.llvm.org/D22885 llvm-svn: 277126	2016-07-29 08:16:16 +00:00
Sjoerd Meijer	89217f8835	TargetInstrInfo: rename GetInstSizeInBytes to getInstSizeInBytes. NFC Differential Revision: https://reviews.llvm.org/D22925 llvm-svn: 276997	2016-07-28 16:32:22 +00:00
Ahmed Bougacha	5e402eec7b	[AArch64] Mark various *Info classes as 'final'. NFC. llvm-svn: 276874	2016-07-27 14:31:46 +00:00
Jacques Pienaar	71c30a14b7	Rename AnalyzeBranch* to analyzeBranch*. Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect. Reviewers: tstellarAMD, mcrosier Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai Differential Revision: https://reviews.llvm.org/D22409 llvm-svn: 275564	2016-07-15 14:41:04 +00:00
Justin Lebar	288b3376ae	[CodeGen] Refactor MachineMemOperand::Flags's target-specific flags. Summary: Make the target-specific flags in MachineMemOperand::Flags real, bona fide enum values. This simplifies users, prevents various constants from going out of sync, and avoids the false sense of security provided by declaring static members in classes and then forgetting to define them inside of cpp files. Reviewers: MatzeB Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22372 llvm-svn: 275451	2016-07-14 18:15:20 +00:00
Duncan P. N. Exon Smith	9cfc75c214	CodeGen: Use MachineInstr& in TargetInstrInfo, NFC This is mostly a mechanical change to make TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement. Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary. This is mostly mechanical fixes: adding and removing `` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on. llvm-svn: 274189	2016-06-30 00:01:54 +00:00
Benjamin Kramer	bdc4956bac	Pass DebugLoc and SDLoc by const ref. This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512	2016-06-12 15:39:02 +00:00
Jonas Paulsson	8e5b0c65cc	[foldMemoryOperand()] Pass LiveIntervals to enable liveness check. SystemZ (and probably other targets as well) can fold a memory operand by changing the opcode into a new instruction that as a side-effect also clobbers the CC-reg. In order to do this, liveness of that reg must first be checked. When LIS is passed, getRegUnit() can be called on it and the right LiveRange is computed on demand. Reviewed by Matthias Braun. http://reviews.llvm.org/D19861 llvm-svn: 269026	2016-05-10 08:09:37 +00:00
Chad Rosier	9d1a556125	Cleanup comments. NFC. llvm-svn: 268236	2016-05-02 14:56:21 +00:00
Gerolf Hoflehner	01b3a6184a	[MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098) The original patch caused crashes because it could derefence a null pointer for SelectionDAGTargetInfo for targets that do not define it. Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267328	2016-04-24 05:14:01 +00:00
Daniel Sanders	591c379563	Revert r267098 - [MachineCombiner] Support for floating-point FMA on ARM64 It introduced buildbot failures on clang-cmake-mips, clang-ppc64le-linux, among others. llvm-svn: 267127	2016-04-22 09:37:26 +00:00
Gerolf Hoflehner	b32f11fc62	[MachineCombiner] Support for floating-point FMA on ARM64 Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267098	2016-04-22 02:15:19 +00:00
Evgeny Astigeevich	fd89fe0dd3	[AArch64][CodeGen] Fix of PR27158: incorrect peephole optimization in AArch64InstrInfo::optimizeCompareInstr AArch64InstrInfo::optimizeCompareInstr has bug PR27158 which causes generation of incorrect code. A compare instruction is substituted with another instruction which does not produce the same flags as the original compare instruction. This patch contains: 1. Fix of the bug. 2. A regression test in MIR. 3. A new test to check that SUBS is replaced by SUB. Differential Revision: http://reviews.llvm.org/D18838 llvm-svn: 266969	2016-04-21 08:54:08 +00:00
Jun Bum Lim	4c5bd58ebe	[MachineScheduler]Add support for store clustering Perform store clustering just like load clustering. This change add StoreClusterMutation in machine-scheduler. To control StoreClusterMutation, added enableClusterStores() in TargetInstrInfo.h. This is enabled only on AArch64 for now. This change also add support for unscaled stores which were not handled in getMemOpBaseRegImmOfs(). llvm-svn: 266437	2016-04-15 14:58:38 +00:00
Evgeny Astigeevich	9c24ebfa6d	[AArch64][CodeGen] NFC refactor AArch64InstrInfo::optimizeCompareInstr to prepare it for fixing a bug in it AArch64InstrInfo::optimizeCompareInstr has a bug which causes generation of incorrect code (PR#27158). The patch refactors the function to simplify reviewing the fix of the bug. 1. Function name ‘modifiesConditionCode’ is changed to ‘areCFlagsAccessedBetweenInstrs’ to reflect that the function can check modifying accesses, reading accesses or both. 2. Function ‘AArch64InstrInfo::optimizeCompareInstr’ - Documented the function - Cmp_NZCV is DeadNZCVIdx to reflect that it is an operand index of dead NZCV - The code for the case of substituting CmpInstr is put into separate functions the main of them is ‘substituteCmpInstr’. Differential Revision: http://reviews.llvm.org/D18609 llvm-svn: 265531	2016-04-06 11:39:00 +00:00
Chad Rosier	cdfd7e7201	[AArch64] Enable more load clustering in the MI Scheduler. This patch adds unscaled loads and sign-extend loads to the TII getMemOpBaseRegImmOfs API, which is used to control clustering in the MI scheduler. This is done to create more opportunities for load pairing. I've also added the scaled LDRSWui instruction, which was missing from the scaled instructions. Finally, I've added support in shouldClusterLoads for clustering adjacent sext and zext loads that too can be paired by the load/store optimizer. Differential Revision: http://reviews.llvm.org/D18048 llvm-svn: 263819	2016-03-18 19:21:02 +00:00
Chad Rosier	e4e15ba046	[AArch64] Move helper functions into TII, so they can be reused elsewhere. NFC. llvm-svn: 263032	2016-03-09 17:29:48 +00:00
Chad Rosier	0da267dd1d	[AArch64] Minor cleanup/remove redundant code. NFC. llvm-svn: 263024	2016-03-09 16:46:48 +00:00
Chad Rosier	c27a18f39f	[TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC. http://reviews.llvm.org/D17967 llvm-svn: 263021	2016-03-09 16:00:35 +00:00
Haicheng Wu	08b9462540	[AArch64 MachineCombine] Enhance/Add support for general reassociation to reduce the critical path Allow fadd/fmul to be reassociated in aarch64. llvm-svn: 257024	2016-01-07 04:01:02 +00:00

1 2

87 Commits