llvm-project

Commit Graph

Author	SHA1	Message	Date
Evgeny Leviant	cb86522c94	[ARM][SchedModels] Convert IsR1P0AndLaterPred to MCSchedPredicate. NFC Differential revision: https://reviews.llvm.org/D90017	2020-10-23 14:27:49 +03:00
Evgeny Leviant	7a78073be7	[ARM][SchedModels] Let ldm* instruction scheduling use MCSchedPredicate Differential revision: https://reviews.llvm.org/D89957	2020-10-23 10:33:20 +03:00
Mircea Trofin	e24537d48f	[NFC][MC] Use MCRegister for ReachingDefAnalysis APIs Also updated the users of the APIs; and a drive-by small change to RDFRegister.cpp Differential Revision: https://reviews.llvm.org/D89912	2020-10-22 08:47:35 -07:00
Evgeny Leviant	ed6a91f456	[ARM][SchedModels] Convert IsLdstsoScaledPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89939	2020-10-22 18:03:01 +03:00
Evgeny Leviant	bf9edcb6fd	[ARM][SchedModels] Convert IsLdrAm3RegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89876	2020-10-21 20:49:10 +03:00
Paul C. Anagnostopoulos	dfd6b69e01	[ARM] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans Differential Revision: https://reviews.llvm.org/D89822	2020-10-21 09:52:45 -04:00
Nicholas Guy	9a2d2bedb7	Add "SkipDead" parameter to TargetInstrInfo::DefinesPredicate Some instructions may be removable through processes such as IfConversion, however DefinesPredicate can not be made aware of when this should be considered. This parameter allows DefinesPredicate to distinguish these removable instructions on a per-call basis, allowing for more fine-grained control from processes like ifConversion. Renames DefinesPredicate to ClobbersPredicate, to better reflect it's purpose Differential Revision: https://reviews.llvm.org/D88494	2020-10-21 11:52:47 +01:00
Evgeny Leviant	991e86156c	[ARM][SchedModels] Convert IsCPSRDefinedPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89460	2020-10-20 11:14:21 +03:00
David Green	6dcbc323fd	Revert "[ARM][LowOverheadLoops] Adjust Start insertion." This reverts commit `38f625d0d1`. This commit contains some holes in its logic and has been causing issues since it was commited. The idea sounds OK but some cases were not handled correctly. Instead of trying to fix that up later it is probably simpler to revert it and work to reimplement it in a more reliable way.	2020-10-20 08:55:21 +01:00
Luqman Aden	51892a42da	[COFF][ARM] Fix CodeView for Windows on 32bit ARM targets. Create the LLVM / CodeView register mappings for the 32-bit ARM Window targets. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D89622	2020-10-19 22:16:16 -07:00
Evgeny Leviant	8a7ca143f8	[ARM][SchedModels] Convert IsPredicatedPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89553	2020-10-19 11:37:54 +03:00
David Green	b93d74ac9c	[ARM] Basic getArithmeticReductionCost reduction costs This adds some basic costs for MVE reductions - currently just costing the simple legal add vectors as a single MVE instruction. More complex costing can be added in the future when the framework more readily allows it. Differential Revision: https://reviews.llvm.org/D88980	2020-10-17 10:29:00 +01:00
David Green	d79ee3a807	[ARM] Add a very basic active_lane_mask cost This adds a very basic cost for active_lane_mask under MVE - making the assumption that they will be free and then apologizing for that in a comment. In reality they may either be free (by being nicely folded into a tail predicated loop), cost the same as a VCTP or be expanded into vdup's, adds and cmp's. It is difficult to detect the difference from a single getIntrinsicInstrCost call, so makes the assumption that the vectorizer is adding them, and only added them where it makes sense. We may need to change this in the future to better model predicate costs in the vectorizer, especially at -Os or non-tail predicated loops. The vectorizer currently does not query the cost of these instructions but that will change in the future and a zero cost there probably makes the most sense at the moment. Differential Revision: https://reviews.llvm.org/D88989	2020-10-17 10:09:42 +01:00
David Sherwood	47f2dc7e5f	[SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets In most of lib/Target we know that we are not dealing with scalable types so it's perfectly fine to replace TypeSize comparison operators with their fixed width equivalents, making use of getFixedSize() and so on. Differential Revision: https://reviews.llvm.org/D89101	2020-10-15 09:01:21 +01:00
Evgeny Leviant	2ad82b0ed1	[ARM.td] Make instruction definitions visible to sched models Differential revision: https://reviews.llvm.org/D89308	2020-10-14 09:58:45 +03:00
David Green	cb27006a94	[ARM] Attempt to make Tail predication / RDA more resilient to empty blocks There are a number of places in RDA where we assume the block will not be empty. This isn't necessarily true for tail predicated loops where we have removed instructions. This attempt to make the pass more resilient to empty blocks, not casting pointers to machine instructions where they would be invalid. The test contains a case that was previously failing, but recently been hidden on trunk. It contains an empty block to begin with to show a similar error. Differential Revision: https://reviews.llvm.org/D88926	2020-10-10 14:50:25 +01:00
Amara Emerson	322d0afd87	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787	2020-10-07 10:36:44 -07:00
Sam Tebbs	68e002e181	[ARM] Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV. Differential Revision: https://reviews.llvm.org/D87836	2020-10-06 14:44:58 +01:00
Amara Emerson	c9f5cdd453	Revert "[ARM]Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV" This reverts commit `2573cf3c3d`. These seem to break some lit tests.	2020-10-05 10:52:43 -07:00
Sam Tebbs	2573cf3c3d	[ARM]Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV. Differential Revision: https://reviews.llvm.org/D87836	2020-10-05 15:51:28 +01:00
David Green	7feafa0286	[ARM] Fix pointer offset when splitting stores from VMOVDRR We were not accounting for the pointer offset when splitting a store from a VMOVDRR node, which could lead to incorrect aliasing info. In this case it is the fneg via integer arithmetic that gives us a store->load pair that we started getting wrong. Differential Revision: https://reviews.llvm.org/D88653	2020-10-03 16:47:50 +01:00
Meera Nakrani	f7c0e2b8f2	[ARM] Prevent constants from iCmp instruction from being hoisted if part of a min(max()) pattern Marks constants of an ICmp instruction as free if it's only user is a select instruction that is part of a min(max()) pattern. Ensures that in loops, in particular when loop unrolling is turned on, SSAT will still be correctly generated. Differential Revision: https://reviews.llvm.org/D88662	2020-10-02 09:28:35 +00:00
Meera Nakrani	48c9e8244b	[ARM] Removed hasSideEffects from signed/unsigned saturates Removed hasSideEffects from SSAT and USAT so that they are no longer marked as unpredictable. Differential Revision: https://reviews.llvm.org/D88545	2020-10-01 14:55:01 +00:00
Sam Parker	7e02bc81c6	[NFC][ARM] LowOverheadLoop DEBUG statements	2020-10-01 13:38:16 +01:00
Sam Parker	38f625d0d1	[ARM][LowOverheadLoops] Adjust Start insertion. Try to move the insertion point to become the terminator of the block, usually the preheader. Differential Revision: https://reviews.llvm.org/D88638	2020-10-01 10:49:19 +01:00
Sam Parker	6ec5f32497	[ARM][LowOverheadLoops] Iteration count liveness Before deciding to insert a [W\|D]LSTP, check that defining LR with the element count won't affect any other instructions that should be taking the iteration count. Differential Revision: https://reviews.llvm.org/D88549	2020-10-01 10:11:10 +01:00
Sam Parker	7b90516d47	[ARM][LowOverheadLoops] Start insertion point If possible, try not to move the start position earlier than it already is. Differential Revision: https://reviews.llvm.org/D88542	2020-10-01 10:05:25 +01:00
Sam Parker	dfa2c14b8f	[ARM][LowOverheadLoops] Use iterator for InsertPt. Use a MachineBasicBlock::iterator instead of a MachineInstr* for the position of our LoopStart instruction. NFCish, as it change debug info.	2020-10-01 08:32:35 +01:00
Rahman Lavaee	8955950c12	Exception support for basic block sections This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf This patch provides exception support for basic block sections by splitting the call-site table into call-site ranges corresponding to different basic block sections. Still all landing pads must reside in the same basic block section (which is guaranteed by the the core basic block section patch D73674 (ExceptionSection) ). Each call-site table will refer to the landing pad fragment by explicitly specifying @LPstart (which is omitted in the normal non-basic-block section case). All these call-site tables will share their action and type tables. The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. In the case of basic block section where one section contains all the landing pads, the landing pad offset relative to LPStart could actually be zero. Thus, we avoid zero-offset landing pads by inserting a nop operation as the first non-CFI instruction in the exception section. Background on Exception Handling in C++ ABI https://github.com/itanium-cxx-abi/cxx-abi/blob/master/exceptions.pdf Compiler emits an exception table for every function. When an exception is thrown, the stack unwinding library queries the unwind table (which includes the start and end of each function) to locate the exception table for that function. The exception table includes a call site table for the function, which is used to guide the exception handling runtime to take the appropriate action upon an exception. Each call site record in this table is structured as follows: \| CallSite \| --> Position of the call site (relative to the function entry) \| CallSite length \| --> Length of the call site. \| Landing Pad \| --> Position of the landing pad (relative to the landing pad fragment’s begin label) \| Action record offset \| --> Position of the first action record The call site records partition a function into different pieces and describe what action must be taken for each callsite. The callsite fields are relative to the start of the function (as captured in the unwind table). The landing pad entry is a reference into the function and corresponds roughly to the catch block of a try/catch statement. When execution resumes at a landing pad, it receives an exception structure and a selector value corresponding to the type of the exception thrown, and executes similar to a switch-case statement. The landing pad field is relative to the beginning of the procedure fragment which includes all the landing pads (@LPStart). The C++ ABI requires all landing pads to be in the same fragment. Nonetheless, without basic block sections, @LPStart is the same as the function @Start (found in the unwind table) and can be omitted. The action record offset is an index into the action table which includes information about which exception types are caught. C++ Exceptions with Basic Block Sections Basic block sections break the contiguity of a function fragment. Therefore, call sites must be specified relative to the beginning of the basic block section. Furthermore, the unwinding library should be able to find the corresponding callsites for each section. To do so, the .cfi_lsda directive for a section must point to the range of call-sites for that section. This patch introduces a new CallSiteRange structure which specifies the range of call-sites which correspond to every section: `struct CallSiteRange { // Symbol marking the beginning of the precedure fragment. MCSymbol FragmentBeginLabel = nullptr; // Symbol marking the end of the procedure fragment. MCSymbol FragmentEndLabel = nullptr; // LSDA symbol for this call-site range. MCSymbol *ExceptionLabel = nullptr; // Index of the first call-site entry in the call-site table which // belongs to this range. size_t CallSiteBeginIdx = 0; // Index just after the last call-site entry in the call-site table which // belongs to this range. size_t CallSiteEndIdx = 0; // Whether this is the call-site range containing all the landing pads. bool IsLPRange = false; };` With N basic-block-sections, the call-site table is partitioned into N call-site ranges. Conceptually, we emit the call-site ranges for sections sequentially in the exception table as if each section has its own exception table. In the example below, two sections result in the two call site ranges (denoted by LSDA1 and LSDA2) placed next to each other. However, their call-sites will refer to records in the shared Action Table. We also emit the header fields (@LPStart and CallSite Table Length) for each call site range in order to place the call site ranges in separate LSDAs. We note that with -basic-block-sections, The CallSiteTableLength will not actually represent the length of the call site table, but rather the reference to the action table. Since the only purpose of this field is to locate the action table, correctness is guaranteed. Finally, every call site range has one @LPStart pointer so the landing pads of each section must all reside in one section (not necessarily the same section). To make this easier, we decide to place all landing pads of the function in one section (hence the `IsLPRange` field in CallSiteRange). \| @LPStart \| ---> Landing pad fragment ( LSDA1 points here) \| CallSite Table Length \| ---> Used to find the action table. \| CallSites \| \| … \| \| … \| \| @LPStart \| ---> Landing pad fragment ( LSDA2 points here) \| CallSite Table Length \| \| CallSites \| \| … \| \| … \| … … \| Action Table \| \| Types Table \| Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D73739	2020-09-30 11:05:55 -07:00
Sam Parker	779a8a028f	[ARM][LowOverheadLoops] TryRemove helper. Make a helper function that wraps around RDA::isSafeToRemove and utilises the existing DCE IT block checks.	2020-09-30 09:37:24 +01:00
Sam Parker	195c22f273	[ARM] Change VPT state assertion Just because we haven't encountered an instruction setting the VPR, it doesn't mean we can't create a VPT block - the VPR maybe a live-in. Differential Revision: https://reviews.llvm.org/D88224	2020-09-30 08:01:10 +01:00
Sam Parker	4c19b89b25	[NFC][ARM] Comments and lambdas Add some comments in LowOverheadLoops and make some lambda variables explicit arguments instead of capturing.	2020-09-29 08:41:53 +01:00
Sam Parker	e82a0084d3	[ARM][LowOverheadLoops] Cleanup and re-arrange Rename and reorganise how we decide where to put the LoopStart instruction.	2020-09-28 16:06:30 +01:00
Tres Popp	509fba75df	[llvm] Fix unused variable in non-debug configurations	2020-09-28 17:04:08 +02:00
Meera Nakrani	675431b987	[ARM] Added more patterns to generate SSAT/USAT with shift Added patterns to generate an SSAT or USAT with shift for SSAT/USAT instructions that are matched from IR patterns. Differential Revision: https://reviews.llvm.org/D88145	2020-09-28 14:50:19 +00:00
Sam Parker	3d1d089155	[NFC][ARM] Factor out some logic for LoLoops. Create a DCE function that accepts an instruction.	2020-09-28 14:51:52 +01:00
Sjoerd Meijer	1696dd27fb	[ARM][MVE] Enable tail-predication by default We have been running tests/benchmarks downstream with tail-predication enabled for some time now and this behaves as expected: we are not aware of any correctness issues, and this performs better across the board than with tail-predication disabled. Time to flip the switch! Differential Revision: https://reviews.llvm.org/D88093	2020-09-28 14:01:23 +01:00
Sjoerd Meijer	f39f92c1f6	[ARM][MVE] tail-predication: overflow checks for elementcount, cont'd This is a reimplementation of the overflow checks for the elementcount, i.e. the 2nd argument of intrinsic get.active.lane.mask. The element count is lowered in each iteration of the tail-predicated loop, and we must prove that this expression doesn't overflow. Many thanks to Eli Friedman and Sam Parker for all their help with this work. Differential Revision: https://reviews.llvm.org/D88086	2020-09-28 09:20:51 +01:00
David Green	e4b9867cb6	[ARM] Expand cannotInsertWDLSTPBetween to the last instruction `9d9a11c7be` added this check for predicatable instructions between the D/WLSTP and the loop's start, but it was missing the last instruction in the block. Change it to use some iterators instead. Differential Revision: https://reviews.llvm.org/D88354	2020-09-28 09:14:40 +01:00
Sam Parker	a399d1880b	[ARM] Find VPT implicitly predicated by VCTP On failing to find a VCTP in the list of instructions that explicitly predicate the entry of a VPT block, inspect whether the block is controlled via VPT which is implicitly predicated due to it's predicated operand(s). Differential Revision: https://reviews.llvm.org/D87819	2020-09-25 08:50:53 +01:00
Sam Parker	00ee52ae04	[NFC][ARM] Remove dead loop. Remove a loop that just calculated a couple of values that were now longer needed.	2020-09-24 15:37:26 +01:00
Sjoerd Meijer	2fc690ac90	[ARM] LowoverheadLoops: add an option to disable tail-predication This might be useful for testing. We already have an option -tail-predication but that controls the MVETailPredication pass. This -arm-loloops-disable-tail-pred is just for disabling it in the LowoverheadLoops pass. Differential Revision: https://reviews.llvm.org/D88212	2020-09-24 13:30:48 +01:00
Sam Parker	9d9a11c7be	[ARM] Check for LSTP side-effects. If the LSTP instruction is inserted with an element count low enough to immediately predicate some lanes as false, this can have some unintended effects on any proceeding MVE instructions in the preheader. Differential Revision: https://reviews.llvm.org/D88209	2020-09-24 13:28:35 +01:00
Eli Friedman	3f739f736b	[SelectionDAG][GISel] Make LegalizeDAG lower FNEG using integer ops. Previously, if a floating-point type was legal, but FNEG wasn't legal, we would use FSUB. Instead, we should use integer ops, to preserve the semantics. (Alternatively, there's a compiler-rt call we could use, but there isn't much reason to use that.) It turns out we actually are still using this obscure codepath in a few cases: on some targets, we have "legal" floating-point types that don't actually support any floating-point operations. In particular, ARM and AArch64 are using this path. The implementation for SelectionDAG is pretty simple because we can reuse the infrastructure from FCOPYSIGN. See also `9a3dc3e`, the corresponding change to type legalization. Also includes a "bonus" change to STRICT_FSUB legalization, so we can lower a STRICT_FSUB to a float libcall. Includes the changes to both LegalizeDAG and GlobalISel so we don't have inconsistent results in the future. Fixes https://bugs.llvm.org/show_bug.cgi?id=46792 . Differential Revision: https://reviews.llvm.org/D84287	2020-09-23 14:10:33 -07:00
David Sherwood	e077367a28	[SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits An existing function Type::getScalarSizeInBits returns a uint64_t instead of a TypeSize class because the caller is requesting a scalar size, which cannot be scalable. This patch makes other similar functions requesting a scalar size consistent with that, thereby eliminating more than 1000 implicit TypeSize -> uint64_t casts. Differential revision: https://reviews.llvm.org/D87889	2020-09-23 09:20:08 +01:00
Stefanos Baziotis	a7873e5abc	Small fixes for "[LoopInfo] empty() -> isInnermost(), add isOutermost()"	2020-09-22 23:59:34 +03:00
Stefanos Baziotis	89c1e35f3c	[LoopInfo] empty() -> isInnermost(), add isOutermost() Differential Revision: https://reviews.llvm.org/D82895	2020-09-22 23:28:51 +03:00
Sam Parker	94c799fecf	[ARM] Trying to fix asan buildbot	2020-09-22 13:43:23 +01:00
Meera Nakrani	a3d0dce260	[ARM][TTI] Prevents constants in a min(max) or max(min) pattern from being hoisted when in a loop Changes TTI function getIntImmCostInst to take an additional Instruction parameter, which enables us to be able to check it is part of a min(max())/max(min()) pattern that will match SSAT. We can then mark the constant used as free to prevent it being hoisted so SSAT can still be generated. Required minor changes in some non-ARM backends to allow for the optional parameter to be included. Differential Revision: https://reviews.llvm.org/D87457	2020-09-22 11:54:10 +00:00
Sam Parker	b4fa884a73	[ARM] Improve VPT predicate tracking The VPTBlock has been modified to track the 'global' state of the VPR, as well as the state for each block. Each object now just holds a list of instructions that makeup the block, while static structures hold the predicate information. This enables global access for querying how both a VPT block and individual instructions are predicated. These changes now allow us, again, to handle more complicated cases where multiple instructions build a predicate and/or where the same predicate in used in multiple blocks. It doesn't, however, get us back to before the tracking was 'fixed' as some extra logic will be required to properly handle VPT instructions. Currently a VPT could be effectively predicated because of it's inputs, but the existing logic will not detect that and so will refuse to perform the transformation. This can be seen in remat-vctp.ll test where we still don't perform the transform. Differential Revision: https://reviews.llvm.org/D87681	2020-09-22 10:40:27 +01:00

1 2 3 4 5 ...

11113 Commits