llvm-project

Commit Graph

Author	SHA1	Message	Date
Evgeny Leviant	a6a6d11c7b	[MC][ARM] Fix number of operands of tMOVSr Differential revision: https://reviews.llvm.org/D92029	2020-11-24 18:13:10 +03:00
Evgeny Leviant	50bd686695	Add support for branch forms of ALU instructions to Cortex-A57 model Patch fixes scheduling of ALU instructions which modify pc register. Patch also fixes computation of mutually exclusive predicates for sequences of variants to be properly expanded Differential revision: https://reviews.llvm.org/D91266	2020-11-24 11:43:51 +03:00
Craig Topper	4252f7773a	[SelectionDAG][ARM][AArch64][Hexagon][RISCV][X86] Add SDNPCommutative to fma and fmad nodes in tablegen. Remove explicit commuted patterns from targets. X86 was already specially marking fma as commutable which allowed tablegen to autogenerate commuted patterns. This moves it to the target independent definition and fix up the targets to remove now unneeded patterns. Unfortunately, the tests change because the commuted version of the patterns are generating operands in a different than the explicit patterns. Differential Revision: https://reviews.llvm.org/D91842	2020-11-23 10:09:20 -08:00
David Green	c8c3a411c5	[ARM] Ensure MVE_TwoOpPattern is used inside Predicate's	2020-11-22 21:38:00 +00:00
David Green	f08c37da7b	[ARM] Disable WLSTP loops This checks to see if the loop will likely become a tail predicated loop and disables wls loop generation if so, as the likelihood for reverting is currently too high. These should be fairly rare situations anyway due to the way iterations and element counts are used during lowering. Just not trying can alter how SCEV's are materialized however, leading to different codegen. It also adds a option to disable all while low overhead loops, for debugging. Differential Revision: https://reviews.llvm.org/D91663	2020-11-20 13:30:44 +00:00
Sam Tebbs	8ecb015ed5	[ARM][LowOverheadLoops] Convert intermediate vpr use assertion to condition This converts the intermediate VPR use assertion to a condition in the if-statement to protect against assertion failures in case behaviuour is changed. This is a follow-up to https://reviews.llvm.org/D90935 and implements the post-approval comments. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91790	2020-11-19 17:15:45 +00:00
David Green	006b3bdedd	[ARM] Deliberately prevent inline asm in low overhead loops. NFC This was already something that was handled by one of the "else" branches in maybeLoweredToCall, so this patch is an NFC but makes it explicit and adds a test. We may in the future want to support this under certain situations but for the moment just don't try and create low overhead loops with inline asm in them. Differential Revision: https://reviews.llvm.org/D91257	2020-11-19 13:28:21 +00:00
Duncan P. N. Exon Smith	5abf76fbe3	ADT: Add assertions to SmallVector::insert, etc., for reference invalidation `2c196bbc6b` asserted that `SmallVector::push_back` doesn't invalidate the parameter when it needs to grow. Do the same for `resize`, `append`, `assign`, `insert`, and `emplace_back`. Differential Revision: https://reviews.llvm.org/D91744	2020-11-18 17:36:28 -08:00
Mikhail Goncharov	f45c052c9e	Fix unused variables in release build Differential Revision: https://reviews.llvm.org/D91705	2020-11-18 15:18:31 +01:00
Sam Tebbs	da2e4728c7	[ARM][LowOverheadLoops] Merge VCMP and VPST across VPT blocks This patch adds support for combining a VPST with a dangling VCMP from a previous VPT block. Differential Revision: https://reviews.llvm.org/D90935	2020-11-18 12:54:16 +00:00
Florian Hahn	b2f4c5fddc	[AsmWriter] Factor out mnemonic generation to accessible getMnemonic. This patch factors out the part of printInstruction that gets the mnemonic string for a given MCInst. This is intended to be used subsequently for the instruction-mix remarks to display the final mnemonic (D90040). Unfortunately making `getMnemonic` available to the AsmPrinter seems to require making it virtual. Not sure if there's a way around that with the current layering of the AsmPrinters. Reviewed By: Paul-C-Anagnostopoulos Differential Revision: https://reviews.llvm.org/D90039	2020-11-17 09:47:38 +00:00
David Penry	48b43c9d4f	[ARM] Cortex-M7 schedule This patch adds the SchedMachineModel for Cortex-M7. It also adds test cases for the scheduling information. Details of the pipeline and descriptions are in comments in file ARMScheduleM7.td included in this patch. Differential Revision: https://reviews.llvm.org/D91355	2020-11-16 10:16:07 +00:00
serge-sans-paille	9218ff50f9	llvmbuildectomy - replace llvm-build by plain cmake No longer rely on an external tool to build the llvm component layout. Instead, leverage the existing `add_llvm_componentlibrary` cmake function and introduce `add_llvm_component_group` to accurately describe component behavior. These function store extra properties in the created targets. These properties are processed once all components are defined to resolve library dependencies and produce the header expected by llvm-config. Differential Revision: https://reviews.llvm.org/D90848	2020-11-13 10:35:24 +01:00
David Green	11dee2eae2	[ARM] Ensure CountReg definition dominates InsertPt when creating t2DoLoopStartTP Of course there was something missing, in this case a check that the def of the count register we are adding to a t2DoLoopStartTP would dominate the insertion point. In the future, when we remove some of these COPY's in between, the t2DoLoopStartTP will always become the last instruction in the block, preventing this from happening. In the meantime we need to check they are created in a sensible order. Differential Revision: https://reviews.llvm.org/D91287	2020-11-12 13:47:46 +00:00
Simon Pilgrim	1a62ca65c1	[KnownBits] Add KnownBits::commonBits helper. NFCI. We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........	2020-11-11 12:15:54 +00:00
Sam Parker	898a81dfc5	[NFC][ARM] Replace lambda with any_of	2020-11-11 10:02:55 +00:00
Pirama Arumuga Nainar	8262e94a6d	[ARM] Fix PR 47980: Use constrainRegClass during foldImmediate opt. Previously we used setRegClass to rgpr, which may expand the register domain if the result was already in a constrained class (tcgpr in the above PR). Differential Revision: https://reviews.llvm.org/D91192	2020-11-10 13:38:11 -08:00
Benjamin Kramer	92c61a045f	[ARM] Silence unused variable warning in Release builds. NFC.	2020-11-10 20:35:28 +01:00
David Green	08d1c2d470	[ARM] Introduce t2DoLoopStartTP This introduces a new pseudo instruction, almost identical to a t2DoLoopStart but taking 2 parameters - the original loop iteration count needed for a low overhead loop, plus the VCTP element count needed for a DLSTP instruction setting up a tail predicated loop. The idea is that the instruction holds both values and the backend ARMLowOverheadLoops pass can pick between the two, depending on whether it creates a tail predicated loop or falls back to a low overhead loop. To do that there needs to be something that converts a t2DoLoopStart to a t2DoLoopStartTP, for which this patch repurposes the MVEVPTOptimisationsPass as a "tail predication and vpt optimisation" pass. The extra operand for the t2DoLoopStartTP is chosen based on the operands of VCTP's in the loop, and the instruction is moved as late in the block as possible to attempt to increase the likelihood of making tail predicated loops. Differential Revision: https://reviews.llvm.org/D90591	2020-11-10 18:08:12 +00:00
David Green	dbe1bf63aa	[ARM] Cleanup for ARMLowOverheadLoops. NFC	2020-11-10 17:28:07 +00:00
David Green	c7e275388e	[ARM] Don't aggressively unroll vector remainder loops We already do not unroll loops with vector instructions under MVE, but that does not include the remainder loops that the vectorizer produces. These remainder loops will be rarely executed and are not worth unrolling, as the trip count is likely to be low if they get executed at all. Luckily they get llvm.loop.isvectorized to make recognizing them simpler. We have wanted to do this for a while but hit issues with low overhead loops being reverted due to difficult registry allocation. With recent changes that seems to be less of an issue now. Differential Revision: https://reviews.llvm.org/D90055	2020-11-10 17:01:31 +00:00
David Green	73a6cd4b6b	[ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR This hints the operand of a t2DoLoopStart towards using LR, which can help make it more likely to become t2DLS lr, lr. This makes it easier to move if needed (as the input is the same as the output), or potentially remove entirely. The hint is added after others (from COPY's etc) which still take precedence. It needed to find a place to add the hint, which currently uses the post isel custom inserter. Differential Revision: https://reviews.llvm.org/D89883	2020-11-10 16:28:57 +00:00
David Green	b2ac9681a7	[ARM] Alter t2DoLoopStart to define lr This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator. This is a fairly simple change in itself, but leads to a number of other required alterations. - The hardware loop pass, if UsePhi is set, now generates loops of the form: %start = llvm.start.loop.iterations(%N) loop: %p = phi [%start], [%dec] %dec = llvm.loop.decrement.reg(%p, 1) %c = icmp ne %dec, 0 br %c, loop, exit - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new instrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them! This patch on it's own might cause more trouble that it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall. Differential Revision: https://reviews.llvm.org/D89881	2020-11-10 15:57:58 +00:00
David Green	c8cd7e2bbf	[ARM] Remove MI variable aliasing. NFC This was accidentally using the same name for two different variables in the same line. Whilst it seems to work for some compilers, others have trouble and it is probably not a fantastic idea.	2020-11-09 18:18:43 +00:00
Momchil Velikov	937ab6a785	[ARM][MachineOutliner] Emit more CFI instructions This patch make the outliner emit CFI instructions in a few more places: * after LR is restored, but before the return in an outlined function * around save/restore of LR to/from a register at calls to outlined functions * around save/restore of LR to/from the stack at calls to outlined functions The latter two only when the function does NOT spill LR. If the function spills LR, then outliner generated saves/restores around calls are not considered interesting for unwinding the frame. Differential Revision: https://reviews.llvm.org/D89483	2020-11-09 15:26:18 +00:00
Sam Tebbs	40a3f7e48d	[ARM][LowOverheadLoops] Merge a VCMP and the new VPST into a VPT There were cases where a VCMP and a VPST were merged even if the VCMP didn't have the same defs of its operands as the VPST. This is fixed by adding RDA checks for the defs. This however gave rise to cases where the new VPST created would precede the un-merged VCMP and so would fail a predicate mask assertion since the VCMP wasn't predicated. This was solved by converting the VCMP to a VPT instead of inserting the new VPST. Differential Revision: https://reviews.llvm.org/D90461	2020-11-09 15:03:48 +00:00
David Green	a0a9e1c798	[ARM] Remove kill flags between VCMP and insertion point When we fold a VCMP into a VPST instruction any kill flags between the old VCMP position and the new insertion point need to be removed, in order to keep the verifier happy. Differential Revision: https://reviews.llvm.org/D90964	2020-11-09 13:17:53 +00:00
Lucas Prates	c2c2cc1360	[ARM][AArch64] Adding Neoverse V1 CPU support Add support for the Neoverse V1 CPU to the ARM and AArch64 backends. This is based on patches from Mark Murray and Victor Campos. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D90765	2020-11-09 13:15:40 +00:00
Sanjay Patel	264a6df353	[ARM] remove cost-kind predicate for cmp/sel costs This is the cmp/sel sibling to D90692. Again, the reasoning is: the throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uops). We need to check for a valid (non-null) condition type parameter because SimplifyCFG may pass nullptr for that (and so we will crash multiple regression tests without that check). I'm not sure if passing nullptr makes sense, but other code in the cost model does appear to check if that param is set or not. Differential Revision: https://reviews.llvm.org/D90781	2020-11-05 14:52:25 -05:00
Sander de Smalen	d57bba7cf8	[SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference. To accommodate frame layouts that have both fixed and scalable objects on the stack, describing a stack location or offset using a pointer + uint64_t is not sufficient. For this reason, we've introduced the StackOffset class, which models both the fixed- and scalable sized offsets. The TargetFrameLowering::getFrameIndexReference is made to return a StackOffset, so that this can be used in other interfaces, such as to eliminate frame indices in PEI or to emit Debug locations for variables on the stack. This patch is purely mechanical and doesn't change the behaviour of how the result of this function is used for fixed-sized offsets. The patch adds various checks to assert that the offset has no scalable component, as frame offsets with a scalable component are not yet supported in various places. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D90018	2020-11-05 11:02:18 +00:00
Cameron McInally	c126eb7529	[SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247. Differential Revision: https://reviews.llvm.org/D90644	2020-11-04 14:20:31 -06:00
David Green	eb611930b6	[ARM] Remove unused variable. NFC	2020-11-04 09:00:03 +00:00
Sanjay Patel	c40126e740	[ARM] remove cost-kind predicate for most math op costs This is based on the same idea that I am using for the basic model implementation and what I have partly already done for x86: throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uop)). Differential Revision: https://reviews.llvm.org/D90692	2020-11-03 17:23:46 -05:00
David Green	bd32386410	[ARM] Remove unused variable. NFC	2020-11-03 12:58:10 +00:00
David Green	e474499402	[ARM] Treat memcpy/memset/memmove as call instructions for low overhead loops If an instruction will be lowered to a call there is no advantage of using a low overhead loop as the LR register will need to be spilled and reloaded around the call, and the low overhead will end up being reverted. This teaches our hardware loop lowering that these memory intrinsics will be calls under certain situations. Differential Revision: https://reviews.llvm.org/D90439	2020-11-03 11:53:09 +00:00
Momchil Velikov	7360d6d921	[ARM][MachineOutliner] Do not overestimate LR liveness in return block The `LiveRegUnits` utility (as well as `LivePhysRegs`) considers callee-saved registers to be alive at the point after the return instruction in a block. In the ARM backend, the `LR` register is classified as callee-saved, which is not really correct (from an ARM eABI or just common sense point of view). These two conditions cause the `MachineOutliner` to overestimate the liveness of `LR`, which results in unnecessary saves/restores of `LR` around calls to outlined sequences. It also causes the `MachineVerifer` to crash in some cases, because the save instruction reads a dead `LR`, for example when the following program: int h(int, int); int f(int a, int b, int c, int d) { a = h(a + 1, b - 1); b = b + c; return 1 + (2 * a + b) * (c - d) / (a - b) * (c + d); } int g(int a, int b, int c, int d) { a = h(a - 1, b + 1); b = b + c; return 2 + (2 * a + b) * (c - d) / (a - b) * (c + d); } is compiled with `-target arm-eabi -march=armv7-m -Oz`. This patch computes the liveness of `LR` in return blocks only, while taking into account the few ARM instructions, which read `LR`, but nevertheless the register is not mentioned (explicitly or implicitly) in the instruction operands. Differential Revision: https://reviews.llvm.org/D89189	2020-11-02 16:47:22 +00:00
Florian Hahn	b3b993a7ad	Reland "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts the revert commit `408c4408fa`. This version of the patch includes a fix for a crash caused by treating ICmp/FCmp constant expressions as instructions. Original message: On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV.	2020-11-02 15:39:29 +00:00
Evgeny Leviant	cc96a82291	[TableGen][SchedModels] Fix read/write variant substitution Patch fixes case when sched class has write and read variants belonging to different processor models. Differential revision: https://reviews.llvm.org/D89777	2020-11-02 17:39:04 +03:00
David Green	30ad742644	[ARM] Fix crash for gather of pointer costs. If the elt size is unknown due to it being a pointer, a comparison against 0 will cause an assert. Make sure the elt size is large enough before comparing and for the moment just return the scalar cost.	2020-10-31 13:10:14 +00:00
Florian Hahn	408c4408fa	Revert "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts commit `73f01e3df5`. This appears to break http://lab.llvm.org:8011/#/builders/85/builds/383.	2020-10-30 21:26:14 +00:00
Cameron McInally	dda1e74b58	[Legalize] Add legalizations for VECREDUCE_SEQ_FADD Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass. Differential Revision: https://reviews.llvm.org/D90247	2020-10-30 16:02:55 -05:00
Florian Hahn	73f01e3df5	[TTI] Add VecPred argument to getCmpSelInstrCost. On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV. Reviewed By: dmgreen, RKSimon Differential Revision: https://reviews.llvm.org/D90070	2020-10-30 13:49:08 +00:00
David Green	d14db8c8dc	[ARM] Match MVE vqdmulh This adds ISel matching for a form of VQDMULH. There are several ir patterns that we could match to that instruction, this one is for: min(ashr(mul(sext(a), sext(b)), 7), 127) Which is what llvm will optimize to once it has removed the max that usually makes up the min/max saturate pattern, as in this case the compare will always be false. The additional complication to match i32 patterns (which extend into an i64) is that the min will be a vselect/setcc, as vmin is not supported for i64 vectors. Tablegen patterns have also been updated to attempt to reuse the MVE_TwoOpPattern patterns. Differential Revision: https://reviews.llvm.org/D90096	2020-10-30 13:34:27 +00:00
Nicholas Guy	eb9fe24eaf	[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz Fixes a regression caused by D82439, in which IT blocks were no longer being generated when -Oz is present. Differential Revision: https://reviews.llvm.org/D88496	2020-10-29 15:17:31 +00:00
Evgeny Leviant	a28388f95b	[ARM][SchedModels] Move IsLDMBaseRegInListPred to ARMSchedule.td. NFC This predicate is not specific to cortex-a57 and can be used in other processor models as well.	2020-10-26 22:31:41 +03:00
Evgeny Leviant	e74f66125e	[ARM][SchedModels] Convert IsLdstsoScaledNotOptimalPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90150	2020-10-26 20:22:41 +03:00
Evgeny Leviant	a877bda397	Fix issue in cortex-a57 sched model Differential revision: https://reviews.llvm.org/D90152	2020-10-26 20:16:40 +03:00
Evgeny Leviant	a95ce5f65f	[ARM][SchedModels] Rename and generalize predicate. NFC	2020-10-26 12:14:55 +03:00
Evgeny Leviant	99b2756517	[ARM][SchedModels] Get rid of IsLdrAm2ScaledPred Differential revision: https://reviews.llvm.org/D90024	2020-10-26 12:01:39 +03:00
Evgeny Leviant	a4fc18e641	[ARM][SchedModels] Convert IsLdstsoMinusRegPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90029	2020-10-26 11:54:08 +03:00

1 2 3 4 5 ...

11167 Commits