llvm-project

Commit Graph

Author	SHA1	Message	Date
Amaury Sechet	4f4387dd12	[TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicit check for both option, but other places do not when they would benefit from doing it. This patches refactor the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66804 llvm-svn: 370190	2019-08-28 12:00:06 +00:00
David Green	379f6186dd	[ARM] Move MVEVPTBlockPass to a separate file. NFC This just pulls the MVEVPTBlockPass into a separate file, as opposed to being wrapped up in Thumb2ITBlockPass. Differential revision: https://reviews.llvm.org/D66579 llvm-svn: 370187	2019-08-28 11:37:31 +00:00
David Green	1c5b143c99	[MVE] VMOVX patterns This adds fp16 VMOVX patterns, using the same patterns as rL362482 with some adjustments for MVE. It allows us to move fp16 registers without going into and out of gprs. VMOVX is able to move the top bits from a fp16 in a fp reg into the bottom bits of another register, zeroing the rest. This can be used for odd MVE register lanes. The top bits are not read by fp16 instructions, so no move is required there if we are dealing with even lanes. Differential revision: https://reviews.llvm.org/D66793 llvm-svn: 370184	2019-08-28 10:13:23 +00:00
Sam Parker	a761ba0f2d	[ARM][ParallelDSP] Change search for muls rL369567 reverted a couple of recent changes made to ARMParallelDSP because of a miscompilation error: PR43073. The issue stemmed from an underlying bug that was caused by adding muls into a reduction before it was proved that they could be executed in parallel with another mul. Most of the changes here are from the previously reverted commits. The additional changes have been made area: 1) The Search function now doesn't insert any muls into the Reduction object. That now happens once the search has successfully finished. 2) For any muls added into the reduction but that weren't paired, we accumulate their values as an input into the smlad. Differential Revision: https://reviews.llvm.org/D66660 llvm-svn: 370171	2019-08-28 08:51:13 +00:00
Sam Clegg	90b6bb75e8	[MC] Minor cleanup to MCFixup::Kind handling. NFC. Prefer `MCFixupKind` where possible and add getTargetKind() to convert to `unsigned` when needed rather than scattering cast operators around the place. Differential Revision: https://reviews.llvm.org/D59890 llvm-svn: 369720	2019-08-23 01:00:55 +00:00
Sam Tebbs	a69d9d6156	Reapply: [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32 The CodeGen/Thumb2/mve-vaddv.ll test needed to be amended to reflect the changes from the above patch. This reverts commit `cd53ff6`, reapplying `7c6b229`. llvm-svn: 369638	2019-08-22 10:29:20 +00:00
Hans Wennborg	cd53ff6c0d	Revert r369626 "[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32" It broke the bots, see e.g. http://lab.llvm.org:8011/builders/clang-cuda-build/builds/36275/ > This patch fixes shifts by a 128/256 bit shift amount. It also fixes > codegen for shifts of 32 by delegating to LLVM's default optimisation > instead of emitting a long shift. > > Tests that used to generate long shifts of 32 are updated to check for the > more optimised codegen. > > Differential revision: https://reviews.llvm.org/D66519 > > llvm-svn: 369626 llvm-svn: 369636	2019-08-22 09:16:53 +00:00
Sam Tebbs	7c6b229204	[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32 This patch fixes shifts by a 128/256 bit shift amount. It also fixes codegen for shifts of 32 by delegating to LLVM's default optimisation instead of emitting a long shift. Tests that used to generate long shifts of 32 are updated to check for the more optimised codegen. Differential revision: https://reviews.llvm.org/D66519 llvm-svn: 369626	2019-08-22 08:12:06 +00:00
Shiva Chen	72a41e7b0d	[TargetLowering] Remove optional arguments passing to makeLibCall The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497. The struct contain argument flags which will pass to makeLibCall function. The patch should not has any functionality changes. Differential Revision: https://reviews.llvm.org/D65795 llvm-svn: 369622	2019-08-22 04:59:43 +00:00
Nico Weber	ed18e70c86	Revert r367389 (and follow-up r368404); it caused PR43073. llvm-svn: 369567	2019-08-21 19:53:42 +00:00
David Green	717feabdf0	[ARM] Formatting for ARMInstrMVE.td. NFC This is just some formatting cleanup, prior to the masked load and store patch in D66534. llvm-svn: 369545	2019-08-21 16:20:35 +00:00
Sam Tebbs	dcfc2d40d3	[ARM] Select vaddva This patch adds vaddva selection. Differential revision: https://reviews.llvm.org/D66410 llvm-svn: 369404	2019-08-20 16:33:34 +00:00
Sam Tebbs	f312c1ecf4	[ARM] Add support for MVE vaddv This patch adds vecreduce_add and the relevant instruction selection for vaddv. Differential revision: https://reviews.llvm.org/D66085 llvm-svn: 369245	2019-08-19 09:38:28 +00:00
David Green	2bfc13fde1	[ARM] MVE sext costs This adds some sext costs for MVE, taken from the length of assembly sequences that we currently generate. Differential Revision: https://reviews.llvm.org/D66010 llvm-svn: 369244	2019-08-19 09:13:22 +00:00
Jian Cai	16fa8b0970	Reland "[ARM] push LR before __gnu_mcount_nc" This relands r369147 with fixes to unit tests. https://reviews.llvm.org/D65019 llvm-svn: 369173	2019-08-16 23:30:16 +00:00
Eli Friedman	eaff844fe9	[ARM] Preserve liveness in ARMConstantIslands. We currently don't use liveness information after this point, but it can be useful to catch bugs using -verify-machineinstrs, and optimizations could potentially use this information in the future. Differential Revision: https://reviews.llvm.org/D66319 llvm-svn: 369162	2019-08-16 22:20:14 +00:00
Jian Cai	2d957cfe02	Revert "[ARM] push LR before __gnu_mcount_nc" This reverts commit `f4cf3b9593`. llvm-svn: 369149	2019-08-16 20:40:21 +00:00
Jian Cai	f4cf3b9593	[ARM] push LR before __gnu_mcount_nc Push LR register before calling __gnu_mcount_nc as it expects the value of LR register to be the top value of the stack on ARM32. Differential Revision: https://reviews.llvm.org/D65019 llvm-svn: 369147	2019-08-16 20:21:08 +00:00
David Green	b782e61e47	[ARM] MVE sext of a load is free MVE also has some sext of loads, which will be free just as scalar instructions are. Differential Revision: https://reviews.llvm.org/D66008 llvm-svn: 369118	2019-08-16 15:13:37 +00:00
David Green	6e1ac42474	[ARM] Correct register for narrowing and widening MVE loads and stores. The widening and narrowing MVE instructions like VLDRH.32 are only permitted to use low tGPR registers. This means that if they are used for a stack slot, where the register used is only decided during frame setup, we need to be able to correctly pick a thumb1 register over a normal GPR. This attempts to add the required logic into eliminateFrameIndex and rewriteT2FrameIndex, only picking the FrameReg if it is a valid register for the operands register class, and picking a valid scratch register for the register class. Differential Revision: https://reviews.llvm.org/D66285 llvm-svn: 369108	2019-08-16 13:42:39 +00:00
David Green	8c2c5f5045	[ARM] Don't pretend we know how to generate MVE VLDn We don't yet know how to generate these instructions for MVE. And in the case of VLD3, we don't even have the instruction. For the moment don't tell the vectoriser that we have VLD4, just to end up serialising the results. Differential Revision: https://reviews.llvm.org/D66009 llvm-svn: 369101	2019-08-16 13:06:49 +00:00
Eli Friedman	9b9a308452	[ARM][LowOverheadLoops] Fix generated code for "revert". Two issues: 1. t2CMPri shouldn't use CPSR if it isn't predicated. This doesn't really have any visible effect at the moment, but it might matter in the future. 2. The t2CMPri generated for t2WhileLoopStart might need to use a register that isn't LR. My team found this because we have a patch to track register liveness late in the pass pipeline. I'll look into upstreaming it to help catch issues like this earlier. Differential Revision: https://reviews.llvm.org/D66243 llvm-svn: 369069	2019-08-15 23:35:53 +00:00
Philip Reames	5c38ca3534	[SDAG] Minor code cleanup/standardization of atomic accessors [NFC] llvm-svn: 369057	2019-08-15 22:21:14 +00:00
Daniel Sanders	0c47611131	Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041	2019-08-15 19:22:08 +00:00
Jonas Devlieghere	0eaee545ee	[llvm] Migrate llvm::make_unique to std::make_unique Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013	2019-08-15 15:54:37 +00:00
David Green	3a99101812	[ARM] Fix alignment checks for BE VLDRH We need to allow any alignment at least 2, not just exactly 2, so that the big endian loads and stores can be selected successfully. I've also added extra BE testing for the load and store tests. Thanks to Oliver for the report. Differential Revision: https://reviews.llvm.org/D66222 llvm-svn: 368996	2019-08-15 12:54:47 +00:00
David Green	0ff2296a49	[ARM] MVE predicate store patterns Stack loads and stores were already working, but direct stores were not. This adds the patterns for them, same as predicate loads. Differential Revision: https://reviews.llvm.org/D66213 llvm-svn: 368988	2019-08-15 10:41:42 +00:00
David Green	04f2f32869	[ARM] MVE trunc to i1 vectors This adds patterns for selecting trunc instructions from full vectors to i1's vectors. Differential Revision: https://reviews.llvm.org/D66201 llvm-svn: 368981	2019-08-15 09:26:51 +00:00
David Green	a655393f17	[ARM] Add MVE beats vector cost model The MVE architecture has the idea of "beats", where a vector instruction can be executed over several ticks of the architecture. This adds a similar system into the Arm backend cost model, multiplying the cost of all vector instructions by a factor. This factor essentially becomes the expected difference between scalar code and vector code, on average. MVE Vector instructions can also overlap so the a true cost of them is often lower. But equally scalar instructions can in some situations be dual issued, or have other optimisations such as unrolling or make use of dsp instructions. The default is chosen as 2. This should not prevent vectorisation is a most cases (as the vector instructions will still be doing at least 4 times the work), but it will help prevent over vectorising in cases where the benefits are less likely. This adds things so far to the obvious places in ARMTargetTransformInfo, and updates a few related costs like not treating float instructions as cost 2 just because they are floats. Differential Revision: https://reviews.llvm.org/D66005 llvm-svn: 368733	2019-08-13 18:12:08 +00:00
Momchil Velikov	114c37e72a	[ARM] Fix detection of duplicates when parsing reg list operands Differential Revision: https://reviews.llvm.org/D65957 llvm-svn: 368712	2019-08-13 16:13:00 +00:00
Momchil Velikov	f990e4a4c7	[ARM] Fix encoding of APSR in CLRM instruction The APSR is encoded by setting bit 15 in the register list of the CLRM instruction (cf. https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf). Differential Revision: https://reviews.llvm.org/D65873 llvm-svn: 368711	2019-08-13 16:12:46 +00:00
Matt Arsenault	5af9cf042f	GlobalISel: Change representation of shuffle masks Currently shufflemasks get emitted as any other constant, and you end up with a bunch of virtual registers of G_CONSTANT with a G_BUILD_VECTOR. The AArch64 selector then asserts on anything that doesn't fit this pattern. This isn't an ideal representation, and should avoid legalization and have fewer opportunities for a representational error. Rather than invent a new shuffle mask operand type, similar to what ShuffleVectorSDNode does, just track the original IR Constant mask operand. I don't completely like the idea of adding another link to the IR, but MIR is already quite dependent on IR constants already, and this will allow sharing the shuffle mask utility functions with the IR. llvm-svn: 368704	2019-08-13 15:34:38 +00:00
Amara Emerson	e14c91b71a	[GlobalISel] Make the InstructionSelector instance non-const, allowing state to be maintained. Currently we can't keep any state in the selector object that we get from subtarget. As a result we have to plumb through all our variables through multiple functions. This change makes it non-const and adds a virtual init() method to allow further state to be captured for each target. AArch64 makes use of this in this patch to cache a call to hasFnAttribute() which is expensive to call, and is used on each selection of G_BRCOND. Differential Revision: https://reviews.llvm.org/D65984 llvm-svn: 368652	2019-08-13 06:26:59 +00:00
David Green	86876422ef	[ARM] sext of a load is free This teaches the cost model that the sext or zext of a load is going to be free. Differential Revision: https://reviews.llvm.org/D66006 llvm-svn: 368593	2019-08-12 17:39:56 +00:00
David Green	3e39f39ad9	[ARM] MVE shuffle broadcast costs A VDUP will perform a vector broadcast in a single instruction. Update the cost model for MVE accordingly. Code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D63448 llvm-svn: 368589	2019-08-12 16:54:07 +00:00
David Green	83bbfaa5e4	[ARM] Put some of the TTI costmodel behind hasNeon calls. This puts some of the calls in ARMTargetTransformInfo.cpp behind hasNeon() checks, now that we have MVE, and updates all the tests accordingly. Differential Revision: https://reviews.llvm.org/D63447 llvm-svn: 368587	2019-08-12 15:59:52 +00:00
David Green	11c4602fce	[MVE] Don't try to unroll vectorised MVE loops Due to the nature of the beat system in the MVE architecture, along with tail predication and low-overhead loops, unrolling has less benefit compared to normal loops. You can not, for example, hide the latency of a load with other instructions as you can for scalar code. Preventing unrolling also makes the code easier to read and reason about. So if a loop contains vector code, don't enable the runtime unrolling. At least for the time being. Differential Revision: https://reviews.llvm.org/D65803 llvm-svn: 368530	2019-08-11 08:53:18 +00:00
David Green	44f8d635e2	[ARM] Permit auto-vectorization using MVE With enough codegen complete, we can now correctly report the number and size of vector registers for MVE, allowing auto vectorisation. This also allows FP auto-vectorization for MVE without -Ofast/-ffast-math, due to support for IEEE FP arithmetic and parity between scalar and vector FP behaviour. Patch by David Sherwood. Differential Revision: https://reviews.llvm.org/D63728 llvm-svn: 368529	2019-08-11 08:42:57 +00:00
Daniel Sanders	e9a57c2b23	[globalisel] Add G_SEXT_INREG Summary: Targets often have instructions that can sign-extend certain cases faster than the equivalent shift-left/arithmetic-shift-right. Such cases can be identified by matching a shift-left/shift-right pair but there are some issues with this in the context of combines. For example, suppose you can sign-extend 8-bit up to 32-bit with a target extend instruction. %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity) %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_ASHR %2:_(s32), i32 1 would reasonably combine to: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 25 which no longer matches the special case. If your shifts and extend are equal cost, this would break even as a pair of shifts but if your shift is more expensive than the extend then it's cheaper as: %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8 %3:_(s32) = G_ASHR %2:_(s32), i32 1 It's possible to match the shift-pair in ISel and emit an extend and ashr. However, this is far from the only way to break this shift pair and make it hard to match the extends. Another example is that with the right known-zeros, this: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_MUL %2:_(s32), i32 2 can become: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 23 All upstream targets have been configured to lower it to the current G_SHL,G_ASHR pair but will likely want to make it legal in some cases to handle their faster cases. To follow-up: Provide a way to legalize based on the constant. At the moment, I'm thinking that the best way to achieve this is to provide the MI in LegalityQuery but that opens the door to breaking core principles of the legalizer (legality is not context sensitive). That said, it's worth noting that looking at other instructions and acting on that information doesn't violate this principle in itself. It's only a violation if, at the end of legalization, a pass that checks legality without being able to see the context would say an instruction might not be legal. That's a fairly subtle distinction so to give a concrete example, saying %2 in: %1 = G_CONSTANT 16 %2 = G_SEXT_INREG %0, %1 is legal is in violation of that principle if the legality of %2 depends on %1 being constant and/or being 16. However, legalizing to either: %2 = G_SEXT_INREG %0, 16 or: %1 = G_CONSTANT 16 %2:_(s32) = G_SHL %0, %1 %3:_(s32) = G_ASHR %2, %1 depending on whether %1 is constant and 16 does not violate that principle since both outputs are genuinely legal. Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61289 llvm-svn: 368487	2019-08-09 21:11:20 +00:00
Tim Northover	e1a5f668b3	GlobalISel: pack various parameters for lowerCall into a struct. I've now needed to add an extra parameter to this call twice recently. Not only is the signature getting extremely unwieldy, but just updating all of the callsites and implementations is a pain. Putting the parameters in a struct sidesteps both issues. llvm-svn: 368408	2019-08-09 08:26:38 +00:00
Sam Parker	0dba791a25	[ARM][ParallelDSP] Replace SExt uses As loads are combined and widened, we replaced their sext users operands whereas we should have been replacing the uses of the sext. I've added a load of tests, with only a few of them originally causing assertion failures, the rest improve pattern coverage. Differential Revision: https://reviews.llvm.org/D65740 llvm-svn: 368404	2019-08-09 07:48:50 +00:00
David Green	27ca82f32a	[ARM] Add support for MVE pre and post inc loads and stores This adds pre- and post- increment and decrements for MVE loads and stores. It uses the builtin pre and post load/store detection, unlike Neon. Loads are selected with the code in tryT2IndexedLoad, stores are selected with tablegen patterns. The immediates have a +/-7bit range, multiplied by the size of the element. Differential Revision: https://reviews.llvm.org/D63840 llvm-svn: 368305	2019-08-08 15:27:58 +00:00
David Green	824ffd8b12	[ARM] MVE big endian loads/stores This adds some missing patterns for big endian loads/stores, allowing unaligned loads/stores to also be selected with an extra VREV, which produces better code than aligning through a stack. Also moves VLDR_P0 to not be LE only, and adjusts some of the tests to show all that working. Differential Revision: https://reviews.llvm.org/D65583 llvm-svn: 368304	2019-08-08 15:15:19 +00:00
Sam Tebbs	7ca980edcd	[ARM] Select VFMA llvm-svn: 368264	2019-08-08 08:21:01 +00:00
David Green	1becefd3f7	[ARM] Tighten up VLDRH.32 with low alignments VLDRH needs to have an alignment of at least 2, including the widening/narrowing versions. This tightens up the ISel patterns for it and alters allowsMisalignedMemoryAccesses so that unaligned accesses are expanded through the stack. It also fixed some incorrect shift amounts, which seemed to be passing a multiple not a shift. Differential Revision: https://reviews.llvm.org/D65580 llvm-svn: 368256	2019-08-08 06:22:03 +00:00
Oliver Cruickshank	4d4eefda6c	[ARM] Expand CTPOP intrinsic for MVE llvm-svn: 368180	2019-08-07 15:47:45 +00:00
Oliver Cruickshank	30dcae0956	[ARM] Generate MVE VHADDs/VHSUBs llvm-svn: 368146	2019-08-07 10:26:57 +00:00
Sam Parker	173de03740	[ARM][LowOverheadLoops] Revert after read/write Currently we check whether LR is stored/loaded to/from inbetween the loop decrement and loop end pseudo instructions. There's two problems here: - It relies on all load/store instructions being labelled as such in tablegen. - Actually any use of loop decrement is troublesome because the value doesn't exist! So we need to check for any read/write of LR that occurs between the two instructions and revert if we find anything. Differential Revision: https://reviews.llvm.org/D65792 llvm-svn: 368130	2019-08-07 07:39:19 +00:00
Matt Arsenault	f4d3113a5f	CodeGen: Migration to using Register llvm-svn: 367974	2019-08-06 03:59:31 +00:00
Amara Emerson	bc1172df14	[GlobalISel][CallLowering] Rename isArgumentHandler() -> isIncomingArgumentHandler() Previous name and comment incorrectly implied it was just for formal arg handlers, which is not true. llvm-svn: 367945	2019-08-05 23:05:28 +00:00

1 2 3 4 5 ...

10258 Commits