This is a hack to fix illegal 32-bit to 16-bit copies.
The problem is that making 16-bit subregs legal creates
a huge number of failures which cannot all be resolved
at once without a temporary hack like this.
The next step is to change operands, instruction definitions
and patterns until this hack is not needed.
Differential Revision: https://reviews.llvm.org/D79119
Summary: This change enables all kinds of carry-out ISD opcodes to be selected according to the node divergence.
Reviewers: rampitec, arsenm, vpykhtin
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78091
Summary:
This patch adds AArch64ISD nodes for [S|U]MIN_PRED
and [S|U]MAX_PRED, and lowers both SVE intrinsics and
IR operations for min and max to these nodes.
There are two forms of these instructions for SVE: a predicated
form and an immediate (unpredicated) form. The patterns
which existed for the latter have been updated to match a
predicated node with an immediate and map this
to the immediate instruction.
Reviewers: sdesmalen, efriedma, dancgr, rengolin
Reviewed By: efriedma
Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79087
We haven't promoted AND/OR/XOR to vXi64 types for a while, so
there's no reason to use isOperationLegalOrPromote; we can
just use isOperationLegal by merging with the ADD handling.
Default legalization will create two v8i64 truncs to v8i32, concat
them to v16i32, and then truncate the rest of the way to v16i8.
Instead we can truncate directly from v8i64 to v8i8 in the lower
half of an xmm, then concat the two halves using vpunpcklqdq.
This is the same number of uops, but the dependency chain through
the uops is better since the halves are merged at the end.
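A minimal intrinsics sketch of the new sequence (function name hypothetical; assumes an AVX512F target):
```
#include <immintrin.h>

__m128i trunc_two_v8i64_to_v16i8(__m512i lo, __m512i hi) {
  __m128i lo8 = _mm512_cvtepi64_epi8(lo);  // vpmovqb: v8i64 -> v8i8 in the low half of an xmm
  __m128i hi8 = _mm512_cvtepi64_epi8(hi);  // vpmovqb: v8i64 -> v8i8 in the low half of an xmm
  return _mm_unpacklo_epi64(lo8, hi8);     // vpunpcklqdq merges the halves at the end
}
```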
I had to add SimplifyDemandedBits support for VTRUNC to prevent
a regression on vector-trunc-math.ll. combineTruncatedArithmetic
no longer gets a chance to shrink vXi64 mul so we were producing
the v8i64 multiply sequence using multiple PMULUDQs. With the
demanded bits fix we are able to prune out the extra ops leaving
just two PMULUDQs, one for each v8i64 half. This is twice the
width of the 2 v8i32 PMULLDs we had before, but PMULUDQ is 1
uop and PMULLD is 2. We also save some truncates. It's probably
worth using PMULUDQ even when PMULLQ is available since the latter
is 3 uops, but that will require a different change.
Differential Revision: https://reviews.llvm.org/D79231
The splitVector helper uses extractSubVector which splits build vectors like we do here, so avoid reimplementing it.
splitVector could easily be extended to peek through bitcasts as well but I'd prefer to keep this commit NFC.
Summary:
The current lowering of `select` on RISC-V uses a branch instruction to load a
register with one or the other value. This is inefficient, especially in the
case of small constants that can be computed easily.
By implementing the TargetLowering::convertSelectOfConstantsToMath hook, some of
the simpler cases are covered that let us avoid introducing a branch in these
cases.
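A minimal sketch of the override (the hook itself exists in TargetLowering; the unconditional `return true` is an assumption for illustration):
```
bool RISCVTargetLowering::convertSelectOfConstantsToMath(EVT VT) const {
  // Prefer branch-free arithmetic for selects between constants.
  return true;
}
```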
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D79260
Summary:
This patch addresses some weird assembly sequences we were seeing when
comparing floats. In particular, comparing a float to itself tells you whether
it is NaN or not, which we were doing correctly, but with an extra unneeded
`and` instruction.
This patch specialises the existing patterns to remove the `and` instructions
when both their operands are the same.
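A small illustration, assuming the redundant `and` came from an ordered comparison with identical operands:
```
#include <cmath>

// A NaN self-check like this previously emitted two identical feq.s
// results joined by a redundant 'and'; one feq.s is enough.
bool notNaN(float a) { return !std::isnan(a); }
```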
Reviewed By: luismarques, asb
Differential Revision: https://reviews.llvm.org/D78908
Summary:
As described in https://github.com/WebAssembly/simd/pull/209. This is
the final reorganization of the SIMD opcode space before
standardization. It has been landed in concert with corresponding
changes in other projects in the WebAssembly SIMD ecosystem.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79224
Over time, we have made many additions to this file and it has frankly become a
bit of a mess. This has led to at least one issue: we have a number of
instructions whose side-effects flag should be set to false, but we neglected
to do so. This patch suggests a refactoring that should make the file much
more maintainable. The file is split up into major sections and the nesting
level is reduced, predicate blocks merged, etc.
Sections:
- Custom PPCISD node definitions
- Predicate definitions
- Instruction formats
- Instruction definitions
- Helper DAG definitions
- Anonymous patterns
- Instruction aliases
Differential revision: https://reviews.llvm.org/D78132
Handle concat_vectors(extract_subvector(broadcast(x)), extract_subvector(broadcast(x))) -> broadcast(x)
To expose this we also need collectConcatOps to recognise the insert_subvector(x, extract_subvector(x, lo), hi) subvector splat pattern
VMEM loads of the same type (sampler vs no sampler) are guaranteed to
write their result registers in order, so there is no need for an
s_waitcnt even if they write to overlapping vgprs.
Differential Revision: https://reviews.llvm.org/D79176
While restoring latency, check whether any register of the
source instruction is a subregister of the successor instruction's
registers, in addition to checking for the same register.
The setting of `MCAsmInfo` properties for XCOFF got split between
`MCAsmInfoXCOFF` and `PPCXCOFFMCAsmInfo`. Except for the properties that
are dependent on the target information being passed via the
constructor, the properties being set in `PPCXCOFFMCAsmInfo` had no
fundamental reason for being treated as specific for XCOFF on PowerPC.
Indeed, the property that might be considered more specific to PowerPC,
`NeedsFunctionDescriptors`, was set in `MCAsmInfoXCOFF`.
XCOFF being specific to PowerPC anyway, this patch consolidates the
setting of the properties into `MCAsmInfoXCOFF` except for the cases
that are dependent on the information provided via the
`PPCXCOFFMCAsmInfo` constructor.
This patch also reorders the assignments to the fields to match the
declaration order in `MCAsmInfo`.
This patch adds the x, t and g modifiers for inline asm from GCC. These will print a vector register as xmm*, ymm* or zmm* respectively.
I also fixed register names with modifiers in the Intel dialect so they are no longer printed with a leading %.
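A hedged example of the new modifiers in GCC-style inline asm (requires an AVX build; the function is illustrative only):
```
#include <immintrin.h>

__m256i zero(__m256i v) {
  // '%x0' prints operand 0 as an xmm register ('%t0' would print ymm,
  // '%g0' zmm); vpxor on the xmm half zeroes the whole ymm register.
  asm("vpxor %x0, %x0, %x0" : "+x"(v));
  return v;
}
```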
Patch by Amanieu d'Antras
Differential Revision: https://reviews.llvm.org/D78977
Also fix some cost tables for vXi1 types to match the costs entries for the types they will be promoted to.
Differential Revision: https://reviews.llvm.org/D79045
vpermw is 2 uops. vpermt2b/vpermt2w are two shuffle uops and a port 015 uop. Weirdly vpermb is a single uop.
This patch bumps the cost to 2 for these operations. Maybe it should go to 3 for the vpermt2*, but I've started conservative.
I've also removed a few entries that were now the same as earlier subtargets, or that described sequences I don't think we really generate. For example, I don't think we extend v32i8 to v32i16, shuffle, and then truncate.
Differential Revision: https://reviews.llvm.org/D79148
This pushes the NOT pattern up the DAG to help expose it for further combines (AND->ANDN in particular).
The PSHUFD/MOVDDUP 'splat' cases are the only ones I've seen in the wild so far, we can further generalize if/when we need to.
X86 matches several 'shift+xor' funnel shift patterns:
fold (or (srl (srl x1, 1), (xor y, 31)), (shl x0, y)) -> (fshl x0, x1, y)
fold (or (shl (shl x0, 1), (xor y, 31)), (srl x1, y)) -> (fshr x0, x1, y)
fold (or (shl (add x0, x0), (xor y, 31)), (srl x1, y)) -> (fshr x0, x1, y)
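A minimal illustration of the first fold in plain C++ (assuming the shift amount is already reduced modulo 32): since (y ^ 31) == 31 - y for y in [0, 31], ((x1 >> 1) >> (y ^ 31)) equals x1 >> (32 - y), with the y == 0 case handled safely by the extra single-bit shift.
```
#include <cstdint>

uint32_t fshl_expanded(uint32_t x0, uint32_t x1, uint32_t y) {
  return ((x1 >> 1) >> (y ^ 31)) | (x0 << y);  // == fshl(x0, x1, y) for y in [0, 31]
}
```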
These patterns are also what we end up with from the proposed expansion changes in D77301.
This patch moves these to DAGCombine's generic MatchFunnelPosNeg.
All existing X86 test cases still pass, and we just have a small codegen change in pr32282.ll.
Reviewed By: @spatel
Differential Revision: https://reviews.llvm.org/D78935
Summary:
This patch implements custom floating-point reduction ISD nodes that
have vector results, which are used to lower the following intrinsics:
* llvm.aarch64.sve.fadda
* llvm.aarch64.sve.faddv
* llvm.aarch64.sve.fmaxv
* llvm.aarch64.sve.fmaxnmv
* llvm.aarch64.sve.fminv
* llvm.aarch64.sve.fminnmv
SVE reduction instructions keep their result within a vector register,
with all other bits set to zero.
Changes in this patch were implemented by Paul Walker and Sander de
Smalen.
Reviewers: sdesmalen, efriedma, rengolin
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78723
Summary:
This patch tries to ensure that we do something sensible when
generating code for the ISD::INSERT_VECTOR_ELT DAG node when operating
on scalable vectors. Previously we always returned 'undef' when
inserting an element into an out-of-bounds lane index, whereas now
we only do this for fixed-length vectors. For scalable vectors it
is assumed that the backend will do the right thing, in the same way
that it already has to deal with variable lane indices.
In this patch I have permitted a few basic combinations for scalable
vector types where it makes sense, but in general avoided most cases
for now as they currently require the use of BUILD_VECTOR nodes.
This patch includes tests for all scalable vector types when inserting
into lane 0, but I've only included one or two vector types for other
cases such as variable lane inserts.
Differential Revision: https://reviews.llvm.org/D78992
The generic implementation is actually specific to x86. It assumes the
offset is relative to the end of the instruction and the immediate is
not scaled (which is false on most RISC).
This section is the remnant of how this code was structured before
we made v32i16/v64i8 legal types with avx512f when not restricting
to 256 bit vectors. Now that there are just a few items left,
merge them near similar things in the other section.
We generate much better code these days than we used to, and we use the same sequence for AVX1 and AVX2 for these cases.
For v4i64->v4i32 we generate:
vextractf128 xmm1, ymm0, 1
vshufps xmm0, xmm0, xmm1, 136 # xmm0 = xmm0[0,2],xmm1[0,2]
And for v8i64->v8i32 we generate:
vperm2f128 ymm2, ymm0, ymm1, 49 # ymm2 = ymm0[2,3],ymm1[2,3]
vinsertf128 ymm0, ymm0, xmm1, 1
vshufps ymm0, ymm0, ymm2, 136 # ymm0 = ymm0[0,2],ymm2[0,2],ymm0[4,6],ymm2[4,6]
Differential Revision: https://reviews.llvm.org/D79109
For compatibility with other assemblers on the platform, allow
using just plain integer register numbers in all places where a
register operand is expected.
Bug: llvm.org/PR45582
Remove redundant Group and Regs arguments from parseRegister
and eliminate one of its overloaded versions.
Remove redundant Regs argument from parseAddress.
NFC intended.
This is currently enabled for Intel big cores from Sandy Bridge onward, as well as Atom, Silvermont, and KNL, due to 64-bit division being so slow on these cores. AMD cores can do this in hardware (use 32-bit division based on input operand width), so it's not a win there. But since the majority of x86 CPUs benefit from this optimization, and since the potential upside is significantly greater than the downside, we should enable this for the generic x86-64 target.
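A sketch of the idea as a source-level equivalent (illustrative only; the real transform rewrites the IR late in CodeGen):
```
#include <cstdint>

uint64_t div64(uint64_t a, uint64_t b) {
  if (((a | b) >> 32) == 0)                      // both operands fit in 32 bits
    return uint64_t(uint32_t(a) / uint32_t(b));  // fast 32-bit divide
  return a / b;                                  // slow full-width 64-bit divide
}
```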
Patch By: @atdt
Reviewed By: @craig.topper, @RKSimon
Differential Revision: https://reviews.llvm.org/D75567
Summary:
AArch64's system register ERXTS_EL1 is present in the backend as a
component of the Arm Reliability, Availability and Serviceability (RAS)
extension. However, it has been removed from the specification before
its final release.
This patch removes the register.
Reviewers: SjoerdMeijer, DavidSpickett
Reviewed By: DavidSpickett
Subscribers: DavidSpickett, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79007
The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing beyond what should be expected. This is particularly noticeable on pre-SSE4 targets where the availability of legal INSERT_VECTOR_ELT ops is more limited.
This patch does 2 things:
1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of a ISD::BUILD_VECTOR pattern.
2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs.
This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing.
A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well.
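A hedged usage sketch (the signature shown is assumed from this patch; TTI and VecTy stand for a TargetTransformInfo instance and the gathered vector type):
```
// Cost only the two lanes the gather actually inserts rather than all four.
APInt DemandedElts(4, 0);
DemandedElts.setBit(0);
DemandedElts.setBit(2);
unsigned Cost = TTI.getScalarizationOverhead(VecTy, DemandedElts,
                                             /*Insert=*/true,
                                             /*Extract=*/false);
```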
Reviewed By: @craig.topper
Differential Revision: https://reviews.llvm.org/D78216
This allows it to analyze 16-bit subregs without crashing if those
appear in the instructions. At the same time it does not attempt
to reassign them. It can still correctly identify register
banks to let larger registers be reassigned.
More work will be needed here when real instructions use
these registers, and more tests as well.
Differential Revision: https://reviews.llvm.org/D78772
We generate PACK instructions with an undef second source when we are truncating from a 128-bit vector to something narrower and we don't care about the upper bits of the vector register. The register allocation process will always assign untied undef uses to xmm0. This creates a false dependency on xmm0.
By adding these instructions to hasUndefRegUpdate, we can get the BreakFalseDeps pass to reassign the source to match the other input. Normally this interface is used for instructions that might need an xor inserted to break the dependency. But the pass also has a heuristic that tries to use the same register as other sources. That should always be possible for these instructions so we'll never trigger the xor dependency break.
Differential Revision: https://reviews.llvm.org/D79032
These are used in SReg_32, and when we start to use SGPR_LO16
there will be complaints that not all registers in the RC support
all subreg indexes. For now it is NFC.
Unused regunits are reserved so that the verifier does not complain
about missing phys reg live-ins.
Differential Revision: https://reviews.llvm.org/D78591
Generalize the 16-bit FPR to 32-bit GPR logic to work for all cases where
the destination size is bigger than the source size.
Also fixed CheckCopy() always returning true instead of the result of
isValidCopy().
Differential Revision: https://reviews.llvm.org/D77530
Patch by tambre (Raul Tambre)
Implement passing of ByVal formal arguments when the argument is passed
partly in the argument registers, with the remainder of the argument
passed on the stack.
Differential Revision: https://reviews.llvm.org/D78515
Similar to code in `getAArch64Cmp` in AArch64ISelLowering.
When we get a compare against a constant, sometimes that constant isn't valid
for selecting an immediate form.
However, sometimes you can get a valid constant by adding 1 or subtracting 1
and updating the condition code.
This implements the following transformations when valid:
- x slt c => x sle c - 1
- x sge c => x sgt c - 1
- x ult c => x ule c - 1
- x uge c => x ugt c - 1
- x sle c => x slt c + 1
- x sgt c => x sge c + 1
- x ule c => x ult c + 1
- x ugt c => x uge c + 1
Valid meaning the constant doesn't wrap around when we fudge it, and the result
gives us a compare which can be selected into an immediate form.
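A minimal self-contained sketch of the adjustment (the enum and helper name are hypothetical; the real code works on MachineOperands in the AArch64 GlobalISel selector):
```
#include <cstdint>
#include <optional>
#include <utility>

enum Pred { SLT, SLE, SGT, SGE };  // two of the eight cases shown

std::optional<std::pair<int64_t, Pred>> adjustCmpImm(Pred P, int64_t C) {
  switch (P) {
  case SLT:                              // x slt c => x sle c - 1
    if (C == INT64_MIN)
      return std::nullopt;               // c - 1 would wrap
    return std::make_pair(C - 1, SLE);
  case SGT:                              // x sgt c => x sge c + 1
    if (C == INT64_MAX)
      return std::nullopt;               // c + 1 would wrap
    return std::make_pair(C + 1, SGE);
  default:
    return std::nullopt;                 // remaining cases are analogous
  }
}
```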
This also moves `getImmedFromMO` higher up in the file so we can use it.
Differential Revision: https://reviews.llvm.org/D78769
I've modified isTruncateFree to get an accurate cost for types that need to be split. I'm planning to look into fixing it for all vectors, but need more cost cleanups first.
Differential Revision: https://reviews.llvm.org/D78973
Add support for reserving LR in:
* the driver through `-ffixed-x30`
* cc1 through `-target-feature +reserve-x30`
* the backend through `-mattr=+reserve-x30`
* a subtarget feature `reserve-x30`
the same way we're doing for the other registers.
Summary:
They all match the base implementation in
TargetInstrInfo::isUnpredicatedTerminator.
Follow up to D62749.
Reviewers: echristo, MaskRay, hfinkel
Reviewed By: echristo
Subscribers: wuzish, nemanjai, hiraditya, kbarton, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78976
This changes the logic for lowering fp16 bitcasts to always produce
either a VMOVhr or a VMOVrh, instead of only trying to do it with
certain surrounding nodes. To keep performing the same optimisations, demanded
bits and known bits information has been added for them.
Differential Revision: https://reviews.llvm.org/D78587
getTargetStreamer() might return null (e.g. when running the inlined-strings.ll test),
so downcasting to a reference would be wrong. This is detectable with -fsanitize=null.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D78686
There are several different types of cost that TTI tries to provide
explicit information for: throughput, latency, code size along with
a vague 'intersection of code-size cost and execution cost'.
The vectorizer is a keen user of RecipThroughput, and there are at least
'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
help with this cost. The latency cost has a single use and a single
implementation. The intersection cost appears to cover most of the
rest of the API.
getUserCost is explicitly called from within TTI when the user has
been explicit in wanting the code size (also only one use) as well
as a few passes which are concerned with a mixture of size and/or
a relative cost. In many cases these costs are closely related, such
as when multiple instructions are required, but one evident diverging
cost in this function is for div/rem.
This patch adds an argument so that the cost required is explicit,
so that we can make the important distinction when necessary.
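A hedged sketch of the new call sites (enum spellings assumed from this patch):
```
// Ask explicitly for the kind of cost required instead of an implicit mix.
int SizeCost = TTI.getUserCost(&I, TargetTransformInfo::TCK_CodeSize);
int ThruCost = TTI.getUserCost(&I, TargetTransformInfo::TCK_RecipThroughput);
```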
Differential Revision: https://reviews.llvm.org/D78635
Summary:
Changing all mnemonics to match assembly instructions to simplify mnemonic
naming rules. This time, update all branch instructions. This also changes
to use the %s10 register consistently.
Differential Revision: https://reviews.llvm.org/D78889
Summary:
Add simm7fp/mimmfp to represent floating-point immediate values.
Also clean up the multiclasses that define floating-point arithmetic
instructions to handle simm7fp/mimmfp operands, and add several
regression tests for the new operands.
Differential Revision: https://reviews.llvm.org/D78887
Currently, on the PowerPC target, we use the function-scope UnsafeFPMath
option to drive the Machine Combiner pass.
This is not accurate in two ways:
1: the scope is not accurate. The Machine Combiner pass only requires
instruction-level flags instead of the function scope.
2: the floating-point flag is not accurate. The Machine Combiner pass
only requires the floating-point flags reassoc and nsz.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D78183
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, removing uses
of getElementType on a pointer where we could just use getFunctionType
from the call.
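A hedged sketch of the replacement pattern, given a CallBase &CB:
```
// Before: Value *Callee = CB.getCalledValue();   // deprecated
Value *Callee = CB.getCalledOperand();            // direct equivalent
// ...and take the function type from the call itself instead of going
// through the pointer's element type:
FunctionType *FTy = CB.getFunctionType();
```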
Differential Revision: https://reviews.llvm.org/D78882
Summary:
In the ppc-expand-isel pass, we use stepForward() to update the
liveins. This function is not recommended because it needs
accurate kill info.
This patch uses the function computeAndAddLiveIns() to update the
liveins; it's the recommended method and can fix the liveins bug for
the ppc-expand-isel pass.
Reviewed By: efriedma, lkail
Differential Revision: https://reviews.llvm.org/D78657
Summary:
While looking into issues with IfConverter, I noticed that
X86InstrInfo::isUnpredicatedTerminator matched the implementation
it overrides in TargetInstrInfo::isUnpredicatedTerminator.
Reviewers: craig.topper, hfinkel, MaskRay, echristo
Reviewed By: MaskRay, echristo
Subscribers: hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62749
"unskipableSimplifyCode()" was added to handle unsafe BL8_NOTOC instruction
when TOC was not completely removed. The function is not needed after confirming
TOC pointer is not used in a function that uses PC-Relative addressing.
Differential Revision: https://reviews.llvm.org/D78517
Insert .cfi_offset/.cfi_register when IncomingCSRSaved of current block
is larger than OutgoingCSRSaved of its previous block.
Original commit message:
https://reviews.llvm.org/D42848 only handled CFA related cfi directives but
didn't handle CSR related cfi. The patch adds the CSR part. Basically it reuses
the framework created in D42848. For each basic block, the patch tracks which
CSR set has been saved at its CFG predecessors' exits, and compares that CSR
set with the set at its previous basic block's exit (the previous block is the
block laid out before the current block). If the saved CSR set at its previous
basic block's exit is larger, .cfi_restore will be inserted.
The patch also generates proper .cfi_restore in epilogue to make sure the
saved CSR set is consistent for the incoming edges of each block.
Differential Revision: https://reviews.llvm.org/D74303
All avx512 truncate instructions except vXi64->vXi32 are 2 uops
on port 5. So raise their costs to 2. Except when we have an
earlier faster sequence like pshufb for 128 bit input vectors.
Add a lower cost of 3 for v16i16->v16i8 with avx512f, where we can
extend to v16i32 then truncate. And a cost of 2 for avx512bw with
and without avx512vl; there we can use vpmovwb with either a ymm
or zmm input. Both of these beat masking, splitting, and using
packuswb, which is our avx/avx2 codegen.
Tail Calls were initially disabled for PC Relative code because it was not safe
to make certain assumptions about the tail calls (namely that all compiled
functions no longer used the TOC pointer in R2). However, once all of the
TOC pointer references have been removed it is safe to tail call everything
that was tail called prior to the PC relative additions as well as a number of
new cases.
For example, it is now possible to tail call indirect functions as there is no
need to save and restore the TOC pointer for indirect functions if the caller
is marked as may clobber R2 (st_other=1). For the same reason it is now also
possible to tail call functions that are external.
Differential Revision: https://reviews.llvm.org/D77788
D63847 added `MCInstrAnalysis::evaluateMemoryOperandAddress()`. This patch
leverages the feature to print the target addresses for evaluable instructions.
```
-400a: movl 4080(%rip), %eax
+400a: movl 4080(%rip), %eax # 5000 <data1>
```
This patch also deletes `MIA->isCall(Inst) || MIA->isUnconditionalBranch(Inst) || MIA->isConditionalBranch(Inst)`,
which was used to guard `MCInstrAnalysis::evaluateBranch()`.
Reviewed By: jhenderson, skan
Differential Revision: https://reviews.llvm.org/D78776
There are some intrinsics like this that currently block tail
predication but should be fine. This allows fma through, as that is
the one I ran into. There may be others that need the same treatment,
but I've only done this one here.
Differential Revision: https://reviews.llvm.org/D78385
The insert(truncate/extend(extract(vec0,c0)),vec1,c1) case in rGacbc5ede99 wasn't combining the 'mineltsize' with the src vector elt size which may be smaller due to implicit extension during extraction.
Reduced from test case provided by @mstorsjo
hasNoSchedulingInfo should be used for Pseudos and other instructions
that are never expected to be scheduled. This removes the flag from the
new ARM instructions, instead fixing the A57 schedule by marking the
related architecture features as unsupported.
When compiling for an armv5te CPU from clang, the +dsp attribute is set.
This meant we could try to generate qadd8 instructions where we would
end up having no pattern. I've changed the condition here to be hasV6Ops
&& hasDSP, which is what other parts of ARMISelLowering use for
similar instructions.
Fixed PR45677.
Differential Revision: https://reviews.llvm.org/D78877
This is an NFC patch for D77319. The idea is to hide getNegatibleCost inside getNegatedExpression()
by having it return null if the cost is expensive, and to add some helper functions for ease of use.
Also rename the old getNegatedExpression to negateExpression to avoid the semantic conflict.
Reviewed By: RKSimon
Differential revision: https://reviews.llvm.org/D78291
We're currently getting this from the default implementation. But
I don't like how the cost model came to this answer and I might
be making some changes there.
Followup to the PR45604 fix at rGe71dd7c011a3 where we disabled most of these cases.
By creating the shuffle at the byte level we can handle any extension/truncation as long as we track how small the scalar got and assume that the upper bytes will need to be zero.
This is another enhancement to D77895/D78362
to avoid a round-trip from XMM->GPR->XMM.
This time we handle the case of starting/ending with different FP types
but always with signed i32 as the intermediate value.
I think this covers all of the faux vector optimization possibilities
for pre-AVX512.
There is at least 1 other transform mentioned in PR36617:
https://bugs.llvm.org/show_bug.cgi?id=36617#c19
...where we fold an 'fpext' into a preceding 'sitofp'. I think we will
want to handle that earlier (DAGCombiner or instcombine) because that's
a target-independent optimization.
Differential Revision: https://reviews.llvm.org/D78758
If one of caller/callee has disabled ZMM registers due to
prefer-vector-width=256, we were previously
disabling argument promotion as the ABI might be incompatible since
one side will split 512-bit vectors in this case.
But if we can see that the types are all scalar this shouldn't be
a problem.
This patch assumes that the pointer element type reflects the type that
the argument will be promoted to.
Differential Revision: https://reviews.llvm.org/D78770
A previous bug fix for varargs introduced a regression where we would
incorrectly widen some stores to memory when passing i8/i16 parameters on the
stack. This seemingly didn't show up because it only happens when there is
no signext/zeroext parameter attribute, which I think clang adds for Darwin.
Swift however seems to be a different story, and a plain anyext on the parameter
triggered the bug.
To fix this, I've added a new ValueHandler::assignValueToAddress type override
which lets us distinguish between varargs and fixed args (we still need this
widening behaviour for varargs to fix the original bug in 2018).
rdar://61353552
Summary:
Add a check to make sure that MachineInstr::mayAlias returns prematurely if at least one of its instruction parameters does not access memory. This prevents calls to TargetInstrInfo::areMemAccessesTriviallyDisjoint with incompatible instructions.
A side effect of this change is to render the mayAlias helper in the AArch64 load/store optimizer obsolete. We can now directly call the MachineInstr::mayAlias member function.
Reviewers: hfinkel, t.p.northover, mcrosier, eli.friedman, efriedma
Reviewed By: efriedma
Subscribers: efriedma, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78823
This is to fix performance regressions introduced by
86c944d790.
The old search would collect all potentially mergeable instructions in
the entire block. In this case, the same address is written in
multiple places in the block on the other side of a fence. When sorted
by offset, the two unmergeable, identical addresses would be next to
each other and the merge would give up.
Break the search space when we encounter an instruction we won't be
able to merge across. This will keep the identical addresses in
different merge attempts.
This may also improve compile time by reducing the merge list size.
When using reversedInstructionsWithoutDebug to construct a range from a
pair of MachineInstrBundleIterators, the range unexpectedly leaves out an
element. This results in mis-optimization as @mstorsjo points out in
https://reviews.llvm.org/D78157.
The problem is that when we convert a MachineInstrBundleIterator to a
reverse iterator, the result gets incremented:
MachineInstrBundleIterator(++I.getReverse())
The comment there explains that the "resulting iterator will dereference
... to the previous node, which is somewhat unexpected; but converting
the two endpoints in a range will give the same range in reverse". This
makes it hard to understand what reversedInstructionsWithoutDebug will
do: I've removed the helper to prevent similar mistakes in the future.
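For contrast, the standard-library convention that makes this subtle, as a self-contained example:
```
#include <cassert>
#include <iterator>
#include <vector>

int main() {
  std::vector<int> v{1, 2, 3};
  auto it = v.begin() + 1;                       // points at 2
  std::reverse_iterator<decltype(it)> rit(it);
  assert(*rit == 1);                             // dereferences the element *before* 'it'
}
```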
Summary:
It is important to emit HINT instructions instead of PAC ones when
PAC is disabled. This allows compatibility with other assemblers
(e.g. GAS). This was implemented in commit da33762de8.
Still, developers of assembly code will want to write code that is
compatible with both pre- and post-PAC CPUs. They could use HINT
mnemonics, but the new mnemonics are a lot more readable (e.g.
paciaz instead of hint #24), and they will result in the same
encodings. So, while LLVM should not *emit* the new mnemonics when
PAC is disabled, this patch will at least make LLVM *accept*
assembly code that uses them.
Reviewers: danielkiss, chill, olista01, LukeCheeseman, simon_tatham
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78372
Follow-up of D78082 (x86-64).
This change avoids dynamic relocations in `xray_instr_map` for ARM/AArch64/powerpc64le.
MIPS64 cannot use 64-bit PC-relative addresses because R_MIPS_PC64 is not defined.
Because MIPS32 shares the same code, for simplicity, we don't use PC-relative addresses for MIPS32 as well.
Tested on AArch64 Linux and ppc64le Linux.
Reviewed By: ianlevesque
Differential Revision: https://reviews.llvm.org/D78590
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch32 and Assembly Parsing
D77872 has already added the MC representations of the instructions so that
they can be used in code gen; this patch fills in the details needed to
make assembly parsing work, and adds tests for asm and disasm
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: t.p.northover, simon_tatham
Reviewed By: simon_tatham
Subscribers: simon_tatham, ostannard, kristof.beyls, hiraditya,
danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77874
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch64 Scalable Vector Instructions (in line
with the Scalable Vector Extension - SVE)
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: t.p.northover, rengolin, c-rhodes
Reviewed By: c-rhodes
Subscribers: c-rhodes, ostannard, tschuett, kristof.beyls, hiraditya,
danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77873
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch32
- Intrinsics Support for AArch32 Neon Intrinsics for Matrix
Multiplication
Note: these extensions are optional in the 8.6a architecture and so have
to be enabled explicitly.
No additional IR types or C Types are needed for this extension.
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: t.p.northover, miyuki
Reviewed By: miyuki
Subscribers: miyuki, ostannard, kristof.beyls, hiraditya, danielkiss,
cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77872
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch64 only (no SVE or Neon)
- Intrinsics Support for AArch64 Armv8.6a Matrix Multiplication Instructions (No bfloat16 matrix multiplication)
No IR types or C Types are needed for this extension.
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: ostannard, t.p.northover, rengolin, kmclaughlin
Reviewed By: kmclaughlin
Subscribers: kmclaughlin, kristof.beyls, hiraditya, danielkiss,
cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77871
Summary:
Frontend guarantees that coherent accesses have
corresponding cache policy bits set (glc, dlc).
Therefore there is no need for extra instructions
that invalidate the cache.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78800
Summary:
This patch maps IR operations for sdiv & udiv to the
@llvm.aarch64.sve.[s|u]div intrinsics.
A ptrue must be created during lowering as the div instructions
have only a predicated form.
Patch contains changes by Andrzej Warzynski.
Reviewers: sdesmalen, c-rhodes, efriedma, cameron.mcinally, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, andwar, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78569
The MCELFObjectWriter::setRType## methods are always used together to
build a complete MIPS N64 ABI "chain" of relocations. Using a single
function for this task makes the code less verbose.
Summary:
Changing all mnemonics to match assembly instructions to simplify mnemonic
naming rules. This time, update all floating-point arithmetic instructions.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D78768
This was backwards from what was intended, and missing a test. We perhaps
should just ignore the FP mode here, since it shouldn't be legal to mix code
with different default modes in the absence of strictfp.
If f32 denormals were enabled pre-gfx9, we would still try to
implement this with v_max_f32. Pre-gfx9, these instructions ignored
the denormal mode and did not flush. Switch to the multiply form for
f32 as a workaround which should always work in any case.
This fixes conformance failures when the library implementation of
fmin/fmax was accidentally not inlined, forcing the assumption of no
flushing on targets where denormals are not enabled by default. This
is a workaround, since really we should not be mixing code with
different FP mode expectations, but prefer the lowering that will work
in any mode.
Now this will always use max to implement canonicalize on gfx9+. This
is only really beneficial for f64. For f32/f16 it's a neutral choice
(and worse in terms of code size in 1 case), but possibly worse for
the compiler since it does add an extra register use operand. Leave
this change for later.
1. Use Subtarget.isUsingPCRelativeCalls() in LowerConstantPool to
check if using PCRelative addressing.
2. Change MO_GOT_FLAG = 32 to MO_GOT_FLAG = 8 in PPC.h to use
consecutive bits.
Differential Revision: https://reviews.llvm.org/D78406
This avoids more long lists of register classes that have to be updated
every time we add a new one. NFC.
Differential Revision: https://reviews.llvm.org/D78570
12994a70cf did this for 128-bit classes:
SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
the additional non-allocatable TTMP registers. There's no point in
allocating SReg_128 vregs. This shrinks the size of the classes
regalloc needs to consider, which is usually good.
This patch extends it to all classes > 64 bits, for consistency.
Differential Revision: https://reviews.llvm.org/D78622
This patch changes the FP conversion intrinsics to take a predicate
that matches the number of lanes for the vector with the widest element
type as opposed to using <vscale x 16 x i1>.
For example:
```<vscale x 4 x float> @llvm.aarch64.sve.fcvt.f32f16(<vscale x 4 x float>, <vscale x 4 x i1>, <vscale x 8 x half>)```
now uses <vscale x 4 x i1> instead of <vscale x 16 x i1>
And similar for:
```<vscale x 4 x float> @llvm.aarch64.sve.fcvt.f32f64(<vscale x 4 x float>, <vscale x 2 x i1>, <vscale x 2 x double>)```
where the predicate now matches the wider type, so <vscale x 2 x i1>.
Reviewers: efriedma, SjoerdMeijer, paulwalker-arm, rengolin
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78402
Do not count the presence of debug insts against the limit set by
LdStLimit, and allow the optimizer to find matching insts by skipping
over debug insts.
Differential Revision: https://reviews.llvm.org/D78411
Summary:
The findSuitableCompare method can fail if debug instructions are
present in the MBB -- fix this by using helpers to skip over debug
insts.
Reviewers: aemerson, paquette
Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78265
Summary:
This fixes several instances in which condbr optimization was missed
due to a debug instruction appearing as a bogus NZCV clobber.
Reviewers: aemerson, paquette
Subscribers: kristof.beyls, hiraditya, jfb, danielkiss, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78264
Summary:
Use the variants of these APIs which skip over debug instructions. This
is mostly a cleanup, but it does fix a debug-variance issue which causes
addsub-shifted.ll and addsub_ext.ll to fail when debug info is inserted
by -mir-debugify.
Reviewers: aemerson, paquette
Subscribers: kristof.beyls, hiraditya, jfb, danielkiss, llvm-commits, aprantl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78262
Summary:
Fix an issue where the presence of debug info could disable the ccmp
optimization due to findConvertibleCompare failing too early (the error
is "Can't create ccmp with multiple uses", where the "use" is a
DBG_VALUE inst).
Depends on D78151.
Reviewers: t.p.northover, paquette, aemerson
Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78156
Summary:
Fix an issue where the presence of debug info could disable a peephole
optimization due to areCFlagsAccessedBetweenInstrs returning the wrong
result.
In test/CodeGen/AArch64/arm64-csel.ll, the issue was found in the
function @foo5, in which the first compare could successfully be
optimized but not the second.
Reviewers: t.p.northover, eastig, paquette
Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, dsanders, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78157
Summary:
Fix an issue where the presence of debug info could disable a peephole
optimization in optimizeCompareInstr due to canInstrSubstituteCmpInstr
returning the wrong result.
Depends on D78137.
Reviewers: t.p.northover, eastig, paquette
Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, llvm-commits, dsanders
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78151
Summary:
These helpers are exercised by follow-up commits in this patch series,
which is all about removing CodeGen differences with vs. without debug
info in the AArch64 backend.
Reviewers: fhahn, aprantl, jpaquette, paquette
Subscribers: kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78260
Preserving liveness can be useful even late in the pipeline, if we're
doing substantial optimization work afterwards. (See, for example,
D76065.) Teach MachineOutliner how to correctly set live-ins on the
basic block in outlined functions.
Differential Revision: https://reviews.llvm.org/D78605
Currently an indirect call produces the following sequence in PCRelative mode:
extern void function( );
extern void (*ptrfunc) ( );
void g() {
ptrfunc=function;
}
void f() {
(*ptrfunc) ( );
}
Producing
paddi 3, 0, .LC0@PCREL, 1
ld 3, 0(3)
std 2, 24(1)
ld 12, 0(3)
mtctr 12
bctrl
ld 2, 24(1)
Though the caller does not use or preserve r2, it is still saved and restored
across a function call. This patch removes these redundant saves and
restores for indirect calls.
Differential Revision: https://reviews.llvm.org/D77749
llvm/lib/Target/Hexagon/HexagonTargetObjectFile.cpp:296:11: warning: enumeration value 'ScalableVectorTyID' not handled in switch [-Wswitch]
switch (Ty->getTypeID()) {
^
Summary:
This commit recommits the reversion of https://reviews.llvm.org/D75039.
Consensus appears to be in favour of assembly-time resolution of
these ADR and LDR relocations, in line with GNU. The previous
backout broke many lld tests, now fixed by Peter Smith in
61bccda9d9.
Reviewers: psmith
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78301
Add initial support for PC Relative addressing to get jump table base
address instead of using TOC.
Differential Revision: https://reviews.llvm.org/D75931
If a 16-bit thumb STM with writeback stores the base register but it isn't the
first register in the list, then an unknown value is stored. The load/store
optimizer knows this and generates a 32-bit STM without writeback instead, but
thumb2 size reduction converts it into a 16-bit STM. Fix this by having thumb2
size reduction notice such STMs and leave them as they are.
Differential Revision: https://reviews.llvm.org/D78493
This adds some extra processing into the Pre-RA ARM load/store optimizer
to detect and merge MVE loads/stores and adds of the same base. We
don't always turn these into a post-inc during ISel, and because the
DAG is a graph we don't always have an order to use for the nodes, so
we don't know which nodes to make post-inc and which should use the new
post-inc value. After ISel, we have an order that we can use to post-inc
the following instructions.
So this looks for a load/store with a starting offset of 0, and an
add/sub from the same base, plus a number of other loads/stores. We then
do some checks and convert the zero-offset load/store into a postinc
variant. Any loads/stores after it have the offset subtracted from their
immediates. For example:
LDR #4 LDR #4
LDR #0 LDR_POSTINC #16
LDR #8 LDR #-8
LDR #12 LDR #-4
ADD #16
It only handles MVE loads/stores at the moment. Normal loads/stores will
be added in a followup patch; they just have some extra details to
ensure that we keep generating LDRD/LDM successfully.
Differential Revision: https://reviews.llvm.org/D77813
Add 96-bit, 160-bit and 256-bit AReg classes to match VReg and SReg.
NFC as far as I know, but it may avoid weird legalization problems.
Differential Revision: https://reviews.llvm.org/D78348
Finding the loop tripcount is the first crucial step in preparing a loop for
tail-predication, and this adds a debug message if a tripcount cannot be found.
And while I was at it, I added some more comments here and there.
Differential Revision: https://reviews.llvm.org/D78485
Summary:
Changing all mnemonics to match assembly instructions to simplify mnemonic
naming rules. This time, update all shift operation instructions. This also
corrects the instructions' operation kinds.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D78468
Summary:
VE uses the identical names "%s0-63" for all generic registers. Change to use
the alternative-name mechanism among all generic registers instead of
hard-coding them.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D78174
This patch exploits the rldimi instruction for patterns like
`or %a, 0b000011110000`, which saves instructions when the
operand has only one use, compared with `li-ori-sldi-or`.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D77850