llvm-project

Commit Graph

Author	SHA1	Message	Date
David Tenty	1ea1e053f6	[AIX] Make sure to use QualNames for external global objects Summary: Previously we only handled the case where the csect hadn't been set up yet, so we'd hit an assert later on. Reviewers: jasonliu, DiggerLin, stevewan Reviewed By: jasonliu Subscribers: hubert.reinterpretcast, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71032	2019-12-05 15:22:53 -05:00
Craig Topper	f688570d5c	[X86] Remove ProcIntelGLM/ProcIntelGLP/ProcIntelTRM and replace them with a single feature flag that covers the two places they were used. Differential Revision: https://reviews.llvm.org/D71048	2019-12-05 10:58:57 -08:00
Sanne Wouda	e503fee904	[AArch64] Fix MUL/SUB fusing Summary: When MUL is the first operand to SUB, we can't use MLS because the accumulator should be negated. Emit a NEG of the accumulator and an MLA instead, similar to what we do for FMUL / FSUB fusing. Reviewers: dmgreen, SjoerdMeijer, fhahn, Gerolf, mstorsjo, asbirlea Reviewed By: asbirlea Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71067	2019-12-05 18:10:06 +00:00
Danilo Carvalho Grael	b29916cec3	[AArch64][SVE] Integer reduction instructions pattern/intrinsics. Added pattern matching/intrinsics for the following SVE instructions: -- saddv, uaddv -- smaxv, sminv, umaxv, uminv -- orv, eorv, andv	2019-12-05 09:59:19 -05:00
Cullen Rhodes	f0355bc4d1	[AArch64][SVE] Implement element count intrinsics Summary: Adds intrinsics for the following: * cntb * cnth * cntw * cntd * cntp Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma, rovka Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70967	2019-12-05 10:26:49 +00:00
Florian Hahn	76a5c8421e	[MCRegInfo] Add forward sub and super register iterators. (NFC) This patch adds forward iterators mc_difflist_iterator, mc_subreg_iterator and mc_superreg_iterator, based on the existing DiffListIterator. Those are used to provide iterator ranges over sub- and super-register from TRI, which are slightly more convenient than the existing MCSubRegIterator/MCSuperRegIterator. Unfortunately, it duplicates a bit of functionality, but the new iterators are a bit more convenient (and can be used with various existing iterator utilities) and should probably replace the old iterators in the future. This patch updates some existing users. Reviewers: evandro, qcolombet, paquette, MatzeB, arsenm Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D70565	2019-12-05 09:29:26 +00:00
Shengchen Kan	f3dafd21a3	Fix the macro fusion table for X86 according to Intel optimization manual and add function isMacroFused Differential Revision: https://reviews.llvm.org/D70999	2019-12-05 14:39:11 +08:00
Danilo Carvalho Grael	53b95a3cb6	[AArch64][SVE] Add intrinsics and patterns for logical predicate instructions Add intrinsics and patterns for the following logical predicate instructions: -- and, ands, bic, bics, eor, eors -- sel -- orr, orrs, orn, orns, nor, nors, nand, nands	2019-12-04 23:11:46 -05:00
Craig Topper	3d43c73f26	[X86] Remove override of shouldUseStrictFP_TO_INT for fp80. NFC I suspect this became unnecessary after r354161. Prior to that we may have been going through the default expansion of FP_TO_UINT on 64-bit targets and then ending up back in Custom X86 handling to handle the FP_TO_SINT for it. Now we just Custom handle the FP_TO_UINT directly. We already need to handle it for 32-bit mode during type legalization so we wouldn't save any code by using the default expansion on 64-bit.	2019-12-04 17:58:10 -08:00
David Tellenbach	cec2d5c174	Reland [AArch64][MachineOutliner] Return address signing for outlined functions Summary: Reland after fixing an ASan failure by stopping outlining early if the constraints for return address signing removed too many outlining candidates. During AArch64 frame lowering instructions to enable return address signing are inserted into functions if needed. Functions generated during machine outlining don't run through target frame lowering and hence are missing such instructions. This patch introduces the following changes: 1. If not all functions that potentially participate in function outlining agree on their return address signing scope and their return address signing key, outlining is disabled for these functions. 2. If not all functions that potentially participate in function outlining agree on their support for v8.3A features, outlining is disabled for these functions. 3. If an outlining candidate would outline instructions that modify sp in a way that invalidates return address signing, outlining is disabled for that particular candidate. 4. If all candidate functions agree on the signing scope, signing key and their support for v8.3 features, the outlined function behaves as if it had the same scope and key attributes and as if it would provide the same v8.3A support as the original functions. Reviewers: ostannard, paquette Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70635	2019-12-05 02:20:59 +01:00
Sterling Augustine	f65267ee16	Revert "Reland [AArch64][MachineOutliner] Return address signing for outlined functions" This reverts commit `02760b750b`. The original commit is not asan clean. http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/37147/steps/check-llvm%20asan/logs/stdio	2019-12-04 16:31:10 -08:00
Craig Topper	f730ac719d	[X86] Add missing break to the end of the last case in a switch. NFC	2019-12-04 12:34:10 -08:00
Amy Huang	9e978bb01c	Add support for lowering 32-bit/64-bit pointers Summary: This follows a previous patch that changes the X86 datalayout to represent mixed size pointers (32-bit sext, 32-bit zext, and 64-bit) with address spaces (https://reviews.llvm.org/D64931) This patch implements the address space cast lowering to the corresponding sign extension, zero extension, or truncate instructions. Related to https://bugs.llvm.org/show_bug.cgi?id=42359 Reviewers: rnk, craig.topper, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69639	2019-12-04 11:39:03 -08:00
David Tellenbach	02760b750b	Reland [AArch64][MachineOutliner] Return address signing for outlined functions Summary: Reland after fixing a bug that allowed outlining of SP modifying instructions that invalidated return address signing. During AArch64 frame lowering instructions to enable return address signing are inserted into functions if needed. Functions generated during machine outlining don't run through target frame lowering and hence are missing such instructions. This patch introduces the following changes: 1. If not all functions that potentially participate in function outlining agree on their return address signing scope and their return address signing key, outlining is disabled for these functions. 2. If not all functions that potentially participate in function outlining agree on their support for v8.3A features, outlining is disabled for these functions. 3. If an outlining candidate would outline instructions that modify sp in a way that invalidates return address signing, outlining is disabled for that particular candidate. 4. If all candidate functions agree on the signing scope, signing key and their support for v8.3 features, the outlined function behaves as if it had the same scope and key attributes and as if it would provide the same v8.3A support as the original functions. Reviewers: ostannard, paquette Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70635	2019-12-04 19:39:52 +01:00
Mikhail Gudim	61e54fd60c	[SVE][AArch64] Adding patterns for while intrinsics.	2019-12-04 12:33:50 -05:00
jasonliu	5422e81a89	[XCOFF][AIX] Emit TOC entries for object file generation Summary: Implement emitTCEntry for PPCTargetXCOFFStreamer. Add TC csects to TOCCsects for object file writing. Note: 1. I did not include any raw data testing for this object file generation because TC entries raw data will all be 0 without relocation implemented. I will add raw data testing as part of relocation testing later. 2. I removed "Symbol->setFragment(F);" for common symbols because we don't need it, and if we have it then we would hit assertions below: Assertion `(SymbolContents == SymContentsUnset || SymbolContents == SymContentsOffset) && "Cannot get offset for a common/variable symbol"' failed. 3. Fixed incorrect TOC-base alignment. Differential Revision: https://reviews.llvm.org/D70798	2019-12-04 16:44:44 +00:00
Mark Murray	d3f62ceac0	[ARM][MVE][Intrinsics] Add VMULH/VRMULH intrinsics. Summary: Add MVE VMULH/VRMULH intrinsics and unit tests. Reviewers: simon_tatham, ostannard, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70948	2019-12-04 14:27:12 +00:00
Cullen Rhodes	201d91daad	[AArch64][SVE] Implement reversal intrinsics Summary: Adds intrinsics for the following: * rbit * revb * revh * revw Patterns are also defined to map the 'llvm.bswap.*' intrinsic to the SVE revb instruction. Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma, rovka Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70960	2019-12-04 12:01:57 +00:00
Jay Foad	7847986ceb	[AMDGPU][MC] Remove duplicate code introduced in r359316.	2019-12-04 11:44:12 +00:00
Alex Richardson	39b534da18	Allow negative offsets in MipsMCInstLower::LowerOperand Summary: We rely on this in our CHERI backend to address the GOT by generating a $pc-relative addresses. For this we emit the following code sequence: lui $1, %pcrel_hi(_CHERI_CAPABILITY_TABLE_-8) daddiu $1, $1, %pcrel_lo(_CHERI_CAPABILITY_TABLE_-4) cgetpccincoffset $c1, $1 However, without this change the addend is implicitly converted to UINT32_MAX and an invalid pointer value is generated. Reviewers: atanasyan Reviewed By: atanasyan Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70953	2019-12-04 11:30:00 +00:00
Alex Richardson	b5f69e234e	Handle BUNDLE instructions in MipsAsmPrinter Summary: In our CHERI fork we use BUNDLE instructions to ensure that a three-instruction sequence to generate a program-counter-relative value is emitted without reordering or insertions (since that would break the 32-bit offset computation). Currently MipsAsmPrinter asserts when it encounters a pseudo instruction. To handle BUNDLE we can simply skip the instruction which will then make EmitInstruction() process the contents of the bundle in order. Reviewers: atanasyan Reviewed By: atanasyan Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70945	2019-12-04 11:30:00 +00:00
Alex Richardson	b91f239485	MipsDelaySlotFiller: Don't move BUNDLE instructions into the delay slot Summary: In our CHERI fork we use BUNDLE instructions to ensure that a three-instruction sequence to generate a program-counter-relative value is emitted without reordering or insertions (since that would break the 32-bit offset computation). This sequence is created in MipsExpandPseudo and we use finalizeBundle() to create the BUNDLE instruction. However, the delay slot filler currently breaks this pattern since the BUNDLE will be removed and so all instructions are moved into the delay slot. Since the delay slot only executes the first instruction, this results in incorrect computations (and run-time crashes) if the branch is taken. The original test case uses CHERI instructions, so for the test case here I simply filled a BUNDLE with a no-op DADDiu $sp_64, -16 and DADDiu $sp_64, 16. Reviewers: atanasyan Reviewed By: atanasyan Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70944	2019-12-04 11:30:00 +00:00
Alex Richardson	0cc4b95985	Add debug output to MipsDelaySlotFiller pass Summary: I was tracking down a code-generation bug in this pass and found that the added output was useful. It is also helpful to find out why a delay slot could not be filled even though there is clearly a valid instruction (which appears to mostly be caused by CFI instructions). Reviewers: atanasyan Reviewed By: atanasyan Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70940	2019-12-04 11:30:00 +00:00
Florian Hahn	93c8235702	[AArch64TTI] Compute imm materialization cost for AArch64 intrinsics Currently, getIntImmCost returns TCC_Free for almost all intrinsics. For most AArch64 specific intrinsics however, it looks like integer constants cannot be folded into most of them (at least the ones I checked). Unless we know that we can fold integer operands with the intrinsic, we handle more cases correctly by returning the cost to materialize the immediate than return TCC_Free. Reviewers: SjoerdMeijer, dmgreen, t.p.northover, ributzka Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D70669	2019-12-04 11:09:03 +00:00
David Stuttard	46db606834	AMDGPU: Avoid folding 2 constant operands into an SALU operation Summary: Catch the (admittedly unusual) case where SIFoldOperands attempts to fold 2 constant operands into the same SALU operation, with neither operand able to be encoded as an inline constant. Change-Id: Ibc48d662c9ffd8bbacd154976b0b1c257ace0927 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70896	2019-12-04 10:25:34 +00:00
czhengsz	f0ba1aec35	[PowerPC] folding rlwinm + rlwinm to rlwinm For example: x3 = rlwinm x3, 27, 5, 31 x3 = rlwinm x3, 19, 0, 12 can be combined to x3 = rlwinm x3, 14, 0, 12 Reviewed by: steven.zhang, lkail Differential Revision: https://reviews.llvm.org/D70374	2019-12-03 21:51:19 -05:00
Wang, Pengfei	c8995de069	[X86] Model DAZ and FTZ Summary: This is a follow-up of D70881. It models DAZ and FTZ for related instructions. Reviewers: craig.topper, RKSimon, andrew.w.kaylor Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70938	2019-12-04 08:22:45 +08:00
Wang, Pengfei	c1c673303d	[X86] Model MXCSR for all AVX512 instructions Summary: Model MXCSR for all AVX512 instructions Reviewers: craig.topper, RKSimon, andrew.w.kaylor Subscribers: hiraditya, llvm-commits, LuoYuanke, LiuChen3 Tags: #llvm Differential Revision: https://reviews.llvm.org/D70881	2019-12-04 08:07:38 +08:00
James Clarke	da7b129b1b	[RISCV] Don't force Local Exec TLS for non-PIC Summary: Forcing Local Exec TLS requires the use of copy relocations. Copy relocations need special handling in the runtime linker when being used against TLS symbols, which is present in glibc, but not in FreeBSD nor musl, and so cannot be relied upon. Moreover, copy relocations are a hack that embed the size of an object in the ABI when it otherwise wouldn't be, and break protected symbols (which are expected to be DSO local), whilst also wasting space, thus they should be avoided whenever possible. As discussed in D70398, RISC-V should move away from forcing Local Exec, and instead use Initial Exec like other targets, with possible linker relaxation to follow. The RISC-V GCC maintainers also intend to adopt this more-conventional behaviour (see https://github.com/riscv/riscv-elf-psabi-doc/issues/122). Reviewers: asb, MaskRay Reviewed By: MaskRay Subscribers: emaste, krytarowski, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, llvm-commits, bsdjhb Tags: #llvm Differential Revision: https://reviews.llvm.org/D70649	2019-12-03 22:04:54 +00:00
Roman Lebedev	9a20c79ddc	[NFC][KnownBits] Add getMinValue() / getMaxValue() methods As can be seen from the accompanying cleanup, it is not unheard of to write `~Known.Zero` meaning "what maximal value can this KnownBits produce". But I think `~Known.Zero` isn't that self-explanatory, as compared to a method with a name. Note that not all `~Known.Zero` places were cleaned up, only those where this arguably improves things.	2019-12-03 20:04:51 +03:00
Sanne Wouda	f2e7de81c6	[AArch64] Fix over-eager fusing of NEON SIMD MUL/ADD Summary: The ISel pattern for SIMD MLA is a bit too eager: it replaces the ADD with an MLA even when the MUL cannot be eliminated, e.g. when it has another use. An MLA usually has a higher latency than an ADD (and there are fewer pipes available that can execute it), so trading an MLA for an ADD is not great. ISel is not taking the number of uses of the MUL result into account, nor any other factors such as the length of the critical path or other resource pressure. The MachineCombiner is able to make these judgments so this patch ports the ISel pattern for MUL/ADD fusing to the MachineCombiner. Similarly for MUL/SUB -> MLS, as well as the indexed variants. The change has no impact on SPEC CPU© intrate nor fprate. Reviewers: dmgreen, SjoerdMeijer, fhahn, Gerolf Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70673	2019-12-03 15:48:37 +00:00
Sander de Smalen	79f2422d6a	[Aarch64][SVE] Add intrinsics for gather loads (vector + imm) This patch adds intrinsics for SVE gather loads from memory addresses generated by a vector base plus immediate index: * @llvm.aarch64.sve.ld1.gather.imm This intrinsic maps 1-1 to the corresponding SVE instruction (example for half-words): * ld1h { z0.d }, p0/z, [z0.d, #16] Committed on behalf of Andrzej Warzynski (andwar) Reviewers: sdesmalen, huntergr, kmclaughlin, eli.friedman, rengolin, rovka, dancgr, mgudim, efriedma Reviewed By: sdesmalen Tags: #llvm Differential Revision: https://reviews.llvm.org/D70806	2019-12-03 15:19:16 +00:00
Sander de Smalen	8bf31e28d7	[Aarch64][SVE] Add intrinsics for gather loads with 32-bits offsets This patch adds intrinsics for SVE gather loads for which the offsets are 32-bits wide and are: * unscaled * @llvm.aarch64.sve.ld1.gather.sxtw * @llvm.aarch64.sve.ld1.gather.uxtw * scaled (offsets become indices) * @llvm.aarch64.sve.ld1.gather.sxtw.index * @llvm.aarch64.sve.ld1.gather.uxtw.index The offsets are either zero (uxtw) or sign (sxtw) extended to 64 bits. These intrinsics map 1-1 to the corresponding SVE instructions (examples for half-words): * unscaled * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw] * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw] * scaled * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw #1] * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw #1] Committed on behalf of Andrzej Warzynski (andwar) Reviewers: sdesmalen, kmclaughlin, eli.friedman, rengolin, rovka, huntergr, dancgr, mgudim, efriedma Reviewed By: sdesmalen Tags: #llvm Differential Revision: https://reviews.llvm.org/D70782	2019-12-03 14:48:29 +00:00
Kerry McLaughlin	8881ac9c39	[AArch64][SVE2] Implement remaining SVE2 floating-point intrinsics Summary: Adds the following intrinsics: - faddp - fmaxp, fminp, fmaxnmp & fminnmp - fmlalb, fmlalt, fmlslb & fmlslt - flogb Reviewers: huntergr, sdesmalen, dancgr, efriedma Reviewed By: sdesmalen Subscribers: efriedma, tschuett, kristof.beyls, hiraditya, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70253	2019-12-03 13:29:41 +00:00
Sander de Smalen	6e51ceba53	[AArch64][SVE] Add intrinsics for gather loads with 64-bit offsets This patch adds the following intrinsics for gather loads with 64-bit offsets: * @llvm.aarch64.sve.ld1.gather (unscaled offset) * @llvm.aarch64.sve.ld1.gather.index (scaled offset) These intrinsics map 1-1 to the following AArch64 instructions respectively (examples for half-words): * ld1h { z0.d }, p0/z, [x0, z0.d] * ld1h { z0.d }, p0/z, [x0, z0.d, lsl #1] Committing on behalf of Andrzej Warzynski (andwar) Reviewers: sdesmalen, huntergr, rovka, mgudim, dancgr, rengolin, efriedma Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D70542	2019-12-03 12:55:03 +00:00
Kerry McLaughlin	7483eb656f	[AArch64][SVE] Implement shift intrinsics Summary: Adds the following intrinsics: - asr & asrd - insr - lsl & lsr This patch also adds a new AArch64ISD node (INSR) to represent the int_aarch64_sve_insr intrinsic. Reviewers: huntergr, sdesmalen, dancgr, mgudim, rengolin, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70437	2019-12-03 11:47:12 +00:00
Sam Parker	bc76dadb3c	[CodeGen] Move ARMCodegenPrepare to TypePromotion Convert ARMCodeGenPrepare into a generic type promotion pass by: - Removing the insertion of arm specific intrinsics to handle narrow types as we weren't using this. - Removing ARMSubtarget references. - Now query a generic TLI object to know which types should be promoted and what they should be promoted to. - Move all codegen tests into Transforms folder and testing using opt and not llc, which is how they should have been written in the first place... The pass searches up from icmp operands in an attempt to safely promote types so we can avoid generating unnecessary unsigned extends during DAG ISel. Differential Revision: https://reviews.llvm.org/D69556	2019-12-03 11:12:52 +00:00
QingShan Zhang	4cde2d6b8d	[NFC][PowerPC] Add the inheritable and additional features to make the processor definition more clear The old processor design assumes that all of an old processor's features must be inherited by future processors. That is not true, as instruction fusion and some implementation-defined features are not inheritable. What this patch does: * Rename the old "specific features" to "additional features", which hold the newly added inheritable features. * Use "specific features" to hold features that apply only to a specific processor. * Add "inheritable features" to hold all the features inherited from earlier processors. Differential Revision: https://reviews.llvm.org/D70768	2019-12-03 06:32:46 +00:00
Wang, Pengfei	cf81714a7e	[X86] Model MXCSR for AVX instructions other than AVX512 Summary: Model MXCSR for AVX instructions other than AVX512 Reviewers: craig.topper, RKSimon Subscribers: hiraditya, llvm-commits, LuoYuanke, LiuChen3 Tags: #llvm Differential Revision: https://reviews.llvm.org/D70875	2019-12-03 08:53:47 +08:00
Florian Hahn	5154b0253d	[MIBundles] Move analyzePhysReg out of MIBundleOperands iterator (NFC). analyzePhysReg does not really fit into the iterator and moving it makes it easier to change the base iterator. Reviewers: evandro, t.p.northover, paquette, MatzeB, arsenm, qcolombet Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D70559	2019-12-02 20:47:08 +00:00
David Green	469ee617a0	[ARM] Add ARMVCCThen to tablegen and make use of it. NFC Similar to the parent, this adds some constants to tablegen to replace the existing magic values. Differential Revision: https://reviews.llvm.org/D70825	2019-12-02 19:57:12 +00:00
David Green	a223a4d66f	[ARM] Add ARMCC constants to tablegen. NFC I got tired of looking at magic constants in tablegen files. This adds condition codes like ARMCCeq and makes use of them. I also removed the extra patterns for reverse condition codes from D70296, they should now be covered by the parent commit. Differential Revision: https://reviews.llvm.org/D70824	2019-12-02 19:57:12 +00:00
David Green	57d96ab593	[ARM] Add some VCMP folding and canonicalisation The VCMP instructions in MVE can accept a register or ZR, but only as the right hand operator. Most of the time this will already be correct because the icmp will have been canonicalised that way already. There are some cases in the lowering of float conditions that this will not apply to though. This code should fix up those cases. Differential Revision: https://reviews.llvm.org/D70822	2019-12-02 19:57:12 +00:00
Simon Tatham	d173fb5d28	[ARM,MVE] Add intrinsics to deal with predicates. Summary: This commit adds the `vpselq` intrinsics which take an MVE predicate word and select lanes from two vectors; the `vctp` intrinsics which create a tail predicate word suitable for processing the first m elements of a vector (e.g. in the last iteration of a loop); and `vpnot`, which simply complements a predicate word and is just syntactic sugar for the `~` operator. The `vctp` ACLE intrinsics are lowered to the IR intrinsics we've already added (and which D70592 just reorganized). I've filled in the missing isel rule for VCTP64, and added another set of rules to generate the predicated forms. I needed one small tweak in MveEmitter to allow the `unpromoted` type modifier to apply to predicates as well as integers, so that `vpnot` doesn't pointlessly convert its input integer to an `<n x i1>` before complementing it. Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70485	2019-12-02 16:20:30 +00:00
Simon Tatham	48cce077ef	[ARM,MVE] Rename and clean up VCTP IR intrinsics. Summary: D65884 added a set of Arm IR intrinsics for the MVE VCTP instruction, to use in tail predication. But the 64-bit one doesn't work properly: its predicate type is `<2 x i1>` / `v2i1`, which isn't a legal MVE type (due to not having a full set of instructions that manipulate it usefully). The test of `vctp64` in `basic-tail-pred.ll` goes through `opt` fine, as the test expects, but if you then feed it to `llc` it causes a type legality failure at isel time. The usual workaround we've been using in the rest of the MVE intrinsics family is to bodge `v2i1` into `v4i1`. So I've adjusted the `vctp64` IR intrinsic to do that, and completely removed the code (and test) that uses that intrinsic for 64-bit tail predication. That will allow me to add isel rules (upcoming in D70485) that actually generate the VCTP64 instruction. Also renamed all four of these IR intrinsics so that they have `mve` in the name, since its absence was confusing. Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: MarkMurrayARM Subscribers: samparker, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70592	2019-12-02 16:20:30 +00:00
Nemanja Ivanovic	241cbf201a	[PowerPC] Fix crash in peephole optimization When converting reg+reg shifts to reg+imm rotates, we neglect to consider the CodeGenOnly versions of the 32-bit shift mnemonics. This means we produce a rotate with missing operands which causes a crash. Committing this fix without review since it is non-controversial that the list of mnemonics to consider should include the 64-bit aliases for the exact mnemonics. Fixes PR44183.	2019-12-02 08:56:04 -06:00
Victor Campos	dcf11c5e86	[ARM][AArch64] Complex addition Neon intrinsics for Armv8.3-A Summary: Add support for vcadd_* family of intrinsics. This set of intrinsics is available in Armv8.3-A. The fp16 versions require the FP16 extension, which has been available (opt-in) since Armv8.2-A. Reviewers: t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70862	2019-12-02 14:38:39 +00:00
Tim Renouf	3d5ba7c60f	AMDGPU: Fixed indeterminate map iteration in SIPeepholeSDWA Differential Revision: https://reviews.llvm.org/D70783 Change-Id: Ic26f915a4acb4c00ecefa9d09d7c24cec370ed06	2019-12-02 12:08:49 +00:00
Mark Murray	510792a2e0	[ARM][MVE][Intrinsics] Add VMINQ/VMAXQ/VMINNMQ/VMAXNMQ intrinsics. Summary: Add VMINQ/VMAXQ/VMINNMQ/VMAXNMQ intrinsics and their predicated versions. Add unit tests. Subscribers: kristof.beyls, hiraditya, dmgreen, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70829	2019-12-02 11:18:53 +00:00
David Green	e9e1daf2b9	[ARM] Remove VHADD patterns These instructions do not work quite like I expected them to. They perform the addition and then shift in a higher precision integer, so do not match up with the patterns that we added. For example with s8s, adding 100 and 100 should wrap leaving the shift to work on a negative number. VHADD will instead do the arithmetic in higher precision, giving 100 overall. The vhadd gives a "better" result, but not one that matches up with the input. I am just removing the patterns here. We might be able to re-add them in the future by checking for wrap flags or changing bitwidths. But for the moment just remove them to remove the problem cases.	2019-12-02 10:38:14 +00:00
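The VHADD mismatch described in commit e9e1daf2b9 can be reproduced with a little arithmetic. The following is only a sketch in Python, not code from the patch: the helper names `wrap_s8`, `add_then_shift_s8` and `halving_add_s8` are made up for illustration, and the model assumes signed 8-bit lanes and a shift right by one.

```python
def wrap_s8(v):
    """Wrap an integer to a signed 8-bit two's complement value."""
    v &= 0xFF
    return v - 0x100 if v >= 0x80 else v


def add_then_shift_s8(a, b):
    # Models the removed pattern's input: an 8-bit add that wraps,
    # followed by an arithmetic shift right by one.
    return wrap_s8(a + b) >> 1


def halving_add_s8(a, b):
    # Models a halving add: the sum is formed in higher precision,
    # shifted, and only then narrowed back to 8 bits.
    return wrap_s8((a + b) >> 1)


print(add_then_shift_s8(100, 100))  # -28: 200 wraps to -56 before the shift
print(halving_add_s8(100, 100))     # 100: no intermediate wrap
```

The two helpers agree whenever the 8-bit sum does not overflow, which is why the commit message suggests that wrap flags might allow the patterns to be re-added later.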

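The rlwinm folding example from commit f0ba1aec35 can also be checked numerically. This is a minimal sketch under stated assumptions, not the implementation from the patch: it models only the low 32 bits of the register, uses PowerPC bit numbering (bit 0 is the most significant bit), and handles only non-wrapping masks (MB <= ME). The helper names `rotl32`, `ppc_mask` and `rlwinm` are illustrative.

```python
import random

MASK32 = 0xFFFFFFFF


def rotl32(x, sh):
    """Rotate a 32-bit value left by sh bits."""
    sh &= 31
    return ((x << sh) | (x >> (32 - sh))) & MASK32


def ppc_mask(mb, me):
    """Ones from bit mb through bit me, bit 0 being the MSB (mb <= me only)."""
    return (MASK32 >> mb) & ((MASK32 << (31 - me)) & MASK32)


def rlwinm(x, sh, mb, me):
    """Rotate left word immediate, then AND with mask."""
    return rotl32(x, sh) & ppc_mask(mb, me)


# Compare the two-instruction sequence with the folded single instruction.
random.seed(0)
for _ in range(10000):
    x = random.getrandbits(32)
    two_step = rlwinm(rlwinm(x, 27, 5, 31), 19, 0, 12)
    folded = rlwinm(x, 14, 0, 12)
    assert two_step == folded
```

The rotate amounts add modulo 32 (27 + 19 = 46, which is 14), and the first mask rotated left by 19 and intersected with the second mask leaves exactly bits 0 through 12, matching the folded form quoted in the commit message.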

54892 Commits