Summary:
Adds two features to the generated rule disable option:
- '*' - Disable all rules
- '!<foo>' - Re-enable rule(s):
  - '!foo' - Enable the rule named 'foo'
  - '!5' - Enable rule five
  - '!4-9' - Enable rules four to nine
  - '!foo-bar' - Enable rules from 'foo' to (and including) 'bar'
(The '!' is recognized by the generated disable option but is not part of the
underlying API; it determines whether setRuleDisabled() or setRuleEnabled()
is called.)
This is intended to support unit testing of combine rules so
that you can do:
GeneratedCfg.setRuleDisabled("*")
GeneratedCfg.setRuleEnabled("foo")
to ensure only a specific rule is in effect. The rule is still
required to be included in a combiner, though.
Also added --...-only-enable-rule=X,Y which is effectively an
alias for --...-disable-rule=*,!X,!Y and as such interacts
properly with disable-rule.
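A sketch of the intended unit-test pattern (the rule-config class name below
is illustrative; each generated combiner emits its own config class):
```
// Illustrative class name; setRuleDisabled()/setRuleEnabled() are the
// generated entry points described above.
MyGenCombinerHelperRuleConfig GeneratedCfg;
GeneratedCfg.setRuleDisabled("*");   // '*': start with every rule off
GeneratedCfg.setRuleEnabled("foo");  // '!foo': turn rule 'foo' back on
GeneratedCfg.setRuleEnabled("bar");
// Equivalent command line, per the alias above:
//   --<combiner>-only-enable-rule=foo,bar
//   --<combiner>-disable-rule=*,!foo,!bar
```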
Reviewers: aditya_nandakumar, bogner, volkan, aemerson, paquette, arsenm
Subscribers: wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81889
When selecting 32-bit -> 64-bit G_ZEXTs, we don't always have to emit the extend.
If the instruction feeding into the G_ZEXT implicitly zero extends the high
half of the register, we can just emit a SUBREG_TO_REG instead.
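A minimal sketch of the emission side (names are illustrative, not the exact
in-tree helper):
```
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
using namespace llvm;

// If Src32 is already known to zero bits 32-63 of the full register,
// a SUBREG_TO_REG is enough to produce the 64-bit value.
static MachineInstr *emitZExtViaSubregToReg(Register Dst64, Register Src32,
                                            MachineIRBuilder &MIB) {
  return MIB.buildInstr(TargetOpcode::SUBREG_TO_REG, {Dst64}, {})
      .addImm(0)                 // high bits are defined to be zero
      .addUse(Src32)             // the implicitly-zero-extending 32-bit def
      .addImm(AArch64::sub_32);  // place Src32 in the low 32-bit subreg
}
```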
Differential Revision: https://reviews.llvm.org/D81897
This patch upstreams support for BFloat matrix multiplication intrinsics
and code generation from __bf16 to AArch64. This includes IR intrinsics;
unit tests are provided as needed. AArch32 intrinsics and CodeGen will come
in a subsequent patch.
This patch is part of a series implementing the Bfloat16 extension of the
Armv8.6-a architecture, as detailed here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
The bfloat type and its properties are specified in the Arm Architecture
Reference Manual:
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
The following people contributed to this patch:
- Luke Geeson
- Momchil Velikov
- Mikhail Maltsev
- Luke Cheeseman
Reviewers: SjoerdMeijer, t.p.northover, sdesmalen, labrinea, miyuki,
stuij
Reviewed By: miyuki, stuij
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki, chill, pbarrio, stuij
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80752
Change-Id: I174f0fd0f600d04e3799b06a7da88973c6c0703f
It's possible to end up with a zext or something in the way of a G_CONSTANT,
even pre-legalization. This can happen with memsets.
e.g.
https://godbolt.org/z/Bjc8cw
To make sure we can catch these cases, use `getConstantVRegValWithLookThrough`
instead of `mi_match`.
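Roughly, the matching side becomes the following (a sketch; the helper's
return type has varied across LLVM versions):
```
#include "llvm/CodeGen/GlobalISel/Utils.h"
using namespace llvm;

// Returns true and sets Cst if Reg is a constant, looking through
// copies/extends/truncs on the way to the G_CONSTANT.
static bool matchConstantLookThrough(Register Reg,
                                     const MachineRegisterInfo &MRI,
                                     int64_t &Cst) {
  auto ValAndVReg = getConstantVRegValWithLookThrough(Reg, MRI);
  if (!ValAndVReg)
    return false; // no G_CONSTANT behind Reg, even through a zext etc.
  Cst = ValAndVReg->Value;
  return true;
}
```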
Differential Revision: https://reviews.llvm.org/D81875
Add selection support for ext via a new opcode, G_EXT, and a post-legalizer
combine which matches it.
Add an `applyEXT` function, because the AArch64ext patterns require a register
for the immediate. So, we have to create a G_CONSTANT to match these without
writing new patterns or modifying the existing ones.
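The shape of `applyEXT` is roughly as follows (a sketch based on the
description above; the `ShuffleVectorPseudo` field names are assumptions):
```
// Materialize the lane offset as a G_CONSTANT so the imported AArch64ext
// patterns, which want a register operand, can match.
static void applyEXT(MachineInstr &MI, ShuffleVectorPseudo &MatchInfo) {
  MachineIRBuilder MIB(MI);
  auto Cst =
      MIB.buildConstant(LLT::scalar(32), MatchInfo.SrcOps[2].getImm());
  MIB.buildInstr(MatchInfo.Opc, {MatchInfo.Dst},
                 {MatchInfo.SrcOps[0], MatchInfo.SrcOps[1], Cst});
  MI.eraseFromParent();
}
```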
Tests are the same as arm64-ext.ll.
Also prevent ext from firing on the zip test. It has higher priority, so we
don't want it potentially getting in the way of mask tests.
Also fix up the shuffle-splat test, because ext is now selected there. The
test was incorrectly regbank selected before, which could cause a verifier
failure when you emit copies.
Differential Revision: https://reviews.llvm.org/D81436
This implements the following combines:
((0-A) + B) -> B-A
(A + (0-B)) -> A-B
Porting over the basic algebraic combines from the DAGCombiner. There are
several combines which fold adds away into subtracts. This is just the simplest
one.
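A hand-rolled sketch of the match for the first pattern (illustrative, not
the in-tree CombinerHelper code):
```
#include "llvm/CodeGen/GlobalISel/Utils.h"
using namespace llvm;

// Match G_ADD (G_SUB 0, A), B and report A and B so the caller can
// rewrite the add as G_SUB B, A.
static bool matchAddOfNeg(MachineInstr &Add, const MachineRegisterInfo &MRI,
                          Register &A, Register &B) {
  assert(Add.getOpcode() == TargetOpcode::G_ADD && "expected G_ADD");
  Register LHS = Add.getOperand(1).getReg();
  B = Add.getOperand(2).getReg();
  MachineInstr *Def = MRI.getVRegDef(LHS);
  if (!Def || Def->getOpcode() != TargetOpcode::G_SUB)
    return false;
  // The G_SUB must be a negation: its first source operand must be zero.
  auto Zero =
      getConstantVRegValWithLookThrough(Def->getOperand(1).getReg(), MRI);
  if (!Zero || Zero->Value != 0)
    return false;
  A = Def->getOperand(2).getReg();
  return true;
}
```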
I noticed that add combines are some of the most commonly hit across CTMark
(via print statements when they fire), so I'm porting over some of the obvious
ones.
This gives some minor code size improvements on CTMark at -O3 on AArch64.
Differential Revision: https://reviews.llvm.org/D77453
Summary:
SCTLR_EL1.BT[01] controls the PACI[AB]SP compatibility with BTYPE 11
(see [1]).
This bit will be set to zero, so PACI[AB]SP are equivalent to the BTI C
instruction only.
[1] https://developer.arm.com/docs/ddi0595/b/aarch64-system-registers/sctlr_el1
Reviewers: chill, tamas.petz, pbarrio, ostannard
Reviewed By: tamas.petz, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81746
Summary:
Teach MachineVerifier to check that branches have MBB operands if they are
not declared indirect.
Add `isBarrier` and `isIndirectBranch` to `G_BRINDIRECT` and `G_BRJT`.
Without these, `MachineInstr.isConditionalBranch()` was giving a
false positive for those instructions.
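For reference, `MachineInstr::isConditionalBranch()` is defined in terms of
exactly these properties (paraphrased from MachineInstr.h):
```
// A branch counts as "conditional" iff it is a branch but neither a
// barrier nor an indirect branch, so missing isBarrier/isIndirectBranch
// flags made G_BRINDIRECT/G_BRJT look conditional.
bool isConditionalBranch(QueryType Type = AnyInBundle) const {
  return isBranch(Type) && !isBarrier(Type) && !isIndirectBranch(Type);
}
```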
Reviewers: aemerson, qcolombet, dsanders, arsenm
Reviewed By: dsanders
Subscribers: hiraditya, wdng, simoncook, s.egerton, arsenm, rovka, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81587
We select all of these via patterns now, so there's no reason to disallow this.
Update select-dup.mir to show that we correctly select the smaller types.
Differential Revision: https://reviews.llvm.org/D81322
This was preventing the unused instructions in select-rev.mir and
select-trn.mir from being eliminated.
Update the tests accordingly.
Differential Revision: https://reviews.llvm.org/D81492
To make sure that no barrier gets placed on the architectural execution
path, each

  BLR x<N>

instruction gets transformed to a

  BL __llvm_slsblr_thunk_x<N>

instruction, where __llvm_slsblr_thunk_x<N> is a thunk that contains:

  __llvm_slsblr_thunk_x<N>:
      BR x<N>
      <speculation barrier>

Therefore, the BLR instruction gets split into two: one BL and one BR.
This transformation results in not inserting a speculation barrier on
the architectural execution path.
The mitigation is off by default and can be enabled by the
harden-sls-blr subtarget feature.
As a linker is allowed to clobber X16 and X17 on function calls, the
above code transformation would not be correct in case a linker does so
when N=16 or N=17. Therefore, when the mitigation is enabled, generation
of BLR x16 or BLR x17 is avoided.
As BLRA* indirect calls are not produced by LLVM currently, this does
not aim to implement support for those.
Differential Revision: https://reviews.llvm.org/D81402
Summary:
Fix a crash when using -debug, caused by the GlobalISel observer trying to
print an incomplete DBG_VALUE instruction. This happened because the
MachineIRBuilder used buildInstr, which immediately inserts the instruction
(triggering the print), instead of using BuildMI to first build up the
instruction and then insertInstr once it is complete.
Add a RUN line with the -debug flag to the existing debug-insts.ll test to
make sure no crash occurs.
Also fix a missing %s in the second RUN line of the same test.
Reviewers: t.p.northover, aditya_nandakumar, aemerson, dsanders, arsenm
Reviewed By: arsenm
Subscribers: wdng, arsenm, rovka, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76934
Until we have a real need for computing known bits for scalable
vectors, I have simply changed the code to bail out for now and
pretend we know nothing. I've also fixed up some simple callers of
computeKnownBits too.
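The bail-out itself is tiny; conceptually (a sketch, not necessarily the
exact in-tree lines):
```
// Early in SelectionDAG::computeKnownBits: there is no lane-by-lane
// model for scalable vectors yet, so claim that nothing is known.
if (Op.getValueType().isScalableVector()) {
  Known.resetAll();
  return;
}
```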
Differential Revision: https://reviews.llvm.org/D80437
Some processors may speculatively execute the instructions immediately
following RET (returns) and BR (indirect jumps), even though
control flow should change unconditionally at these instructions.
To avoid a potentially mis-speculatively executed gadget after these
instructions leaking secrets through side channels, this pass places a
speculation barrier immediately after every RET and BR instruction.
Since these barriers are never on the correct, architectural execution
path, performance overhead of this is expected to be low.
On targets that implement the Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted that acts as a speculation barrier.
On other targets, a DSB SYS followed by an ISB is emitted to act as a
speculation barrier.
These speculation barriers are implemented as pseudo instructions to
prevent later passes from analyzing and potentially removing them.
Even though currently LLVM does not produce BRAA/BRAB/BRAAZ/BRABZ
instructions, these are also mitigated by the pass and tested through a
MIR test.
The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.
Differential Revision: https://reviews.llvm.org/D81400
Instead of loading from e.g. `<vscale x 16 x i8>*`, load from element
pointer `i8*`. This is more in line with the other load/store
intrinsics for SVE.
Reviewers: fpetrogalli, c-rhodes, rengolin, efriedma
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81458
This ensures that we match SelectionDAG behaviour by waiting until the expand
pseudos pass to generate ADRP + ADD pairs. Doing this at selection time for the
G_ADD_LOW is fine because by the time we get to selecting the G_ADD_LOW,
previous attempts to fold it into loads/stores must have failed.
Differential Revision: https://reviews.llvm.org/D81512
Having the input dumped on failure seems like a better
default: I debugged FileCheck tests for a while without knowing
about this option, which really helps in understanding failures.
Remove `-dump-input-on-failure` and the environment variable
FILECHECK_DUMP_INPUT_ON_FAILURE which are now obsolete.
Differential Revision: https://reviews.llvm.org/D81422
If a resource can be held for multiple cycles in the schedule model,
then an instruction can be placed into the available queue and another
instruction can then be scheduled, but the first will not be taken back
out if the two instructions create a hazard. To fix this, make sure that we update the
available queue even on the first MOp of a cycle, pushing available
instructions back into the pending queue if they now conflict.
This happens with some downstream schedules we have around MVE
instruction scheduling where we use ResourceCycles=[2] to show the
instruction executing over two beats. Apparently the test changes here
are OK too.
Differential Revision: https://reviews.llvm.org/D76909
Same idea as for zip, uzp, etc. Teach the post-legalizer combiner to recognize
G_SHUFFLE_VECTORs that are trn1/trn2 instructions.
- Add G_TRN1 and G_TRN2
- Port mask matching code from AArch64ISelLowering
- Produce G_TRN1 and G_TRN2 in the post-legalizer combiner
- Select via importer
Add select-trn.mir to test selection.
Add postlegalizer-combiner-trn.mir to test the combine. This is similar to the
existing arm64-trn test.
Note that both of these tests contain things we currently don't legalize.
I figured it would be easier to test these now rather than later, since once
we legalize the G_SHUFFLE_VECTORs, it's not guaranteed that someone will update
the tests.
Differential Revision: https://reviews.llvm.org/D81182
If fmul and fadd are separated by an fma, we can fold them together
to save an instruction:
fadd (fma A, B, (fmul C, D)), N1 --> fma(A, B, fma(C, D, N1))
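Spelled out, the rewrite is plain reassociation of the underlying sum; it is
exact over the reals, and for floats the 'contract' flag discussed below
licenses the change in intermediate rounding:
```
\[
(A \cdot B + C \cdot D) + N_1 \;=\; A \cdot B + (C \cdot D + N_1)
\]
```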
The fold implemented here is actually a specialization - we should
be able to peek through >1 fma to find this pattern. That's another
patch if we want to try that enhancement though.
This transform was guarded by the TLI hook enableAggressiveFMAFusion(),
so it was done for some in-tree targets like PowerPC, but not AArch64
or x86. The hook is protecting against forming a potentially more
expensive computation when fma takes longer to execute than a single
fadd. That hook may be needed for other transforms, but in this case,
we are replacing fmul+fadd with fma, and the fma should never take
longer than the 2 individual instructions.
'contract' FMF is all we need to allow this transform. That flag
corresponds to -ffp-contract=fast in Clang, so we are allowed to form
fma ops freely across expressions.
Differential Revision: https://reviews.llvm.org/D80801
Summary:
This patch adds initial support for the following intrinsics:
* llvm.aarch64.sve.ld2
* llvm.aarch64.sve.ld3
* llvm.aarch64.sve.ld4
For loading two, three and four vectors worth of data. Basic codegen is
implemented with reg+reg and reg+imm addressing modes being addressed
in a later patch.
The types returned by these intrinsics have a number of elements that is a
multiple of the elements in a 128-bit vector for a given type and N, where N is
the number of vectors being loaded, i.e. 2, 3 or 4. Thus, for 32-bit elements
the types are:
LD2 : <vscale x 8 x i32>
LD3 : <vscale x 12 x i32>
LD4 : <vscale x 16 x i32>
This is implemented with target-specific intrinsics for each variant that take
the same operands as the IR intrinsic but return N values, where the type of
each value is a full vector, i.e. <vscale x 4 x i32> in the above example.
These values are then concatenated using the standard concat_vectors
operation to maintain type legality with the IR.
These intrinsics are intended for use in the Arm C Language
Extension (ACLE).
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75751
In two instances of CreateStackTemporary we are sometimes promoting
alignments beyond the stack alignment. I have introduced a new function
called getReducedAlign that will return the alignment for the broken
down parts of illegal vector types. For example, on NEON a <32 x i8>
type is made up of two <16 x i8> types - in this case the sensible
alignment is 16 bytes, not 32.
In the legalization code wherever we create stack temporaries I have
started using the reduced alignments instead for illegal vector types.
I added a test to
CodeGen/AArch64/build-one-lane.ll
that tries to insert an element into an illegal fixed vector type
that involves creating a temporary stack object.
Differential Revision: https://reviews.llvm.org/D80370
Currently aarch64-ldst-opt will incorrectly rename registers with
multiple disjunct subregisters (e.g. the result of LD3). This patch updates
canRenameUpToDef to bail out if it encounters such a register class
that contains the register to rename.
Fixes PR46105.
Reviewers: efriedma, dmgreen, paquette, t.p.northover
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D81108
This patch adds a test to check that we do not use an undef renamable
register for renaming the other operand in a LDP instruction, as
suggested in D81108.
When the input to a wide compare instruction is a DUP or SPLAT_VECTOR
node, we should deal with cases where the DUP/SPLAT_VECTOR input
operand is not an immediate value. I've fixed the code to return
SDValue() in such cases and added a couple of tests - one each to
represent the signed and unsigned cases.
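The guard amounts to something like this (an illustrative sketch; the
function name is hypothetical):
```
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// If the DUP/SPLAT_VECTOR input operand is not a constant, the wide
// compare cannot use an immediate form; the combine returns SDValue()
// so generic lowering runs instead.
static bool getWideCompareImm(SDValue SplatInput, APInt &ImmVal) {
  auto *C = dyn_cast<ConstantSDNode>(SplatInput.getNode());
  if (!C)
    return false; // not an immediate: caller returns SDValue()
  ImmVal = C->getAPIntValue();
  return true;
}
```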
Differential Revision: https://reviews.llvm.org/D81167
Summary:
This patch adds the following intrinsics for creating two-tuple,
three-tuple and four-tuple scalable vectors:
* llvm.aarch64.sve.tuple.create2
* llvm.aarch64.sve.tuple.create3
* llvm.aarch64.sve.tuple.create4
As well as:
* llvm.aarch64.sve.tuple.get
* llvm.aarch64.sve.tuple.set
For extracting and inserting scalable vectors from vector tuples. These
intrinsics are intended to be used by the ACLE functions svcreate<n>,
svget and svset.
This patch also includes calling convention support for passing and
returning tuples of scalable vectors to/from functions.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75674
Scalable vectors cannot use 'BUILD_VECTOR', so it is necessary to
properly split and widen scalable vectors when passing them
to CopyToReg/CopyFromReg.
This functionality is added to TargetLoweringBase::getVectorTypeBreakdown().
This patch only adds support for 'splitting' scalable vectors that
are a multiple of some legal type, e.g.
<vscale x 6 x i64> -> 3 x <vscale x 2 x i64>
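A usage sketch of the query for the example above (assumes a target where
<vscale x 2 x i64> is legal, e.g. AArch64 with SVE):
```
#include "llvm/CodeGen/TargetLowering.h"
using namespace llvm;

// Ask the target how <vscale x 6 x i64> is split for register passing.
static unsigned breakdownExample(const TargetLowering &TLI,
                                 LLVMContext &Ctx) {
  EVT VT = EVT::getVectorVT(Ctx, MVT::i64, 6, /*IsScalable=*/true);
  EVT IntermediateVT;
  MVT RegisterVT;
  unsigned NumIntermediates;
  unsigned NumRegs = TLI.getVectorTypeBreakdown(Ctx, VT, IntermediateVT,
                                                NumIntermediates, RegisterVT);
  // With this patch: NumIntermediates == 3 and
  // IntermediateVT == <vscale x 2 x i64>.
  return NumRegs;
}
```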
Reviewers: efriedma, c-rhodes
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80139
Since all of the other G_SHUFFLE_VECTOR transforms are going there, let's do
this with dup as well. This is nice because it lets us split up the original
code into matching, register bank selection, and instruction selection.
- Create G_DUP, make it equivalent to AArch64dup
- Add a post-legalizer combine which is 90% a copy-and-paste from
tryOptVectorDup, except with shuffle matching closer to what SelectionDAG
does in `ShuffleVectorSDNode::isSplatMask`.
- Teach RegBankSelect about G_DUP. This is necessary because correct dup
selection relies on the register bank to choose between FP and GPR forms.
- Kill `tryOptVectorDup`, since it's now entirely handled by G_DUP.
- Add testcases for the combine, RegBankSelect, and selection. The selection
test gives the same selection results as the old test.
Differential Revision: https://reviews.llvm.org/D81221
Summary:
This patch adds legalisation of extensions where the operand
of the extend is a legal scalable type but the result is not.
EXTRACT_SUBVECTOR is used to split the result, before
being replaced by target-specific [S|U]UNPK[HI|LO] operations.
For example:
```
zext <vscale x 16 x i8> %a to <vscale x 16 x i16>
```
should emit:
```
uunpklo z2.h, z0.b
uunpkhi z1.h, z0.b
```
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79587
This does three things:
1) Adds G_REV16, G_REV32, and G_REV64. These are equivalent to AArch64rev16,
AArch64rev32, and AArch64rev64 respectively.
2) Adds support for producing G_REV64 in the postlegalizer combiner.
We don't legalize any of the shuffles which could give us a G_REV32 or
G_REV16 yet. Since the function for detecting the rev mask is lifted from
AArch64ISelLowering, it should work for G_REV32 and G_REV16 when we get
there (a paraphrased sketch of that mask check follows this list).
3) Adds a selection test for a good portion of the patterns imported for the rev
family. The only ones which are not tested are the ones with bitconvert.
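The mask check lifted from AArch64ISelLowering looks roughly like this
(paraphrased; parameter spelling may differ from the in-tree helper):
```
#include "llvm/ADT/ArrayRef.h"
using namespace llvm;

// Returns true if M describes a REV<BlockSize> swap of EltSize-bit
// elements within BlockSize-bit blocks.
static bool isREVMask(ArrayRef<int> M, unsigned EltSize, unsigned NumElts,
                      unsigned BlockSize) {
  assert((BlockSize == 16 || BlockSize == 32 || BlockSize == 64) &&
         "Only possible block sizes for REV are: 16, 32, 64");
  if (EltSize == 64) // EltSize must be smaller than the block size.
    return false;
  unsigned BlockElts = M[0] + 1;
  if (M[0] < 0) // If the first index is UNDEF, be optimistic.
    BlockElts = BlockSize / EltSize;
  if (BlockSize <= EltSize || BlockSize != BlockElts * EltSize)
    return false;
  for (unsigned i = 0; i < NumElts; ++i) {
    if (M[i] < 0)
      continue; // ignore UNDEF indices
    if ((unsigned)M[i] !=
        (i - i % BlockElts) + (BlockElts - 1 - i % BlockElts))
      return false;
  }
  return true;
}
```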
This also does a little cleanup, and adds a struct for shuffle vector pseudo
matchdata. This lets us still use `applyShuffleVectorPseudo` rather than adding
a new function.
It should also make it a bit easier to port some of the other masks from
AArch64ISelLowering. (e.g. `isZIP_v_undef_Mask` and friends)
Differential Revision: https://reviews.llvm.org/D81112
Porting the mask stuff for uzp1 and uzp2 from AArch64ISelLowering.
Add two custom opcodes: G_UZP1 and G_UZP2.
Produce them in the post-legalizer combiner when the mask checks out.
Tests:
- postlegalizer-combiner-uzp.mir verifies that we create G_UZP1 and G_UZP2.
The testcases that check that we create them come from neon-perm.ll.
- select-uzp.mir verifies that we can select G_UZP1 and G_UZP2.
Differential Revision: https://reviews.llvm.org/D81049