llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Moll	24215fec9a	[NFC][VE] format VEInstrInfo	2020-02-03 14:25:49 +01:00
Simon Moll	5c8ba508b2	[NFC] unsigned->Register in storeRegTo/loadRegFromStack Summary: This patch makes progress on the 'unsigned -> Register' rewrite for `TargetInstrInfo::loadRegFromStack` and `TII::storeRegToStack`. Reviewers: arsenm, craig.topper, uweigand, jpienaar, atanasyan, venkatra, robertlytton, dylanmckay, t.p.northover, kparzysz, tstellar, k-ishizaka Reviewed By: arsenm Subscribers: wuzish, merge_guards_bot, jyknight, sdardis, nemanjai, jvesely, wdng, nhaehnle, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73870	2020-02-03 14:22:16 +01:00
Guillaume Chatelet	fc19465965	[Alignment][NFC] Use Align for code creating MemOp Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73874	2020-02-03 14:10:30 +01:00
John Brawn	b37d59353f	[FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCS These can be lowered to code sequences using CMPFP and CMPFPE which then get selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain operand isn't handled correctly, but resolving that looks like it would involve changes around FPSCR-handling instructions and how the FPSCR is modelled. The fp-intrinsics test was already testing some of this but as the entire test was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the cases where we aren't generating the right instruction sequences as FIXME. Differential Revision: https://reviews.llvm.org/D73194	2020-02-03 12:59:12 +00:00
Simon Tatham	961530fdc9	[ARM,MVE] Fix vreinterpretq in big-endian mode. Summary: In big-endian MVE, the simple vector load/store instructions (i.e. both contiguous and non-widening) don't all store the bytes of a register to memory in the same order: it matters whether you did a VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register formats of different vector types relate to each other in a different way from the in-memory formats. So, if you want to 'bitcast' or 'reinterpret' one vector type as another, you have to carefully specify which you mean: did you want to reinterpret the //register// format of one type as that of the other, or the //memory// format? The ACLE `vreinterpretq` intrinsics are specified to reinterpret the register format. But I had implemented them as LLVM IR bitcast, which is specified for all types as a reinterpretation of the memory format. So a `vreinterpretq` intrinsic, applied to values already in registers, would code-generate incorrectly if compiled big-endian: instead of emitting no code, it would emit a `vrev`. To fix this, I've introduced a new IR intrinsic to perform a register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's implemented by a trivial isel pattern that expects the input in an MQPR register, and just returns it unchanged. In the clang codegen, I only emit this new intrinsic where it's actually needed: I prefer a bitcast wherever it will have the right effect, because LLVM understands bitcasts better. So we still generate bitcasts in little-endian mode, and even in big-endian when you're casting between two vector types with the same lane size. For testing, I've moved all the codegen tests of vreinterpretq out into their own file, so that they can have a different set of RUN lines to check both big- and little-endian. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73786	2020-02-03 11:20:06 +00:00
Simon Tatham	f8d4afc49a	[ARM,MVE] Add intrinsics for v[id]dupq and v[id]wdupq. Summary: These instructions generate a vector of consecutive elements starting from a given base value and incrementing by 1, 2, 4 or 8. The `wdup` versions also wrap the values back to zero when they reach a given limit value. The instruction updates the scalar base register so that another use of the same instruction will continue the sequence from where the previous one left off. At the IR level, I've represented these instructions as a family of target-specific intrinsics with two return values (the constructed vector and the updated base). The user-facing ACLE API provides a set of intrinsics that throw away the written-back base and another set that receive it as a pointer so they can update it, plus the usual predicated versions. Because the intrinsics return two values (as do the underlying instructions), the isel has to be done in C++. This is the first family of MVE intrinsics that use the `imm_1248` immediate type in the clang Tablegen framework, so naturally, I found I'd given it the wrong C integer type. Also added some tests of the check that the immediate has a legal value, because this is the first time those particular checks have been exercised. Finally, I also had to fix a bug in MveEmitter which failed an assertion when I nested two `seq` nodes (the inner one used to extract the two values from the pair returned by the IR intrinsic, and the outer one put on by the predication multiclass). Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73357	2020-02-03 11:20:06 +00:00
Simon Tatham	cf7e98e6f7	[ARM,MVE] Add intrinsics for vdupq. Summary: The unpredicated case of this is trivial: the clang codegen just makes a vector splat of the input, and LLVM isel is already prepared to handle that. For the predicated version, I've generated a `select` between the same vector splat and the `inactive` input parameter, and added new Tablegen isel rules to match that pattern into a predicated `MVE_VDUP` instruction. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73356	2020-02-03 11:20:06 +00:00
Clement Courbet	5b2c5e261f	[llvm-exegesis] Add pfm counters for Zen2 (znver2). Summary: There are no counters for individual ports, but this is already enough to find a lot of issues in the current model (upcoming patch). Reviewers: dblaikie, gchatelet Subscribers: hiraditya, tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72032	2020-02-03 10:57:41 +01:00
Jay Foad	97d9a76afc	[AMDGPU] Don't remove short branches over kills Summary: D68092 introduced a new SIRemoveShortExecBranches optimization pass and broke some graphics shaders. The problem is that it was removing branches over KILL pseudo instructions, and the fix is to explicitly check for that in mustRetainExeczBranch. Reviewers: critson, arsenm, nhaehnle, cdevadas, hakzsam Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73771	2020-02-03 09:26:52 +00:00
Craig Topper	cf20fde1d1	[X86] Remove a couple unnecessary calls to ConvertCmpIfNecessary. We only need to call this on floating point comparisons. In this case these are known to be integer compares. One of them even has a SUB opcode instead of CMP.	2020-02-02 21:36:51 -08:00
Shengchen Kan	db7d2ab03d	[NFC] Fix helptext for opt/llc after https://reviews.llvm.org/D68411 Remove "cl::value_desc("jcc, fused, jmp, call, ret, indirect"),", which makes the option plus its cl::value_desc too long in the help output.	2020-02-03 12:31:42 +08:00
Craig Topper	ee85415dbb	[X86] Use MVT::f80 for the result type of the FLD used to convert from SSE register to X87 register in FP_TO_INTHelper.	2020-02-02 13:24:37 -08:00
Simon Pilgrim	5d86ac82a6	Fix a few spelling mistakes in comments. NFCI.	2020-02-02 18:27:43 +00:00
Simon Pilgrim	17e91b7dd2	[X86][SSE] combineBitcastvxi1 - add pre-AVX512 v64i1 handling	2020-02-02 18:00:09 +00:00
David Green	d50e188a07	Revert "[ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS" This reverts commit `e34801c8e6` and the followup due to multiple problems. I've tried to keep the tests and RDA parts where possible, as those still seem useful.	2020-02-02 13:24:05 +00:00
Nicolai Hähnle	ba8110161d	AMDGPU/GFX10: Fix NSA reassign pass when operands are undef Summary: Virtual registers that are undef have an empty LiveInterval at this point, which means beginIndex() and endIndex() cannot be used. We only need those indices to determine the range in which to scan for affected other NSA instructions, and undef operands cannot contribute to that range. Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73831	2020-02-01 22:41:40 +01:00
Craig Topper	a57dd66d5e	[X86] In X86FastEmitSSESelect, fall back to SelectionDAG if the inputs to the compare can't be found in registers. We were checking that the original Value * for the compare operands were null. But that can never happen. I believe we intended to check for 0 registers here instead. Fixes PR44749.	2020-02-01 12:24:55 -08:00
Craig Topper	d975910c50	[X86] Don't exit from foldOffsetIntoAddress if the Offset is 0, but AM.Disp is non-zero. This is an alternate fix for the issue D73606 was trying to solve. The main issue here is that we bailed out of foldOffsetIntoAddress if Offset is 0. But if we just found a symbolic displacement and AM.Disp became non-zero earlier, we still need to validate that AM.Disp with the symbolic displacement. This is my second attempt at committing this after failing build bots previously. One thing I realized about the previous attempt is that it's possible that AM.Disp is already non-zero and the new Offset changes it back to zero. In that case my previous attempt failed to update AM.Disp to zero. So this patch removes the early out for 0 and appropriately handles the 0 case in each check so we still update AM.Disp at the end.	2020-02-01 11:26:17 -08:00
Craig Topper	943b5561d6	[LegalizeTypes][X86] Add a new strategy for type legalizing f16 type that softens it to i16, but promotes to f32 around arithmetic ops. This is based on this llvm-dev thread http://lists.llvm.org/pipermail/llvm-dev/2019-December/137521.html The current strategy for f16 is to promote the type to float everywhere except where the specific width is required like loads, stores, and bitcasts. This results in rounding occurring in odd places instead of immediately after arithmetic operations. This interacts in weird ways with the __fp16 type in clang which is a storage only type where arithmetic is always promoted to float. InstCombine can remove some fpext/fptruncs around such arithmetic and turn it into arithmetic on half. This wouldn't be so bad if SelectionDAG was able to put those fpext/fpround back in when it promotes. It is also not obvious how to make the existing strategy work with STRICT fp. We need to use STRICT versions of the conversions which require chain operands. But if the conversions are created for a bitcast, there is no place to get an appropriate chain from. This patch implements a different strategy where conversions are emitted directly around arithmetic operations. And otherwise it's passed around as an i16 including in arguments and return values. This can result in more conversions between arithmetic operations, but is closer to matching the IR the frontend generates for __fp16. And it will allow us to use the chain from constrained arithmetic nodes to link the STRICT_FP_TO_FP16/STRICT_FP16_TO_FP that will need to be added. I've set it up so that each target can opt into the new behavior. Converting all the targets myself was more than I was able to handle. Differential Revision: https://reviews.llvm.org/D73749	2020-02-01 11:21:04 -08:00
Alex Richardson	24ee9c8496	Don't mark MIPS TRAP as isTerminator This was causing machine verifier errors when compiling libunwind. Reviewed By: atanasyan Differential Revision: https://reviews.llvm.org/D73648	2020-02-01 15:50:22 +00:00
Matt Arsenault	c0b12916a7	AMDGPU/GlobalISel: Use more wide vector load/stores This improves the type breakdown for some large vectors. For example, we now get a <4 x s32> and s32 store instead of 5 s32 stores for <5 x s32>.	2020-02-01 10:47:21 -05:00
Matt Arsenault	e3117e5c30	AMDGPU/GlobalISel: Improve legalization of wide stores This fixes legalizations of global stores > 128-bits. It seems work is needed on how this split actually occurs. For example, we get the right code for s160, with an s128 and s32 load, but get 5 s32 loads for <5 x s32>.	2020-02-01 10:47:03 -05:00
Matt Arsenault	98aaed2980	AMDGPU/GlobalISel: Fix forming G_TRUNC with vcc result This somehow got lost when I fixed the boolean handling.	2020-01-31 20:29:41 -05:00
Luís Marques	24cba3312f	[RISCV] Implement jump pseudo-instruction Summary: Implements the jump pseudo-instruction, which is used in e.g. the Linux kernel. Reviewers: asb, lenary Reviewed By: lenary Tags: #llvm Differential Revision: https://reviews.llvm.org/D73178	2020-01-31 22:28:26 +00:00
Jessica Paquette	b9bf9305d1	[AArch64][GlobalISel] Walk through G_TRUNC in getTestBitReg When you encounter a G_TRUNC, you are moving from a larger type to a smaller type. Asking for the i-th bit on a larger value is the same as asking for the i-th bit on a smaller value. So, we should always be able to walk through G_TRUNC when computing the bit for a TB(N)Z. Differential Revision: https://reviews.llvm.org/D73748	2020-01-31 11:09:55 -08:00
alex-t	5df1ac7846	[AMDGPU] fixed divergence driven shift operations selection Differential Revision: https://reviews.llvm.org/D73483 Reviewers: rampitec	2020-01-31 20:49:56 +03:00
Jay Foad	2a1b5af299	[GlobalISel] Tidy up unnecessary calls to createGenericVirtualRegister Summary: As a side effect some redundant copies of constant values are removed by CSEMIRBuilder. Reviewers: aemerson, arsenm, dsanders, aditya_nandakumar Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, hiraditya, jrtc27, atanasyan, volkan, Petar.Avramovic, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73789	2020-01-31 17:07:16 +00:00
Danilo Carvalho Grael	44a4f5fc6a	[AArch64][SVE] Add SVE2 mla unpredicated intrinsics. Summary: Add intrinsics for the MLA unpredicated sve2 instructions: - smlalb, smlalt, umlalb, umlalt, smlslb, smlslt, umlslb, umlslt - sqdmlalb, sqdmlalt, sqdmlslb, sqdmlslt - sqdmlalbt, sqdmlslbt Reviewers: efriedma, sdesmalen, cameron.mcinally, c-rhodes, rengolin, kmclaughlin Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan Tags: #llvm Differential Revision: https://reviews.llvm.org/D73746	2020-01-31 11:39:12 -05:00
Guillaume Chatelet	3c89b75f23	[NFC] Introduce a type to model memory operation Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code. Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73785	2020-01-31 17:29:01 +01:00
Matt Arsenault	b3726ecea4	AMDGPU: Fix potential use of undefined value	2020-01-31 10:38:58 -05:00
Matt Arsenault	6fb544d1d2	AMDGPU/GlobalISel: Combine FMIN_LEGACY/FMAX_LEGACY Try out using combine definition rules. This really should be a post-legalizer combine, but the combiner pass is currently pre-legalize. Most of the target combines are really post-legalize, so we should probably move the pass.	2020-01-31 06:58:04 -08:00
Matt Arsenault	49e424e08e	AMDGPU/GlobalISel: Select global MUBUF atomicrmw	2020-01-31 06:05:41 -08:00
Matt Arsenault	0426c2d07d	Reapply "AMDGPU: Cleanup and fix SMRD offset handling" This reverts commit `6a4acb9d80`.	2020-01-31 06:01:28 -08:00
Jay Foad	31e29d4afe	AMDGPU/GlobalISel: Make use of MachineIRBuilder helper functions. NFC.	2020-01-31 13:53:39 +00:00
Kerry McLaughlin	69558c8487	[AArch64][SVE] Add remaining SVE2 intrinsics for uniform DSP operations Summary: Implements the following intrinsics: - @llvm.aarch64.sve.[s\|u]qadd - @llvm.aarch64.sve.[s\|u]qsub - @llvm.aarch64.sve.suqadd - @llvm.aarch64.sve.usqadd - @llvm.aarch64.sve.[s\|u]qsubr - @llvm.aarch64.sve.[s\|u]rshl - @llvm.aarch64.sve.[s\|u]qshl - @llvm.aarch64.sve.[s\|u]qrshl - @llvm.aarch64.sve.[s\|u]rshr - @llvm.aarch64.sve.sqshlu - @llvm.aarch64.sve.sri - @llvm.aarch64.sve.sli - @llvm.aarch64.sve.[s\|u]sra - @llvm.aarch64.sve.[s\|u]rsra - @llvm.aarch64.sve.[s\|u]aba Reviewers: efriedma, sdesmalen, dancgr, cameron.mcinally, c-rhodes, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73551	2020-01-31 10:51:57 +00:00
Fangrui Song	5b22bcc2b7	[X86][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local For a MC_GlobalAddress reference to a dso_local external GlobalValue with a definition, emit .Lfoo$local to avoid a relocation. -fno-pic and -fpie can infer dso_local but -fpic cannot. In the future, we can explore the possibility of inferring dso_local with -fpic. As the description of D73228 says, LLVM's existing IPO optimization behaviors (like -fno-semantic-interposition) and a previous assembly behavior give us enough license to be aggressive here. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D73230	2020-01-30 17:52:35 -08:00
Matt Arsenault	6a4acb9d80	Revert "AMDGPU: Cleanup and fix SMRD offset handling" This reverts commit `17dbc6611d`. A test is failing on some bots	2020-01-30 15:39:51 -08:00
Matt Arsenault	17dbc6611d	AMDGPU: Cleanup and fix SMRD offset handling I believe this also fixes bugs with CI 32-bit handling, which was incorrectly skipping offsets that look like signed 32-bit values. Also validate the offsets are dword aligned before folding.	2020-01-30 15:04:21 -08:00
Jessica Paquette	c8c987d310	[AArch64][GlobalISel] Fold in G_ANYEXT/G_ZEXT into TB(N)Z This is similar to the code in getTestBitOperand in AArch64ISelLowering. Instead of implementing all of the TB(N)Z optimizations at once, this patch implements the simplest case first. The way that this is set up should make it fairly easy to add the rest as we go along. The idea here is that after determining that we can use a TB(N)Z, we can continue looking through instructions and perform further folding. In this case, when we have a G_ZEXT or G_ANYEXT where the extended bits are not used, we can fold it into the TB(N)Z. Differential Revision: https://reviews.llvm.org/D73673	2020-01-30 14:51:26 -08:00
Amara Emerson	6170272ab9	[AArch64][GlobalISel] Disallow vectors in convertPtrAddToAdd. Found by inspection, but there's no test for this yet because G_PTR_ADD is currently illegal for vectors. I'll add the test at a later time when the legalizer support has landed.	2020-01-30 14:50:44 -08:00
Matt Arsenault	f7521dc292	AMDGPU: Replace subtarget check with an assert This is already checked by the pattern subtarget predicate.	2020-01-30 14:15:26 -08:00
Matt Arsenault	97a1d4bc02	AMDGPU: Don't use separate cache arguments for s_buffer_load node There's not much value to this separate node from the intrinsic. Make the operand structure the same as the intrinsic, so we can reuse the same pattern for GlobalISel.	2020-01-30 14:15:26 -08:00
hsmahesha	1d9e08ec35	[AMDGPU] Add file headers for few files where it is missing. Summary: Added file headers for files which implement iterative lightweight scheduling strategies. Which is basically an exercise which I undertook in order to get used to LLVM development process. Reviewers: arsenm, vpykhtin, cdevadas Reviewed By: vpykhtin Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73417	2020-01-31 02:06:41 +05:30
Fangrui Song	06b8e32d4f	[AArch64] -fpatchable-function-entry=N,0: place patch label after BTI Summary: For -fpatchable-function-entry=N,0 -mbranch-protection=bti, after `9a24488cb6`, we place the NOP sled after the initial BTI. ``` .Lfunc_begin0: bti c nop nop .section __patchable_function_entries,"awo",@progbits,f,unique,0 .p2align 3 .xword .Lfunc_begin0 ``` This patch adds a label after the initial BTI and changes the __patchable_function_entries entry to reference the label: ``` .Lfunc_begin0: bti c .Lpatch0: nop nop .section __patchable_function_entries,"awo",@progbits,f,unique,0 .p2align 3 .xword .Lpatch0 ``` This placement is compatible with the resolution in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 . A local linkage function whose address is not taken does not need a BTI. Placing the patch label after BTI has the advantage that code does not need to differentiate whether the function has an initial BTI. Reviewers: mrutland, nickdesaulniers, nsz, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73680	2020-01-30 11:11:52 -08:00
Danilo Carvalho Grael	0610637aac	[AArch64][SVE] Add remaining SVE2 mla indexed intrinsics. Summary: Add remaining SVE2 mla indexed intrinsics: - sqdmlalb, sqdmlalt, sqdmlslb, sqdmlslt Add suffix _lanes and switch immediate types to i32 for all mla indexed intrinsics to align with ACLE builtin definitions. Reviewers: efriedma, sdesmalen, cameron.mcinally, c-rhodes, rengolin, kmclaughlin Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, arphaman, psnobl, llvm-commits, amehsan Tags: #llvm Differential Revision: https://reviews.llvm.org/D73633	2020-01-30 13:32:11 -05:00
Nikita Popov	70d345e687	[AArch64][ARM] Always expand ordered vector reductions (PR44600) fadd/fmul reductions without reassoc are lowered to VECREDUCE_STRICT_FADD/FMUL nodes, which don't have legalization support. Until that is in place, expand these intrinsics on ARM and AArch64. Other targets always expand the vector reduction intrinsics. Additionally expand fmax/fmin reductions without nonan flag on AArch64, as the backend asserts that the flag is present when lowering VECREDUCE_FMIN/FMAX. This fixes https://bugs.llvm.org/show_bug.cgi?id=44600. Differential Revision: https://reviews.llvm.org/D73135	2020-01-30 18:40:24 +01:00
Yonghong Song	795bbb3662	[BPF] fix a bug in BPFMISimplifyPatchable pass with -O0 The recommended optimization level for BPF programs is O2 since (1). BPF is running inside the kernel and linux kernel won't work at -O0 level, and (2). Verifier is not able to handle O0 code properly, e.g., potential large stack size and a lot of spills. But we should keep -O0 at least compiling. This patch fixed a bug in BPFMISimplifyPatchable phase where with -O0, a segmentation fault will happen for a simple program like: int test(int a, int b) { return a + b; } A test case is added to capture such a case. Differential Revision: https://reviews.llvm.org/D73681	2020-01-30 08:28:39 -08:00
jasonliu	3bbe7a681e	[XCOFF][AIX] Support basic relocation type on AIX Summary: This patch intends to support the three most common relocation types on AIX: R_POS, R_TOC, R_RBR. These three relocation types will be needed for object file generation on AIX for small code model. We will have follow up patches to bring relocation support for large code model on AIX. Reviewers: hubert.reinterpretcast, daltenty, DiggerLin Differential Revision: https://reviews.llvm.org/D72027	2020-01-30 15:59:09 +00:00
Stefan Pintilie	9de1241bb2	[PowerPC][Future] Branch Distance Estimation For Prefixed Instructions By adding the prefixed instructions the branch distances are no longer computed correctly. Since prefixed instructions cannot cross a 64 byte boundary we have to assume that a prefixed instruction may have a nop prepended to it. This patch tries to take that nop into consideration when computing the size of basic blocks. Differential Revision: https://reviews.llvm.org/D72572	2020-01-30 08:54:33 -06:00
Matt Arsenault	d6b83d6ba5	AMDGPU/GlobalISel: Don't use pointless getConstantVRegVal This is always a G_CONSTANT already	2020-01-30 09:38:43 -05:00
Nemanja Ivanovic	6cc6e89c11	Fix helptext for opt/llc after `14fc20ca6` The commit https://reviews.llvm.org/rG14fc20ca6 added some options to the X86 back end that cause the help text for opt/llc to become much harder to read. The issue is that the cl::value_desc is part of the option name and is used to compute the indentation of the description text (i.e. the maximum length option name is what everything aligns to). Since the commit puts a large number of characters into that text, everything is aligned to that width. This patch just reformats the option so that the description is contained in the description and the list of possible values is within the angle brackets. Note: the readability issue of the helptext was fixed in commit `70cbf8c71c`, but the re-formatting wasn't added on that commit so I am still committing this. Differential revision: https://reviews.llvm.org/D73267	2020-01-30 08:35:55 -06:00
John Brawn	0bb9a27c98	[FPEnv][AArch64] Add lowering and instruction selection for strict conversions Strict fp-to-int and int-to-fp conversions can be handled in the same way that the non-strict versions are (by using the appropriate instruction or converting to a function call when we have no instruction). Differential Revision: https://reviews.llvm.org/D73625	2020-01-30 13:50:06 +00:00
Matt Arsenault	ea956685a1	GlobalISel: Implement s32->s64 G_FPTOSI lowering Port directly from DAG version. The lowering for G_FPTOUI used to fail on AMDGPU because it uses G_FPTOSI.	2020-01-30 08:47:07 -05:00
Matt Arsenault	b21571f4d5	AMDGPU/GlobalISel: Handle s64->s64 G_FPTOSI/G_FPTOUI	2020-01-30 08:46:37 -05:00
Matt Arsenault	8184176efd	AMDGPU/GlobalISel: Custom lower G_LOG/G_LOG10 I'm pretty sure this is wrong and we should expand these in a correct way, but this matches the existing behavior.	2020-01-30 08:38:50 -05:00
Matt Arsenault	872e899b75	AMDGPU/GlobalISel: Legalize unpacked d16 image operations On targets that don't have the normal packed f16 layout, handle these during legalization. Directly modify the register types. We can infer this was a d16 load based on the mem operand size during selection. A16 operands should possibly be handled here as well, but don't worry about that yet.	2020-01-30 08:36:11 -05:00
Matt Arsenault	d21182d692	AMDGPU/GlobalISel: Only map VOP operands to VGPRs This trivially avoids violating the constant bus restriction. Previously this was allowing one SGPR in the first source operand, which technically also avoided violating this for most operations (but not for special cases reading vcc). We do need to write some new, smarter operand folds to pick the optimal SGPR to use in some kind of post-isel fold, but that's purely an optimization. I was originally thinking we would pick which operands should be SGPRs in RegBankSelect, but I think this isn't really manageable. There would be additional complexity to handle every G_* instruction, and then any nontrivial instruction patterns would need to know when to avoid violating it, which is likely to be very error prone. I think having all inputs being canonically copies to VGPRs will simplify the operand folding logic. The current folding we do is backwards, and only considers one operand at a time, relative to operands it already has. It therefore poorly handles the case where there is already a constant bus operand user. If all operands are copies, it's somewhat simpler to consider all input operands at once to choose the optimal constant bus user. Since the failure mode for constant bus violations is now a verifier error and not a selection failure, this moves towards a place where we can turn on the fallback mode. The SGPR copy folding optimizations can be left for later.	2020-01-30 08:32:35 -05:00
Matt Arsenault	b4a0766c8d	AMDGPU/GlobalISel: Select llvm.amdgcn.buffer.atomic.cmpswap	2020-01-30 08:22:43 -05:00
Stefan Pintilie	f00be8da62	[PowerPC][Future] Prefixed Instructions 64 Byte Boundary Support A known limitation for Future CPU is that the new prefixed instructions may not cross 64 Byte boundaries. All instructions are already 4 byte aligned so the only situation where this can occur is when the prefix is in one 64 byte block and the instruction that is prefixed is at the top of the next 64 byte block. To fix this case PPCELFStreamer was added to intercept EmitInstruction. When a prefixed instruction is emitted we try to align it to 64 Bytes by adding a maximum of 4 bytes. If the prefixed instruction crosses the 64 Byte boundary then the alignment would trigger and a 4 byte nop would be added to push the instruction into the next 64 byte block. Differential Revision: https://reviews.llvm.org/D72570	2020-01-30 06:52:30 -06:00
John Brawn	258d8dd76a	[FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUND This gets selected to the appropriate fcvt instruction. Handling from there on isn't fully correct yet, as we need to model fcvt reading and writing to fpsr and fpcr. Differential Revision: https://reviews.llvm.org/D73201	2020-01-30 12:51:25 +00:00
Sam Parker	06e12893ff	[ARM][LowOverheadLoops] Skip debug values While iterating through the loop, don't inspect any dbg values. Differential Revision: https://reviews.llvm.org/D73688	2020-01-30 11:51:58 +00:00
John Brawn	2224407ef5	Add lowering of STRICT_FSETCC and STRICT_FSETCCS These become STRICT_FCMP and STRICT_FCMPE, which then get selected to the corresponding FCMP and FCMPE instructions, though the handling from there on isn't fully correct as we don't model reads and writes to FPCR and FPSR. Differential Revision: https://reviews.llvm.org/D73368	2020-01-30 10:40:55 +00:00
Connor Abbott	ce06d50756	AMDGPU: Fix AMDGPUUnifyDivergentExitNodes with no normal returns Summary: The code was assuming in a few places that if there was only one exit from the function that it was a normal return, which is invalid. It could be an infinite loop, in which case we still need to insert the usual fake edge so that the null export happens. This fixes shaders that end with an infinite loop that discards. Reviewers: arsenm, nhaehnle, critson Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71192	2020-01-30 10:55:02 +01:00
Clement Courbet	c5344d857f	[X86][Sched] A bunch of fixes to the Zen2 sched model latencies. Summary: As determined with `llvm-exegesis`. Some of these look like typos/misunderstandings of the sched model td spec: - latency defaults to `1` when not set => Maybe we can avoid having a default ? - problems with regexps not being anchored by default (XCHG matching CMPXCHG) Note that this is not complete, it fixes only the most obvious mistakes, and only for latency (not uops). Reviewers: RKSimon, GGanesh Subscribers: hiraditya, jfb, mstojanovic, hfinkel, craig.topper, andreadb, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73172	2020-01-30 10:20:31 +01:00
Sam Parker	6726d67bfd	[ARM][LowOverheadLoops] Check scalar predicates When trying to remove the loop iteration count, check that the instruction will always execute. Differential Revision: https://reviews.llvm.org/D73682	2020-01-30 09:13:04 +00:00
Amara Emerson	610f1d22f1	[AArch64][GlobalISel] During ISel try to convert G_PTR_ADD to G_ADD. This lowering tries to look for G_PTR_ADD instructions and then converts them to a standard G_ADD with a COPY on the source, and G_INTTOPTR on the result. This is ok for address space 0 on AArch64 as p0 can be treated as s64. The motivation behind this is to expose the add semantics to the imported tablegen patterns. We shouldn't need to check for uses being loads/stores, because the selector works bottom up, uses before defs. By the time we end up trying to select a G_PTR_ADD, we should have already attempted to fold this into addressing modes and were therefore unsuccessful. This gives some performance and code size improvements across the board. Differential Revision: https://reviews.llvm.org/D73673	2020-01-29 23:04:52 -08:00
Craig Topper	007a6a155c	Revert "[X86] Don't exit from foldOffsetIntoAddress if the Offset is 0, but AM.Disp is non-zero." Possibly causing build bot failures.	2020-01-29 22:59:05 -08:00
Shengchen Kan	eb054577e9	[X86] Add function isPrefix() Currently some prefixes are emitted as instructions. To distinguish them from real instructions, the function isPrefix() is added. The kinds of prefix are consistent with X86GenInstrInfo.inc. Differential Revision: https://reviews.llvm.org/D73013	2020-01-30 14:11:50 +08:00
Craig Topper	1ef8e8b414	[X86] Don't exit from foldOffsetIntoAddress if the Offset is 0, but AM.Disp is non-zero. This is an alternate fix for the issue D73606 was trying to solve. The main issue here is that we bailed out of foldOffsetIntoAddress if Offset is 0. But if we just found a symbolic displacement and AM.Disp became non-zero earlier, we still need to validate that AM.Disp with the symbolic displacement. This passes fold-add-pcrel.ll. Differential Revision: https://reviews.llvm.org/D73608	2020-01-29 21:32:16 -08:00
Craig Topper	35625464c6	[X86] Fix the cost model for v16i16->v16i32 zero_extend/sign_extend with AVX2 We seem to be inheriting the cost from sse4.1. But if we have 256-bit registers we should be able to do this with just one extract to split the 16i16 and two v8i16->v8i32 operations so our cost should be 3 not 4. Differential Revision: https://reviews.llvm.org/D73646	2020-01-29 15:52:10 -08:00
Matt Arsenault	c5fffa4da3	GlobalISel: Add observer argument to legalizeIntrinsic This is passed to legalizeCustom, but not intrinsic. Also remove the MRI argument, since you can get that from the MachineIRBuilder. I'm not sure why MachineIRBuilder has a private observer member, and this is passed separately.	2020-01-29 18:33:45 -05:00
Matt Arsenault	7f3280ecdd	AMDGPU/GlobalISel: Select permlane16/permlanex16	2020-01-29 17:55:31 -05:00
Cameron McInally	4f2e2acc4b	[NFC][AArch64][SVE] Rename Destructive enumerator from DestructiveInstType Rename Destructive enumerator in preparation for a larger set of patches to support prefixing destructive operations with MOVPRFX. Differential Revision: https://reviews.llvm.org/D73212	2020-01-29 15:42:26 -06:00
Jessica Paquette	050cd443ca	[AArch64][GlobalISel] Fix TBNZ/TBZ opcode selection When the bit is <= 32, we have to use the W register variant for TB(N)Z. This is because of the way the instruction is encoded. Differential Revision: https://reviews.llvm.org/D73660	2020-01-29 13:11:18 -08:00
Cameron McInally	00c2249910	[NFCI][AArch64][SVE] Set default DestructiveInstType in AArch64Inst class Some housekeeping for the DestructiveInstType enum before a larger set of patches to support prefixing destructive operations with MOVPRFX. Differential Revision: https://reviews.llvm.org/D73141	2020-01-29 15:00:19 -06:00
Victor Huang	1492b70a03	[PowerPC][Future] Add prefixed loads and stores for future CPU A previous patch should have added pld and pstd and any support code in the backend that is required for prefixed load and store type operations. This patch adds a number of additional prefixed load and store type instructions for the future CPU. Differential Revision: https://reviews.llvm.org/D72577	2020-01-29 14:45:56 -06:00
Huihui Zhang	8f6761aa41	Revert "[AArch64] Fix data race on RegisterBank initialization." Buildbot failure, revert first while looking at the issue. This reverts commit `a5a4a47d69`.	2020-01-29 11:17:19 -08:00
Huihui Zhang	af620fc36a	Revert "[AMDGPU] Fix data race on RegisterBank initialization." There looks to be buildbot failure related. This reverts commit `8bb6c8a22a`.	2020-01-29 11:16:27 -08:00
Huihui Zhang	2ec954579a	Revert "[ARM] Fix data race on RegisterBank initialization." There looks to be buildbot failure related. This reverts commit `91618d940e`.	2020-01-29 11:15:27 -08:00
Austin Kerbow	2605adb69c	[AMDGPU][GlobalISel] Select 8-byte LDS Ops with 4-byte alignment Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73585	2020-01-29 10:42:12 -08:00
Huihui Zhang	91618d940e	[ARM] Fix data race on RegisterBank initialization. Summary: The initialization of RegisterBank needs to be done only once. The logic of AlreadyInit has data race, use llvm::call_once instead. This is continuing work of D73587. Reviewers: arsenm, rovka, dsanders, t.p.northover, efriedma, apazos Reviewed By: arsenm Subscribers: wdng, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73605	2020-01-29 10:15:37 -08:00
Huihui Zhang	8bb6c8a22a	[AMDGPU] Fix data race on RegisterBank initialization. Summary: The initialization of RegisterBank needs to be done only once. The logic of AlreadyInit has data race, use llvm::call_once instead. This is continuing work of D73587. Reviewers: arsenm, tstellar, ronlieb, efriedma, apazos, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73604	2020-01-29 10:14:40 -08:00
Huihui Zhang	a5a4a47d69	[AArch64] Fix data race on RegisterBank initialization. Summary: The initialization of RegisterBank needs to be done only once. The logic of AlreadyInit has a data race, use llvm::call_once instead. This issue was identified through thread sanitizer. Reviewers: efriedma, apazos, qcolombet, dsanders Reviewed By: efriedma Subscribers: arsenm, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73587	2020-01-29 10:12:52 -08:00
Craig Topper	90c31b0f42	[X86] Custom lower ISD::FROUND with SSE4.1 to avoid a libcall. ISD::FROUND is defined to round to nearest with ties rounding away from 0. This mode isn't supported in hardware on X86. But as long as we aren't compiling with trapping math, we can emulate this with floor(X + copysign(nextafter(0.5, 0.0), X)). We have to use nextafter to avoid some corner cases that adding 0.5 would have. For example, if X is nextafter(0.5, 0.0) it should round to 0.0, but adding 0.5 would need one extra bit of mantissa than can be stored so it rounds to 1.0. Adding nextafter(0.5, 0.0) instead will just increase the exponent by 1 and leave the mantissa as all 1s. This would be nextafter(1.0, 0.0) which will floor to 0.0. Technically this requires -fno-trapping-math which isn't our default. But if we care about exceptions we should be using constrained intrinsics. Constrained intrinsics would use STRICT_FROUND which won't go through this code. Fixes PR42195. Differential Revision: https://reviews.llvm.org/D73607	2020-01-29 09:10:02 -08:00
Jay Foad	d07a789579	[AMDGPU] Cluster FLAT instructions with both vaddr and saddr Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73634	2020-01-29 17:01:35 +00:00
Craig Topper	e5edd641fd	[X86] Use a shorter sequence to implement FLT_ROUNDS This code needs to map from the FPCW 2-bit encoding for rounding mode to the 2-bit encoding defined for FLT_ROUNDS. The previous implementation did some clever swapping of bits and adding 1 modulo 4 to do the mapping. This patch instead uses an 8-bit immediate as a lookup table of four 2-bit values. Then we use the 2-bit FPCW encoding to index the lookup table by using a right shift and an AND. This requires extracting the 2-bit value from FPCW and multiplying it by 2 to make it usable as a shift amount. But still results in less code. Differential Revision: https://reviews.llvm.org/D73599	2020-01-29 08:56:33 -08:00
Matt Arsenault	62129878a6	AMDGPU/GlobalISel: Fix tablegen selection for scalar bin ops Fixes selection for scalar G_SMULH/G_UMULH. Also switches to using tablegen selected add/sub, which switch to the signed version of the opcode. This matches the current DAG behavior. We can't drop the manual selection for add/sub yet, because it's still both for VALU add/sub and for G_PTR_ADD.	2020-01-29 08:55:54 -08:00
Kazushi (Jam) Marukawa	fef80a2946	[VE] (conditional) branch modification & isel patterns Summary: InstInfo for branch modification, (conditional) branch isel patterns and tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D73632	2020-01-29 17:40:57 +01:00
Matt Arsenault	68b102b97a	AMDGPU: Directly select 16-bank LDS case of llvm.amdgcn.interp.p1.f16 Manually select this as a tablegen workaround. Both SelectionDAG and GlobalISel end up misplacing the copy to m0 when both instructions in the output need it. Neither considers that both output instructions depend on m0. I don't know of any other pattern we need to handle this case, so it's less effort to just work around this for now.	2020-01-29 08:24:31 -08:00
Matt Arsenault	96352e0a1b	AMDGPU/GlobalISel: Handle LDS with relocations case	2020-01-29 08:18:55 -08:00
Connor Abbott	87d98c1495	AMDGPU: Fix handling of infinite loops in fragment shaders Summary: Due to the fact that kill is just a normal intrinsic, even though it's supposed to terminate the thread, we can end up with provably infinite loops that are actually supposed to end successfully. The AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because there's no obvious place to make the loop branch to, it just makes it return immediately, which skips the exports that are supposed to happen at the end and hangs the GPU if all the threads end up being killed. While it would be nice if the fact that kill terminates the thread were modeled in the IR, I think that the structurizer as-is would make a mess if we did that when the kill is inside control flow. For now, we just add a null export at the end to make sure that it always exports something, which fixes the immediate problem without penalizing the more common case. This means that we sometimes do two "done" exports when only some of the threads enter the discard loop, but from tests the hardware seems ok with that. This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70781	2020-01-29 17:13:25 +01:00
Matt Arsenault	94e8ef4d4c	AMDGPU/GlobalISel: Look through copies for source modifiers When all VOP instructions are legalized to VGPRs, any SGPR source modifiers will have a copy in the way.	2020-01-29 08:08:13 -08:00
Stanislav Mekhanoshin	c2ad7ee1a9	[AMDGPU] override isHighLatencyDef SIMachineScheduler uses isHighLatencyInstruction with the same semantics, but TargetInstrInfo has virtual isHighLatencyDef method, so override it instead. Added FLAT to the list of high latency opcodes and a check for mayLoad since stores are not technically high latency in terms of data dependency. This change did not produce any visible impact on our tests. Differential Revision: https://reviews.llvm.org/D73582	2020-01-29 08:01:29 -08:00
Connor Abbott	08b205bb48	Revert "AMDGPU: Fix handling of infinite loops in fragment shaders" This reverts commit `0994c485e6`.	2020-01-29 16:14:52 +01:00
Connor Abbott	13ab22ab22	Revert "AMDGPU: Fix AMDGPUUnifyDivergentExitNodes with no normal returns" This reverts commit `323bfde20c`.	2020-01-29 16:14:49 +01:00
Kazushi (Jam) Marukawa	0bec0e7151	[VE] udiv/sdiv/urem/srem/mul isel patterns Summary: udiv/sdiv/urem/srem/mul integer isel patterns and tests. Pretend for now that integer division were always cheap in HW. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D73623	2020-01-29 15:59:50 +01:00
Matt Arsenault	02adfb5155	AMDGPU/GlobalISel: Manually select scalar f64 G_FNEG This should be no problem to support with a pattern, but it turns out there are just too many yaks to shave. The main problem is in the DAG emitter, which I have no desire to sink effort into fixing. If we had a bit to disable patterns in the DAG importer, fixing the GlobalISelEmitter is more manageable.	2020-01-29 06:49:16 -08:00
Connor Abbott	323bfde20c	AMDGPU: Fix AMDGPUUnifyDivergentExitNodes with no normal returns Summary: The code was assuming in a few places that if there was only one exit from the function that it was a normal return, which is invalid. It could be an infinite loop, in which case we still need to insert the usual fake edge so that the null export happens. This fixes shaders that end with an infinite loop that discards. Reviewers: arsenm, nhaehnle, critson Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71192	2020-01-29 15:08:46 +01:00
Connor Abbott	0994c485e6	AMDGPU: Fix handling of infinite loops in fragment shaders Summary: Due to the fact that kill is just a normal intrinsic, even though it's supposed to terminate the thread, we can end up with provably infinite loops that are actually supposed to end successfully. The AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because there's no obvious place to make the loop branch to, it just makes it return immediately, which skips the exports that are supposed to happen at the end and hangs the GPU if all the threads end up being killed. While it would be nice if the fact that kill terminates the thread were modeled in the IR, I think that the structurizer as-is would make a mess if we did that when the kill is inside control flow. For now, we just add a null export at the end to make sure that it always exports something, which fixes the immediate problem without penalizing the more common case. This means that we sometimes do two "done" exports when only some of the threads enter the discard loop, but from tests the hardware seems ok with that. This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70781	2020-01-29 15:08:46 +01:00
Sanne Wouda	2939fc13c8	[AArch64] Add IR intrinsics for sq(r)dmulh_lane(q) Summary: Currently, sqdmulh_lane and friends from the ACLE (implemented in arm_neon.h), are represented in LLVM IR as a (by vector) sqdmulh and a vector of (repeated) indices, like so: %shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3> %vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle) When %v's values are known, the shufflevector is optimized away and we are no longer able to select the lane variant of sqdmulh in the backend. This defeats a (hand-coded) optimization that packs several constants into a single vector and uses the lane intrinsics to reduce register pressure and trade-off materialising several constants for a single vector load from the constant pool, like so: int16x8_t v = {2,3,4,5,6,7,8,9}; a = vqdmulh_laneq_s16(a, v, 0); b = vqdmulh_laneq_s16(b, v, 1); c = vqdmulh_laneq_s16(c, v, 2); d = vqdmulh_laneq_s16(d, v, 3); [...] In one microbenchmark from libjpeg-turbo this accounts for a 2.5% to 4% performance difference. We could teach the compiler to recover the lane variants, but this would likely require its own pass. (Alternatively, "volatile" could be used on the constants vector, but this is a bit ugly.) This patch instead implements the following LLVM IR intrinsics for AArch64 to maintain the original structure through IR optimization and into instruction selection: - sqdmulh_lane - sqdmulh_laneq - sqrdmulh_lane - sqrdmulh_laneq. These 'lane' variants need an additional register class. The second argument must be in the lower half of the 64-bit NEON register file, but only when operating on i16 elements. Note that the existing patterns for shufflevector and sqdmulh into sqdmulh_lane (etc.) remain, so code that does not rely on NEON intrinsics to generate these instructions is not affected.
This patch also changes clang to emit these IR intrinsics for the corresponding NEON intrinsics (AArch64 only). Reviewers: SjoerdMeijer, dmgreen, t.p.northover, rovka, rengolin, efriedma Reviewed By: efriedma Subscribers: kristof.beyls, hiraditya, jdoerfert, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71469	2020-01-29 13:25:23 +00:00

55835 Commits