llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	057c9c7ee0	[X86][SSE] MatchVectorAllZeroTest - handle OR vector reductions This patch extends MatchVectorAllZeroTest to handle OR vector reduction patterns where the result is compared against zero. Fixes PR45378 Differential Revision: https://reviews.llvm.org/D81547	2020-06-16 09:42:34 +01:00
Fangrui Song	a3b5f428c1	[AArch64] Print the immediate operand for SPACE pseudo instruction Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D81814	2020-06-15 20:55:53 -07:00
Amara Emerson	1035a416a6	[AArch64][GlobalISel] Emit constant pool loads for 64 bit fp immediates. Note: don't do this for integer 64 bit materialization to match SDAG. Differential Revision: https://reviews.llvm.org/D81893	2020-06-15 20:53:09 -07:00
Qiu Chaofan	e62912b190	[LLParser] Delete temp CallInst when error occurs Only functions with a floating-point return type accept fast-math flags. When adding such flags to a function returning an integer, we'll see a crash, because there's still an undeleted value referencing the argument. This patch manually removes the temporary instruction when an error occurs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D78355	2020-06-16 11:41:25 +08:00
Craig Topper	255d5dbae1	[X86] Add support for inline assembly 'x' constraint for i128. Limiting to x86-64 since that's when __int128 is legal in clang. Differential Revision: https://reviews.llvm.org/D81817	2020-06-15 19:34:02 -07:00
Jessica Paquette	5a4c3f6b06	[GlobalISel] Look through extends etc in CombinerHelper::matchConstantOp It's possible to end up with a zext or something in the way of a G_CONSTANT, even pre-legalization. This can happen with memsets. e.g. https://godbolt.org/z/Bjc8cw To make sure we can catch these cases, use `getConstantVRegValWithLookThrough` instead of `mi_match`. Differential Revision: https://reviews.llvm.org/D81875	2020-06-15 16:34:25 -07:00
Stanislav Mekhanoshin	9ee272f13d	[AMDGPU] Add gfx1030 target Differential Revision: https://reviews.llvm.org/D81886	2020-06-15 16:18:05 -07:00
Amara Emerson	fc905ae003	[GlobalISel] Don't emit multiply by magic constant for zero memset values.	2020-06-15 14:42:14 -07:00
Nick Desaulniers	2d8e105db6	[PPCAsmPrinter] support 'L' output template for memory operands Summary: L is meant to support the second word used by 32b calling conventions for 64b arguments. This is required to build 32b PowerPC Linux kernels after upstream commit 334710b1496a ("powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'") Thanks for the report from @nathanchance, and reference to GCC's implementation from @segher. Fixes: pr/46186 Fixes: https://github.com/ClangBuiltLinux/linux/issues/1044 Reviewers: echristo, hfinkel, MaskRay Reviewed By: MaskRay Subscribers: MaskRay, wuzish, nemanjai, hiraditya, kbarton, steven.zhang, llvm-commits, segher, nathanchance, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D81767	2020-06-15 14:31:44 -07:00
Jessica Paquette	7c93a19790	NFC: Remove disabled rule from postlegalizer-combiner-zip.mir test Apparently an x86 bot doesn't like the disabled rule in this test. http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/6569 Remove disabled rule and update the test to try and pacify the bot.	2020-06-15 13:15:02 -07:00
Jessica Paquette	3495b884de	[AArch64][GlobalISel] Add G_EXT and select ext using it Add selection support for ext via a new opcode, G_EXT and a post-legalizer combine which matches it. Add an `applyEXT` function, because the AArch64ext patterns require a register for the immediate. So, we have to create a G_CONSTANT to get these without writing new patterns or modifying the existing ones. Tests are the same as arm64-ext.ll. Also prevent ext from firing on the zip test. It has higher priority, so we don't want it potentially getting in the way of mask tests. Also fix up the shuffle-splat test, because ext is now selected there. The test was incorrectly regbank selected before, which could cause a verifier failure when you emit copies. Differential Revision: https://reviews.llvm.org/D81436	2020-06-15 12:20:59 -07:00
Matt Arsenault	1a7f115dce	AMDGPU/GlobalISel: Extend load/store workaround to i128 vectors	2020-06-15 14:55:11 -04:00
Matt Arsenault	362eedcbb4	AMDGPU/GlobalISel: Correct memory size in test	2020-06-15 14:12:28 -04:00
Craig Topper	d72cb4ce21	Recommit "[X86] Separate imm from relocImm handling." Fix the copy/paste mistake that caused it to fail previously	2020-06-15 10:59:43 -07:00
Jessica Paquette	1ac8451a9b	[GlobalISel] Simplify G_ADD when it has (0-X) on the LHS or RHS This implements the following combines: ((0-A) + B) -> B-A (A + (0-B)) -> A-B Porting over the basic algebraic combines from the DAGCombiner. There are several combines which fold adds away into subtracts. This is just the simplest one. I noticed that add combines are some of the most commonly hit across CTMark, (via print statements when they fire), so I'm porting over some of the obvious ones. This gives some minor code size improvements on CTMark at -O3 on AArch64. Differential Revision: https://reviews.llvm.org/D77453	2020-06-15 09:43:24 -07:00
Francesco Petrogalli	28a00ac9ba	[llvm][SVE] IR intrinsics for quadword permutation instructions. Summary: Adding intrinsics and codegen patterns for: * trn1 <Zd>.q, <Zm>.q, <Zn>.q * trn2 <Zd>.q, <Zm>.q, <Zn>.q * zip1 <Zd>.q, <Zm>.q, <Zn>.q * zip2 <Zd>.q, <Zm>.q, <Zn>.q * uzp1 <Zd>.q, <Zm>.q, <Zn>.q * uzp2 <Zd>.q, <Zm>.q, <Zn>.q These instructions are defined in Armv8.6-A. Reviewers: sdesmalen, efriedma, kmclaughlin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80850	2020-06-15 16:21:56 +00:00
Matt Arsenault	2ca552322c	AMDGPU/GlobalISel: Fix 8-byte aligned, 96-bit scalar loads These are legal since we can do a 96-bit load on some subtargets, but this is only for vector loads. If we can't widen the load, it needs to be broken down once known scalar. For 16-byte alignment, widen to a 128-bit load.	2020-06-15 11:33:16 -04:00
Wouter van Oortmerssen	d9e0bbd17b	[WebAssembly] Adding 64-bit versions of all load & store ops. Context: https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md This is just a first step, adding the new instruction variants while keeping the existing 32-bit functionality working. Some of the basic load/store tests have new wasm64 versions that show that the basics of the target are working. Further features need implementation, but these will be added in followups to keep things reviewable. Differential Revision: https://reviews.llvm.org/D80769	2020-06-15 08:31:56 -07:00
Stefan Pintilie	57c9dc0521	[PowerPC] Do not add the relocation addend to the instruction encoding We should not be adding the relocation addend to the instruction encoding. This patch removes that and sets those bits to zero. Differential Revision: https://reviews.llvm.org/D81082	2020-06-15 09:51:34 -05:00
Simon Pilgrim	ae33cbc494	[X86][SSE] LowerVectorAllZeroTest - add support for >256-bit vectors Reduce by splitting the vector until we reach the target size for PTEST/MOVMSK_PCMPEQ. There might be some cases where AVX512 can perform this with 512-bit vectors but so far I haven't encountered any such pattern that reaches LowerVectorAllZeroTest. Prep work for D81547	2020-06-15 15:30:24 +01:00
Hans Wennborg	f47a776628	Revert "[X86] Separate imm from relocImm handling." > relocImm was a complexPattern that handled both ConstantSDNode > and X86Wrapper. But it was only applied selectively because using > it would cause patterns to be not importable into FastISel or > GlobalISel. So it only got applied to flag setting instructions, > stores, RMW arithmetic instructions, and rotates. > > Most of the test changes are a result of making patterns available > to GlobalISel or FastISel. The absolute-cmp.ll change is due to > this fixing a pattern ordering issue to make an absolute symbol > match to an 8-bit immediate before trying a 32-bit immediate. > > I tried to use PatFrags to reduce the repetition, but I was getting > errors from TableGen. This caused "Invalid EmitNode" assertions, see the llvm-commits thread for discussion.	2020-06-15 16:14:59 +02:00
Yvan Roux	ffe8f6d33b	[ARM][MachineOutliner] Fix no-lr-save testcase. Now that saving LR into a register is handled, some register constraints are needed to keep machine-outliner-no-lr-save.mir meaningful.	2020-06-15 16:09:31 +02:00
Yvan Roux	669066de65	[ARM][MachineOutliner] Add LR RegSave mode. Outline chunks of code which need to save and restore the link register when a spare register can be used for it. Differential Revision: https://reviews.llvm.org/D80127	2020-06-15 15:22:08 +02:00
Daniel Kiss	b8ae3fdfa5	[AArch64] Fix BTI instruction emission. Summary: SCTLR_EL1.BT[01] controls the PACI[AB]SP compatibility with PBYTE 11 (see [1]) This bit will be set to zero so PACI[AB]SP are equal to BTI C instruction only. [1] https://developer.arm.com/docs/ddi0595/b/aarch64-system-registers/sctlr_el1 Reviewers: chill, tamas.petz, pbarrio, ostannard Reviewed By: tamas.petz, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81746	2020-06-15 15:04:36 +02:00
Matt Arsenault	dae9554b2b	AMDGPU/GlobalISel: Workaround some load/store type selection patterns The logic is written for what loads/stores should be selectable. There are a set of cases that should be selectable, but due to missing MVTs and/or selection patterns, will fail to select. I think eventually load/store select patterns should ignore the type and only look at the value size, but until that happens, bitcast these to equivalent i32 vectors.	2020-06-15 07:42:20 -04:00
Matt Arsenault	96229606f9	AMDGPU/GlobalISel: Use less artificial example to avoid abort=0 These were failing due to an unlegalizable G_CONCAT_VECTORS due to registers with types that are naturally illegal.	2020-06-15 07:37:15 -04:00
Matt Arsenault	33e9086501	GlobalISel: Support lowering vector->vector G_BITCAST Extract subvectors and cast to the result element type before remerging.	2020-06-15 07:36:30 -04:00
Simon Pilgrim	298377f4b0	[X86][SSE] Add tests for and/or reduction results compared to zero These should fold to memcmp/ptest/movmsk+cmpeq patterns	2020-06-15 10:40:45 +01:00
Kazushi (Jam) Marukawa	e026f147f7	[VE] Support relocation information in MC layer Summary: Change VEAsmParser to support identification with relocation information in assembler. Change VEAsmBackend to support relocation information in MC layer. Change VEDisassembler and VEMCCodeEmitter to support binary generation of branch target operands. Add REFLONG fixup and variant kind to support new R_VE_REFLONG ELF symbol. And, add regression test in both MC and CodeGen to check binary generation with relocation information. Differential Revision: https://reviews.llvm.org/D81553	2020-06-15 11:24:53 +02:00
Dominik Montada	c87bf29149	[MachineVerifier][GlobalISel] Check that branches have a MBB operand or are declared indirect. Add missing properties to G_BRJT, G_BRINDIRECT Summary: Teach MachineVerifier to check branches for MBB operands if they are not declared indirect. Add `isBarrier`, `isIndirectBranch` to `G_BRINDIRECT` and `G_BRJT`. Without these, `MachineInstr.isConditionalBranch()` was giving a false-positive for those instructions. Reviewers: aemerson, qcolombet, dsanders, arsenm Reviewed By: dsanders Subscribers: hiraditya, wdng, simoncook, s.egerton, arsenm, rovka, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81587	2020-06-15 11:17:09 +02:00
Chen Zheng	bd7096b977	[PowerPC] fma chain break to expose more ILP This patch tries to reassociate two patterns related to FMA to expose more ILP on PowerPC. // Pattern 1: // A = FADD X, Y (Leaf) // B = FMA A, M21, M22 (Prev) // C = FMA B, M31, M32 (Root) // --> // A = FMA X, M21, M22 // B = FMA Y, M31, M32 // C = FADD A, B // Pattern 2: // A = FMA X, M11, M12 (Leaf) // B = FMA A, M21, M22 (Prev) // C = FMA B, M31, M32 (Root) // --> // A = FMUL M11, M12 // B = FMA X, M21, M22 // D = FMA A, M31, M32 // C = FADD B, D Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D80175	2020-06-15 00:00:04 -04:00
Chen Zheng	163162a0a4	[PowerPC] fix a bug in rlwinm folding with a full mask. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D81006	2020-06-14 21:27:01 -04:00
Simon Pilgrim	3d8149c2a1	[X86][SSE] Fold BITOP(MOVMSK(X),MOVMSK(Y)) -> MOVMSK(BITOP(X,Y)) Reduce XMM->GPR traffic by performing bitops on the vectors, and using a single MOVMSK call. This requires us to use vectors of the same size and element width, but we can mix fp/int type equivalents with suitable bitcasting.	2020-06-14 21:37:58 +01:00
Matt Arsenault	df0c4bfc95	AMDGPU: Add some baseline immediate encoding test changes Add some encoding checks and add a few new cases.	2020-06-14 13:29:35 -04:00
Matt Arsenault	804397dde6	AMDGPU: Do not bundle inline asm Fixes bug 46285	2020-06-14 13:24:50 -04:00
Matt Arsenault	fb51d508ee	AMDGPU/GlobalISel: Select general case for G_PTRMASK	2020-06-14 13:12:29 -04:00
Matt Arsenault	46579471fd	AMDGPU: Fix spill/restore of 192-bit registers I tried to use an IR inline asm test, but that doesn't work since the inline asm handling asserts without an MVT to use.	2020-06-14 13:12:01 -04:00
Simon Pilgrim	1c3d7709de	[X86][SSE] Add tests for missing BITOP(MOVMSK(X),MOVMSK(Y)) -> MOVMSK(BITOP(X,Y)) fold This would help reduce XMM->GPR traffic for some reduction cases.	2020-06-14 17:10:03 +01:00
Qiu Chaofan	13edcd696e	[PowerPC] Support constrained rounding operations This patch adds handling of constrained FP intrinsics about round, truncate and extend for PowerPC target, with necessary tests. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D64193	2020-06-14 23:43:31 +08:00
Qiu Chaofan	7315d221a2	[PowerPC] Exploit vnmsubfp instruction On PowerPC, we have vnmsubfp Altivec instruction for fnmsub operation on v4f32 type. Default pattern for this instruction never works since we don't have legal fneg for v4f32 when VSX disabled. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D80617	2020-06-14 23:19:17 +08:00
Qiu Chaofan	f8ef7c99a0	[DAGCombiner] Require ninf for division estimation Current implementation of division estimation isn't correct for some cases like 1.0/0.0 (result is nan, not expected inf). And this change exposes a potential infinite loop: we use isConstOrConstSplatFP in combineRepeatedFPDivisors to look up if the divisor is some constant. But it doesn't work after legalized on some platforms. This patch restricts the method to act before LegalDAG. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D80542	2020-06-14 22:58:22 +08:00
Simon Pilgrim	e0cff30c17	[X86][SSE] LowerVectorAllZeroTest - add support for pre-SSE41 targets Even without PTEST, we can still efficiently perform an OR reduction as PMOVMSKB(PCMPEQB(X,0)) == 0, avoiding xmm->gpr extractions.	2020-06-14 13:41:56 +01:00
Simon Pilgrim	a404bae288	[X86][SSE] Add non-SSE41 target PTEST tests Ensure codegen is still reasonable - ideally we'd make use of MOVMSK for this.	2020-06-14 12:23:10 +01:00
Craig Topper	0cbe713c69	[X86] Automatically harden inline assembly RET instructions against Load Value Injection (LVI) Previously, the X86AsmParser would issue a warning whenever a ret instruction is encountered. This patch changes the behavior to automatically transform each ret instruction in an inline assembly stream into: shlq $0, (%rsp) lfence ret which is secure, according to https://software.intel.com/security-software-guidance/insights/deep-dive-load-value-injection#specialinstructions. Patch by Scott Constable with some minor changes by Craig Topper.	2020-06-13 15:16:05 -07:00
Craig Topper	cb5072d187	[X86] Teach combineBitcastvxi1 to prefer movmsk on avx512 in more cases If the input to the bitcast is a sign bit test, it makes sense to directly use vpmovmskb or vmovmskps/pd. This removes the need to copy the sign bits to a k-register and then to a GPR. Fixes PR46200. Differential Revision: https://reviews.llvm.org/D81327	2020-06-13 14:50:13 -07:00
Craig Topper	93264a2e4f	[X86] Enable the EVEX->VEX compression pass at -O0. A lot of what EVEX->VEX does is equivalent to what the prioritization in the assembly parser does. When an AVX mnemonic is used without any EVEX features or XMM16-31, the parser will pick the VEX encoding. Since codegen doesn't go through the parser, we should also use VEX instructions when we can so that the code coming out of integrated assembler matches what you'd get from outputting an assembly listing and parsing it. The pass early outs if AVX isn't enabled and uses TSFlags to check for EVEX instructions before doing the more costly table lookups. Hopefully that's enough to keep this from impacting -O0 compile times.	2020-06-13 12:29:04 -07:00
Craig Topper	8885a7640b	[X86] Separate imm from relocImm handling. relocImm was a complexPattern that handled both ConstantSDNode and X86Wrapper. But it was only applied selectively because using it would cause patterns to be not importable into FastISel or GlobalISel. So it only got applied to flag setting instructions, stores, RMW arithmetic instructions, and rotates. Most of the test changes are a result of making patterns available to GlobalISel or FastISel. The absolute-cmp.ll change is due to this fixing a pattern ordering issue to make an absolute symbol match to an 8-bit immediate before trying a 32-bit immediate. I tried to use PatFrags to reduce the repetition, but I was getting errors from TableGen.	2020-06-13 11:29:28 -07:00
Amanieu d'Antras	6973125cb7	Fix FastISel dropping srcloc metadata from InlineAsm Summary: Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=46060 I've also added the Extra_IsConvergent flag which was missing from FastISel. Reviewers: echristo Reviewed By: echristo Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80759	2020-06-13 16:52:37 +01:00
Michael Liao	ec02635d10	[amdgpu] Skip OR combining on 64-bit integer before legalizing ops. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81710	2020-06-12 15:22:38 -04:00
Amara Emerson	1cbebd95de	[AArch64][GlobalISel] Legalize vector G_PTR_ADD and enable selection. Differential Revision: https://reviews.llvm.org/D81419	2020-06-12 11:25:17 -07:00
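
Two of the folds described in the log above, the G_ADD combine (1ac8451a9b) and the BITOP/MOVMSK fold (3d8149c2a1), rest on simple identities that can be checked outside LLVM. The following is a minimal Python sketch, not LLVM code: the helper names (`add`, `sub`, `movmsk`, `check_add_combines`, `check_movmsk_fold`) are hypothetical, 64-bit wrapping integers stand in for G_ADD operands, and lists of 32-bit lane values stand in for vectors.

```python
MASK64 = (1 << 64) - 1  # assumption: model operands as 64-bit wrapping integers


def add(a, b):
    """Wrapping 64-bit addition."""
    return (a + b) & MASK64


def sub(a, b):
    """Wrapping 64-bit subtraction."""
    return (a - b) & MASK64


def movmsk(lanes, width=32):
    """Gather the sign bit of each lane into a bitmask (lane 0 -> bit 0)."""
    mask = 0
    for i, lane in enumerate(lanes):
        mask |= ((lane >> (width - 1)) & 1) << i
    return mask


def check_add_combines(a, b):
    """((0-A) + B) == B-A  and  (A + (0-B)) == A-B under wrapping arithmetic."""
    return (add(sub(0, a), b) == sub(b, a)
            and add(a, sub(0, b)) == sub(a, b))


def check_movmsk_fold(x, y):
    """BITOP(MOVMSK(X), MOVMSK(Y)) == MOVMSK(BITOP(X, Y)) for AND, OR, XOR."""
    bitops = (
        lambda p, q: p & q,
        lambda p, q: p | q,
        lambda p, q: p ^ q,
    )
    return all(
        op(movmsk(x), movmsk(y)) == movmsk([op(p, q) for p, q in zip(x, y)])
        for op in bitops
    )
```

The MOVMSK identity holds because a bitwise operation acts on each bit position independently, so the sign bit of `p op q` equals `sign(p) op sign(q)`. Folding the bitop into the vector domain therefore needs only one mask extraction instead of two.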


34401 Commits