llvm-project

Commit Graph

Author	SHA1	Message	Date
Nicolai Haehnle	c4a2ff0950	AMDGPU: Divergence-driven selection of scalar buffer load intrinsics Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 344696	2018-10-17 15:37:30 +00:00
Sam Parker	2ef3c0dad6	[ARM] bottom-top mul support in ARMParallelDSP Previously reverted in rL343082. Original commit message: On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 344693	2018-10-17 13:02:48 +00:00
Petar Jovanovic	8a08412533	[MIPS GlobalISel] Legalize constants Legalize s1, s8, s16 and s64 G_CONSTANT for MIPS32. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D53077 llvm-svn: 344684	2018-10-17 10:30:03 +00:00
Sjoerd Meijer	64cfb74a61	[ARM] Do not fuse VADD and VMUL, continued (2/2) This is patch 2/2, following up on D53314, and is the functional change to prevent fusing mul + add sequences into VFMAs. Differential revision: https://reviews.llvm.org/D53315 llvm-svn: 344683	2018-10-17 10:05:44 +00:00
Sjoerd Meijer	ff3ab33ec8	[ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2) This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs for performance reasons on the Cortex-M4 and Cortex-M33. This is a serie of 2 patches, that is trying to achieve the same for VFMA. The second column in the table below shows what we were generating before rL342874, the third column what changed with rL342874, and the last column what we want to achieve with these 2 patches: -------------------------------------------------------- \| Opt \| < rL342874 \| >= rL342874 \| \| \|------------------------------------------------------\| \|-O3 \| vmla \| vmul \| vmul \| \| \| \| vadd \| vadd \| \|------------------------------------------------------\| \|-Ofast \| vfma \| vfma \| vmul \| \| \| \| \| vadd \| \|------------------------------------------------------\| \|-Oz \| vmla \| vmla \| vmla \| -------------------------------------------------------- This patch 1/2, is a cleanup of the spaghetti predicate logic on the different VMLA and VFMA codegen rules, so that we can make the final functional change in patch 2/2. This also fixes a typo in the regression test added in rL342874. Differential revision: https://reviews.llvm.org/D53314 llvm-svn: 344671	2018-10-17 07:26:35 +00:00
Craig Topper	e0a992918b	[X86] Match (cmp (and (shr X, C), mask), 0) to BEXTR+TEST. Without this we match the CMP+AND to a TEST and then match the SHR separately. I'm trusting analyzeCompare to remove the TEST during the peephole pass. Otherwise we need to check the flag users to see if they only use the Z flag. This recovers a case lost by r344270. Differential Revision: https://reviews.llvm.org/D53310 llvm-svn: 344649	2018-10-16 22:29:36 +00:00
Krasimir Georgiev	547d824da6	Revert "[WebAssembly] LSDA info generation" This reverts commit r344575. Newly introduced test eh-lsda.ll.test fails with use-after-free under ASAN build. llvm-svn: 344639	2018-10-16 18:50:09 +00:00
Leonard Chan	699b3b54da	[Intrinsic] Signed Saturation Addition Intrinsic Add an intrinsic that takes 2 integers and perform saturation addition on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D53053 llvm-svn: 344629	2018-10-16 17:35:41 +00:00
Aleksandar Beserminji	a5949439ca	[mips][micromips] Fix how values in .gcc_except_table are calculated When a landing pad is calculated in a program that is compiled for micromips, it will point to an even address. Such an error will cause a segmentation fault, as the instructions in micromips are aligned on odd addresses. This patch sets the last bit of the offset where a landing pad is, to 1, which will effectively be an odd address and point to the instruction exactly. Differential Revision: https://reviews.llvm.org/D52985 llvm-svn: 344591	2018-10-16 08:27:28 +00:00
Heejin Ahn	0981eaab47	[WebAssembly] LSDA info generation Summary: This adds support for LSDA (exception table) generation for wasm EH. Wasm EH mostly follows the structure of Itanium-style exception tables, with one exception: a call site table entry in wasm EH corresponds to not a call site but a landing pad. In wasm EH, the VM is responsible for stack unwinding. After an exception occurs and the stack is unwound, the control flow is transferred to wasm 'catch' instruction by the VM, after which the personality function is called from the compiler-generated code. (Refer to WasmEHPrepare pass for more information on this part.) This patch: - Changes wasm.landingpad.index intrinsic to take a token argument, to make this 1:1 match with a catchpad instruction - Stores landingpad index info and catch type info MachineFunction in before instruction selection - Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an exception table - Adds WasmException class with overridden methods for table generation - Adds support for LSDA section in Wasm object writer Reviewers: dschuff, sbc100, rnk Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52748 llvm-svn: 344575	2018-10-16 00:09:12 +00:00
Nicolai Haehnle	6d92ca61f9	StructurizeCFG,AMDGPU: Test case of a redundant phi and codegen consequences Change-Id: I9681f9e41ca30f82576f3d1f965c3a550a34b171 llvm-svn: 344569	2018-10-15 22:37:46 +00:00
Craig Topper	2909a3d9d0	[X86] Fix a bad bitcast in the load form of vXi16 uniform shift patterns for EVEX encoded instructions. llvm-svn: 344563	2018-10-15 21:51:32 +00:00
Craig Topper	548e8b5b47	[X86] Add test cases showing failure to fold load into vpsrlw when EVEX encoded instructions are used. There's a bad bitcast being used in the isel patterns for the vXi16 shift instructions. llvm-svn: 344562	2018-10-15 21:51:29 +00:00
Craig Topper	125c54ba10	[X86] Disable the peephole pass on avx2-intrinsics-x86.ll and avx512bw-intrinsics.ll to ensure any load folding tests are testing isel not load folding tables. llvm-svn: 344561	2018-10-15 21:51:26 +00:00
Craig Topper	da0e544953	[X86] Regenerate avx2-intrinsics-x86.ll to compress the 32 vs 64 bit mode checks. llvm-svn: 344560	2018-10-15 21:51:22 +00:00
Simon Pilgrim	095a7fe635	[AARCH64] Improve vector popcnt lowering with ADDLP AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors. This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258. Differential Revision: https://reviews.llvm.org/D53259 llvm-svn: 344554	2018-10-15 21:15:58 +00:00
Konstantin Zhuravlyov	94dfcc2eb2	AMDGPU: Generate .amdgcn_target for object code v3 Differential Revision: https://reviews.llvm.org/D53221 llvm-svn: 344552	2018-10-15 20:37:47 +00:00
Sanjay Patel	4cf1da0e02	[SelectionDAG] allow FP binops in SimplifyDemandedVectorElts This is intended to make the backend on par with functionality that was added to the IR version of SimplifyDemandedVectorElts in: rL343727 ...and the original motivation is that we need to improve demanded-vector-elements in several ways to avoid problems that would be exposed in D51553. Differential Revision: https://reviews.llvm.org/D52912 llvm-svn: 344541	2018-10-15 18:05:34 +00:00
Sanjay Patel	8bd74785f0	[DAGCombiner] allow undef elts in vector fmul matching llvm-svn: 344534	2018-10-15 16:54:07 +00:00
Sanjay Patel	7cf5733f7f	[AArch64] add tests for fmul x, -2.0 with undef elts; NFC Also, add tests with commuted operands. There was no coverage for that case. llvm-svn: 344531	2018-10-15 16:44:00 +00:00
Sanjay Patel	9e7e0fd828	[DAGCombiner] allow undef elts in vector fma matching llvm-svn: 344528	2018-10-15 15:56:39 +00:00
Sanjay Patel	475a53649e	[x86] add tests for fma with undef elts; NFC llvm-svn: 344527	2018-10-15 15:47:37 +00:00
Sanjay Patel	4e970ff022	[DAGCombiner] allow undef elts in vector fma matching llvm-svn: 344525	2018-10-15 15:38:38 +00:00
Sanjay Patel	b06ac18ee9	[x86] add tests for fma with undef elts; NFC llvm-svn: 344523	2018-10-15 15:28:44 +00:00
Simon Pilgrim	5abb607ebe	[ARM][NEON] Improve vector popcnt lowering with PADDL (PR39281) As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type. This is a blocker for moving more x86 code to generic vector CTPOP expansion (P32655 + D53258) - ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL but ARM's custom lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG - improving the CTPOP lowering was a lot easier than fixing the MUL lowering for this one case...... Differential Revision: https://reviews.llvm.org/D53257 llvm-svn: 344512	2018-10-15 13:20:41 +00:00
Nicolai Haehnle	1bb242e201	AMDGPU: Test showing a scalar buffer load deficiency Change-Id: I5b64a565f22a8482aa0712488d85e45163ac3d12 llvm-svn: 344506	2018-10-15 11:37:04 +00:00
Bjorn Pettersson	064944352e	[TwoAddressInstructionPass] Replace subregister uses when processing tied operands Summary: TwoAddressInstruction pass typically rewrites %1:short = foo %0.sub_lo:long as %1:short = COPY %0.sub_lo:long %1:short = foo %1:short when having tied operands. If there are extra un-tied operands that uses the same reg and subreg, such as the second and third inputs to fie here: %1:short = fie %0.sub_lo:long, %0.sub_hi:long, %0.sub_lo:long then there was a bug which replaced the register %0 also for the un-tied operand, but without changing the subregister indices. So we used to get: %1:short = COPY %0.sub_lo:long %1:short = fie %1, %1.sub_hi:short, %1.sub_lo:short With this fix we instead get: %1:short = COPY %0.sub_lo:long %1:short = fie %1, %0.sub_hi:long, %1 Reviewers: arsenm, JesperAntonsson, kparzysz, MatzeB Reviewed By: MatzeB Subscribers: bjope, kparzysz, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D36224 llvm-svn: 344492	2018-10-15 08:36:03 +00:00
Craig Topper	b44b22c6fe	[X86] Autogenerate checks. NFC llvm-svn: 344490	2018-10-15 05:31:24 +00:00
Craig Topper	06aea1720a	[X86] Move promotion of vector and/or/xor from legalization to DAG combine Summary: I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom up strategy for LegalizeVectorOps means operands are legalized before their uses. So we promote and/or/xor before we legalize the operands that use them making computeKnownBits/computeNumSignBits in places like LowerTruncate suboptimal. I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues one around matching AND,ANDN,OR into VSELECT because I had to create ANDN as vXi64, but the other nodes hadn't legalized yet, I didn't look too hard at fixing that. This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there. I think we got bad ordering of that. Now the promotion is out of the legalizer so we handle this better. In the longer term I think we really should evaluate whether we should be doing this promotion at all. It's really there to reduce isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++ based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in patterns if a bitcast gets between the vselect and the and/or/xor node. That becomes a lot of permutations to cover. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53107 llvm-svn: 344487	2018-10-15 01:51:58 +00:00
Craig Topper	671779456a	[X86] Add 128 MOVDDUP to the constant pool printing in X86AsmPrinter::EmitInstruction. We use this instruction to broadcast a single 64-bit value to a v2i64/v2f64 vector. llvm-svn: 344486	2018-10-15 01:51:53 +00:00
Craig Topper	b5000974fe	[X86] Autogenerate complete checks. NFC llvm-svn: 344485	2018-10-15 01:51:50 +00:00
Simon Pilgrim	861cd0ba44	[X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 shuffle lowering Extends D53148 from v4f64 now that we have test coverage for v16i16/v32i8 shuffles. llvm-svn: 344481	2018-10-14 17:34:20 +00:00
Simon Pilgrim	9afb1e66e5	[ARM] Regenerate cttz tests Improve codegen view as part of PR32655 llvm-svn: 344479	2018-10-14 16:49:04 +00:00
Craig Topper	ec4b75f47a	[X86] Type legalize v2f32 stores by widening to v4f32, casting to v2f64, extracting f64 and storing. Summary: This is similar to what D52528 did for loads. It should match what generic type legalization does in 64-bit mode where it uses a v2i64 cast and an i64 store. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53173 llvm-svn: 344470	2018-10-14 03:36:27 +00:00
Simon Pilgrim	2ac03ec2c4	[AARCH64] Regenerate popcnt tests Improve codegen view as part of PR32655 llvm-svn: 344466	2018-10-13 21:50:15 +00:00
Simon Pilgrim	247ea88090	[ARM] Regenerate popcnt tests Improve codegen view as part of PR32655 llvm-svn: 344465	2018-10-13 21:32:49 +00:00
Craig Topper	189e5b4ab6	[LegalizeTypes] Prevent an assertion from PromoteIntRes_BSWAP and PromoteIntRes_BITREVERSE if the shift amount is too large for the VT returned by getShiftAmountTy Summary: getShiftAmountTy for X86 returns MVT::i8. If a BSWAP or BITREVERSE is created that requires promotion and the difference between the original VT and the promoted VT is more than 255 then we won't able to create the constant. This patch adds a check to replace the result from getShiftAmountTy to MVT::i32 if the difference won't fit. This should get legalized later when the shift is ultimately expanded since its clearly an illegal type that we're only promoting to make it a power of 2 bit width. Alternatively we could base the decision completely on the largest shift amount the promoted VT could use. Vectors should be immune here because getShiftAmountTy always returns the incoming VT for vectors. Only the scalar shift amount can be changed by the targets. Reviewers: eli.friedman, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53232 llvm-svn: 344460	2018-10-13 17:47:20 +00:00
Simon Pilgrim	1c6d320351	[X86][SSE] combineIncDecVector - use isConstantSplat Use isConstantSplat instead of ISD::isConstantSplatVector to let us us peek through to illegal types (in this case for i686 targets to recognise i64 constants) llvm-svn: 344452	2018-10-13 14:45:44 +00:00
Simon Pilgrim	f64e654d62	[X86][SSE] Improve CTTZ lowering when CTLZ is legal If we have better CTLZ support than CTPOP, then use cttz(x) = width - ctlz(~x & (x - 1)) - and remove the CTTZ_ZERO_UNDEF handling as it no longer gives better codegen. Similar to rL344447, this is also closer to LegalizeDAG's approach llvm-svn: 344448	2018-10-13 13:05:19 +00:00
Simon Pilgrim	afead139cf	[X86][SSE] Change CTTZ vector lowering to cttz(x) = ctpop(~x & (x - 1)) This patch changes the vector CTTZ lowering from: cttz(x) = ctpop((x & -x) - 1) to: cttz(x) = ctpop(~x & (x - 1)) Not only does this make better use of the PANDN instruction, but it also matches the LegalizeDAG method which should allow us to remove the x86 specific code at some point in the future (we need to fix some issues with the bitcasted logic ops and CTPOP lowering first). Differential Revision: https://reviews.llvm.org/D53214 llvm-svn: 344447	2018-10-13 12:12:06 +00:00
Simon Pilgrim	f3952413f7	[X86][AVX] Add lowerVectorShuffleAsLanePermuteAndPermute for v4f64 shuffles (PR39161) Add shuffle lowering for the case where we can shuffle the lanes into place followed by an in-lane permute. This is mainly for cases where we can have non-repeating permutes in each lane, but for now I've just enabled it for v4f64 unary shuffles to fix PR39161 - there is no test coverage for other shuffles that might benefit yet. We now have several cross-lane shuffle lowering methods that all do something similar - I've looked at merging some of these (notably by making the repeated mask mechanism in lowerVectorShuffleByMerging128BitLanes optional), but there is a lot of assertions/assumptions in the way that makes this tricky - I ended up going for adding yet another relatively simple method instead. Differential Revision: https://reviews.llvm.org/D53148 llvm-svn: 344446	2018-10-13 11:38:10 +00:00
Arnaud A. de Grandmaison	162435e7b5	[AArch64] Swap comparison operands if that enables some folding. Summary: AArch64 can fold some shift+extend operations on the RHS operand of comparisons, so swap the operands if that makes sense. This provides a fix for https://bugs.llvm.org/show_bug.cgi?id=38751 Reviewers: efriedma, t.p.northover, javed.absar Subscribers: mcrosier, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53067 llvm-svn: 344439	2018-10-13 07:43:56 +00:00
Thomas Lively	3afc346dd0	[WebAssembly] SIMD min and max Summary: Depends on D52324 and D52764. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52325 llvm-svn: 344438	2018-10-13 07:26:10 +00:00
Alex Bradbury	748d080e62	[RISCV] Eliminate unnecessary masking of promoted shift amounts SelectionDAGBuilder::visitShift will always zero-extend a shift amount when it is promoted to the ShiftAmountTy. This results in zero-extension (masking) which is unnecessary for RISC-V as the shift operations only read the lower 5 or 6 bits (RV32 or RV64). I initially proposed adding a getExtendForShiftAmount hook so the shift amount can be any-extended (D52975). @efriedma explained this was unsafe, so I have instead eliminate the unnecessary and operations at instruction selection time in a manner similar to X86InstrCompiler.td. Differential Revision: https://reviews.llvm.org/D53224 llvm-svn: 344432	2018-10-12 23:18:52 +00:00
Craig Topper	3e76b2d736	[X86] Improve type legalization of (v2i32/v4i16/v8i16 (bitcast (v2f32))) to avoid a stack stack temporary. llvm-svn: 344425	2018-10-12 22:00:04 +00:00
Craig Topper	435e38a5df	[LegalizeVectorTypes] When widening the result of a bitcast from a scalar type, use a scalar_to_vector to turn the scalar into a vector intead of a build vector full of mostly undefs. This is more consistent with what we usually do and matches some code X86 custom emits in some cases that I think I can cleanup. The MIPS test change just looks to be an instruction ordering change. llvm-svn: 344422	2018-10-12 21:59:55 +00:00
Craig Topper	1bb0c6041a	[LegalizeVectorTypes] When widening the operands to a concat_vectors, see if we can use the widened operand 0 if the width matches and the other operands are undef. This saves a conversion to extracts and build_vector. We already do this when both the result and the input need to be widened to the same type. This changed the sse-intrinsics-fast-isel test because we don't lower (insert_vector_elt (scalar_to_vector X), Y, 1) well. We turn it into (vector_shuffle (scalar_to_vector X), (scalar_to_vector Y), <0, 4, 2, 3>) losing track of the fact that the upper elts could be undef. We should probably find a way to prevent the scalarization of the <2 x f32> load on these tests. llvm-svn: 344404	2018-10-12 19:37:49 +00:00
Simon Pilgrim	1d6b938132	Regenerate test. NFCI. llvm-svn: 344399	2018-10-12 19:03:54 +00:00
Sanjay Patel	e28c8ecd72	[x86] add and use fast horizontal vector math subtarget feature This is the planned follow-up to D52997. Here we are reducing horizontal vector math codegen by default. AMD Jaguar (btver2) should have no difference with this patch because it has fast-hops. (If we want to set that bit for other CPUs, let me know.) The code changes are small, but there are many test diffs. For files that are specifically testing for hops, I added RUNs to distinguish fast/slow, so we can see the consequences side-by-side. For files that are primarily concerned with codegen other than hops, I just updated the CHECK lines to reflect the new default codegen. To recap the recent horizontal op story: 1. Before rL343727, we were producing hops for all subtargets for a variety of patterns. Hops were likely not optimal for all targets though. 2. The IR improvement in r343727 exposed a hole in the backend hop pattern matching, so we reduced hop codegen for all subtargets. That was bad for Jaguar (PR39195). 3. We restored the hop codegen for all targets with rL344141. Good for Jaguar, but probably bad for other CPUs. 4. This patch allows us to distinguish when we want to produce hops, so everyone can be happy. I'm not sure if we have the best predicate here, but the intent is to undo the extra hop-iness that was enabled by r344141. Differential Revision: https://reviews.llvm.org/D53095 llvm-svn: 344361	2018-10-12 16:41:02 +00:00
Nick Desaulniers	47bab69a2e	[MC][ELF] fix newly added test Summary: Reland of - r344197 "[MC][ELF] compute entity size for explicit sections" - r344206 "[MC][ELF] Fix section_mergeable_size.ll" after being reverted in r344278 due to build breakages from not specifying a target triple. Move test from test/CodeGen/Generic/ to test/MC/ELF/. Add explicit target triple so we don't try to run this test on non ELF targets. Reported: https://reviews.llvm.org/D53056#1261707 Reviewers: fhahn, rnk, espindola, NoQ Reviewed By: fhahn, rnk Subscribers: NoQ, MaskRay, rengolin, emaste, arichardson, llvm-commits, pirama, srhines Differential Revision: https://reviews.llvm.org/D53146 llvm-svn: 344360	2018-10-12 16:35:44 +00:00

1 2 3 4 5 ...

26207 Commits