llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	526b70a089	[X86] Use fsub in the movddup scheduling tests to prevent a future patch from folding movddup as a broadcast load. llvm-svn: 315767	2017-10-13 21:56:45 +00:00
Craig Topper	5d692917f4	[X86] Add initial skeleton support for knm cpu This adds Intel's Knights Mill CPU to valid CPU names for the backend. For now its an alias of "knl", but ultimately we need to support AVX5124FMAPS and AVX5124VNNIW instruction sets for it. Differential Revision: https://reviews.llvm.org/D38811 llvm-svn: 315722	2017-10-13 18:10:17 +00:00
Simon Pilgrim	c4977fa9a1	[X86] Test scalar integer absolutes on 32-bit targets with/without CMOV llvm-svn: 315711	2017-10-13 17:09:20 +00:00
Reid Kleckner	be3724b5e1	Not all buildbots seem to dump the nuw flag in SDAG llvm-svn: 315710	2017-10-13 17:00:49 +00:00
Simon Pilgrim	df9611e178	[X86] Updated scalar integer absolute tests to cover i8/i16/i32/i64 llvm-svn: 315706	2017-10-13 16:53:07 +00:00
Reid Kleckner	c687a34870	Update test to expect nuw flag in SDAG dump, fixes test after r315690 llvm-svn: 315698	2017-10-13 16:13:23 +00:00
Craig Topper	bf0de9d3b6	[X86] Remove patterns that select unmasked vbroadcastf2x32/vbroadcasti2x32. Prefer vbroadcastsd/vpbroadcastq instead. There's no advantage to using these instructions when they aren't masked. This enables some additional execution domain switching without needing to update the table. llvm-svn: 315674	2017-10-13 06:07:10 +00:00
Craig Topper	11655b22dc	[X86] Add the test case for r315613 that I forgot to 'git add'. llvm-svn: 315649	2017-10-13 00:20:47 +00:00
Craig Topper	3dc37cc592	[X86] Add a bunch of -mcpu strings to the cpus.ll test. We were missing most of the "core" aliases as well as skylake, cannonlake, and knights landing. llvm-svn: 315606	2017-10-12 18:55:57 +00:00
Wei Mi	1736efd16a	Revert r307036 because of PR34919. llvm-svn: 315540	2017-10-12 00:24:52 +00:00
Sanjay Patel	6c0aef77aa	[x86] avoid infinite loop from SoftenFloatOperand (PR34866) Legalization of fp128 assumes things that we should have asserts for, so that's another potential improvement. Differential Revision: https://reviews.llvm.org/D38771 llvm-svn: 315485	2017-10-11 18:24:21 +00:00
Sanjay Patel	34fd5eaaf0	[DAGCombiner] convert insertelement of bitcasted vector into shuffle Eg: insert v4i32 V, (v2i16 X), 2 --> shuffle v8i16 V', X', {0,1,2,3,8,9,6,7} This is a generalization of the IR fold in D38316 to handle insertion into a non-undef vector. We may want to abandon that one if we can't find value in squashing the more specific pattern sooner. We're using the existing legal shuffle target hook to avoid AVX512 horror with vXi1 shuffles. There may be room for improvement in the shuffle lowering here, but that would be follow-up work. Differential Revision: https://reviews.llvm.org/D38388 llvm-svn: 315460	2017-10-11 14:12:16 +00:00
Uriel Korach	782f28bf2f	[X86] Added tests for TESTM and TESTNM (NFC) Adding this test files now so after another commit that will add a new pattern for TESTM and TESTNM instructions will show the improvemnts that have been done. Change-Id: If3908b7f91897d764053312365a2bc1de78b291d llvm-svn: 315443	2017-10-11 08:39:25 +00:00
Craig Topper	6ce20bd184	[X86] Add 128-bit version of vbroadcasti32x2 to shuffle comment decoding. llvm-svn: 315395	2017-10-11 00:11:53 +00:00
Craig Topper	bb0e316dc7	[X86] Add broadcast patterns that allow a scalar_to_vector between the broadcast and the load. We already have these patterns for AVX512VL, but not AVX1 or 2. llvm-svn: 315382	2017-10-10 22:40:31 +00:00
Sanjay Patel	b74063d21f	[x86] fix prefix typos for CHECK lines; NFC llvm-svn: 315368	2017-10-10 21:12:47 +00:00
Adrian Prantl	16b8b47152	Debug Info: Fix the SDLoc propagation for a DAGCombiner rule This patch ensures that the rule: fold (zext (load x)) -> (zext (truncate (zextload x))) propagates the SDLoc of the load to the zextload. <rdar://problem/33755881> llvm-svn: 315340	2017-10-10 18:08:32 +00:00
Simon Pilgrim	053a299a9b	[X86][AVX512] Regenerate element insertion/extraction tests llvm-svn: 315322	2017-10-10 15:58:54 +00:00
Sanjay Patel	7d52c7ca74	[x86] add tests for insertelement; NFC llvm-svn: 315312	2017-10-10 13:45:25 +00:00
Gadi Haber	2b132eb4f8	[X86][SKYLAKE] Update regression test to differentiate between HASWELL and SKYLAKE scheduling.<NFC> NFC. Updated 6 regression tests to differentiate between HASWELL and SKYLAKE scheduling information. The fix is in preparation of a patch to update the information of the Skylake Client scheduling to include the appropriate load and store latencies. Reviewers: zvi, RKSimon Differential Revision: https://reviews.llvm.org/D38685 Change-Id: Ifc6b98d9eaf266913698f24c766fd994fc977555 llvm-svn: 315291	2017-10-10 09:53:18 +00:00
Craig Topper	a88306e6fb	[AVX512] Add patterns to commute integer comparison instructions during isel. This enables broadcast loads to be commuted and allows normal loads to be folded without the peephole pass. llvm-svn: 315274	2017-10-10 06:36:46 +00:00
Reid Kleckner	ab23dace56	[MC] Suppress .Lcfi labels when emitting textual assembly Summary: This suppresses the generation of .Lcfi labels in our textual assembler. It was annoying that this generated cascading .Lcfi labels: llc foo.ll -o - \| llvm-mc \| llvm-mc After three trips through MCAsmStreamer, we'd have three labels in the output when none are necessary. We should only bother creating the labels and frame data when making a real object file. This supercedes D38605, which moved the entire .seh_ implementation into MCObjectStreamer. This has the advantage that we do more checking when emitting textual assembly, as a minor efficiency cost. Outputting textual assembly is not performance critical, so this shouldn't matter. Reviewers: majnemer, MatzeB Subscribers: qcolombet, nemanjai, javed.absar, eraman, hiraditya, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D38638 llvm-svn: 315259	2017-10-10 00:57:36 +00:00
Aditya Nandakumar	c3bfc81a1f	[GISel]: Fix generation of illegal COPYs during CallLowering We end up creating COPY's that are either truncating/extending and this should be illegal. https://reviews.llvm.org/D37640 Patch for X86 and ARM by igorb, rovka llvm-svn: 315240	2017-10-09 20:07:43 +00:00
Zvi Rackover	c1d5955684	[X86] Unsigned saturation subtraction canonicalization [the backend part] Summary: On behalf of julia.koval@intel.com The patch transforms canonical version of unsigned saturation, which is sub(max(a,b),a) or sub(a,min(a,b)) to special psubus insturuction on targets, which support it(8bit and 16bit uints). umax(a,b) - b -> subus(a,b) a - umin(a,b) -> subus(a,b) There is also extra case handled, when right part of sub is 32 bit and can be truncated, using UMIN(this transformation was discussed in https://reviews.llvm.org/D25987). The example of special case code: ``` void foo(unsigned short p, int max, int n) { int i; unsigned m; for (i = 0; i < n; i++) { m = --p; p = (unsigned short)(m >= max ? m-max : 0); } } ``` Max in this example is truncated to max_short value, if it is greater than m, or just truncated to 16 bit, if it is not. It is vaid transformation, because if max > max_short, result of the expression will be zero. Here is the table of types, I try to support, special case items are bold: \| Size \| 128 \| 256 \| 512 \| ----- \| ----- \| ----- \| ----- \| i8 \| v16i8 \| v32i8 \| v64i8 \| i16 \| v8i16 \| v16i16 \| v32i16 \| i32 \| \| v8i32* \| v16i32 \| i64 \| \| \| v8i64 Reviewers: zvi, spatel, DavidKreitzer, RKSimon Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37534 llvm-svn: 315237	2017-10-09 20:01:10 +00:00
Sanjay Patel	2a61a821a0	[DAG] combine assertsexts around a trunc This was a suggested follow-up to: D37017 / https://reviews.llvm.org/rL313577 llvm-svn: 315206	2017-10-09 15:22:20 +00:00
Sanjay Patel	8557e29408	[x86] regenerate test checks; NFC llvm-svn: 315204	2017-10-09 15:01:58 +00:00
Craig Topper	4f8656a7af	[X86] Enable extended comparison predicate support for SETUEQ/SETONE when targeting AVX instructions. We believe that despite AMD's documentation, that they really do support all 32 comparision predicates under AVX. Differential Revision: https://reviews.llvm.org/D38609 llvm-svn: 315201	2017-10-09 01:05:15 +00:00
Simon Pilgrim	135a2639f4	[X86][SSE] Add test case for PR27708 llvm-svn: 315186	2017-10-08 19:18:10 +00:00
Craig Topper	977c546b0c	[X86] Regenerate fast-isel-select-pseudo-cmov.ll to prepare for D38609. llvm-svn: 315184	2017-10-08 17:54:50 +00:00
Simon Pilgrim	dc32c844f9	[X86] getTargetConstantBitsFromNode - add support for decoding scalar constants llvm-svn: 315182	2017-10-08 17:21:18 +00:00
Craig Topper	c97775c03c	[X86] Prefer MOVSS/SD over BLENDI during legalization. Remove BLENDI versions of scalar arithmetic patterns Summary: We currently disable some converting of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later during shuffle combining we go back to prefering MOVSS/MOVSD. Additionally we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe due to the combining using MOVSS/MOVSD these are unnecessary. Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. Turns out machine CSE commutes them to blend, and then commuting those blends back into blends that are equivalent to the original movss/movsd. This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special casing VPBLENDW based on the immediate to widen the element type. The rest of the patch is removing all the excess scalar patterns. Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint. Reviewers: RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38023 llvm-svn: 315181	2017-10-08 16:57:23 +00:00
Simon Pilgrim	6410ca70aa	[X86][XOP] Add XOP oddshuffles tests XOP codegen is often different to generic AVX - thank you vpperm! llvm-svn: 315176	2017-10-08 12:58:15 +00:00
Gadi Haber	684944b822	[X86][SKX] Adding the scheduling information for the SKX target. Adding the scheduling information for the SkylakeServer (SKX) target. This patch adds the instruction scheduling information for the SkylakeServer (SKX) architecture target by adding the file X86SchedSkylakeServer.td located under the X86 Target. We used the scheduling information retrieved from the Skylake architects in order to create the file. The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction. The patch continues the scheduling replacement and insertion effort started with the SNB target in r310792, the HSW target in r311879 and the SkylakeClient (SKL) target in rL313613. Please expect some performance fluctuations due to code alignment effects. Reviewers: zvi, RKSimon, craig.topper, chandlerc, aymanmu Differential Revision: https://reviews.llvm.org/D38443 Change-Id: I5c228fcc09e9e5a99b6116e62b356c4f9b971185 llvm-svn: 315175	2017-10-08 12:52:54 +00:00
Craig Topper	bbca2f2978	[X86] Stop LowerSIGN_EXTEND_AVX512 from creating v8i16/v16i16/v16i8 vselects with a v8i1/v16i1 condition when BWI is not available. Some of the tests in vector-shuffle-v1.ll would get into an infinite loop without this. llvm-svn: 315172	2017-10-08 08:50:59 +00:00
Craig Topper	27170fee8d	[X86] If we see an insert of a bitcast into zero vector, canonicalize it to move the bitcast to the other side of the insert. This improves detection of zeroing of upper bits during isel. llvm-svn: 315161	2017-10-08 01:33:41 +00:00
Simon Pilgrim	9508fe7924	[X86][SSE] Match bitcasted BUILD_VECTOR of constants for v2i64 shifts on 64-bit targets (PR34855) Extension to rL315155, generate constant shifts on 64-bits as well as 32-bits. llvm-svn: 315156	2017-10-07 17:57:22 +00:00
Simon Pilgrim	70e1db78db	[X86][SSE] Match bitcasted v4i32 BUILD_VECTORS for v2i64 shifts on 64-bit targets (PR34855) We were already doing this for 32-bit targets, but we can generate these on 64-bits as well. llvm-svn: 315155	2017-10-07 17:42:17 +00:00
Craig Topper	2f60295364	[X86] Add X86ISD::CMOV to computeKnownBitsForTargetNode and ComputeNumSignBitsForTargetNode. Summary: Implementations based on ISD::SELECT. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38663 llvm-svn: 315153	2017-10-07 16:51:19 +00:00
Simon Pilgrim	73f143e774	[X86][SSE] Improve shuffling combining with horizontal operations Recognise cases when we can merge the shuffles with their horizontal (HADD/HSUB/PACK) instruction inputs. Replaces an older implementation which performed some of this during lowering, expanding an existing target shuffle combine stage instead. Differential Revision: https://reviews.llvm.org/D38506 llvm-svn: 315150	2017-10-07 12:42:23 +00:00
Cameron McInally	9d64101fe8	[AVX512] Fix TERNLOG when folding broadcast Patch to fix ternlog instructions with a folded broadcast. The broadcast decorator, e.g. {1toX}, was missing. Differential Revision: https://reviews.llvm.org/D38649 llvm-svn: 315122	2017-10-06 22:31:29 +00:00
Simon Pilgrim	a29dbdf2ca	[X86][SSE] Add SKX cpu tests to SSE/AVX scheduling tests (D38443) llvm-svn: 315061	2017-10-06 13:40:29 +00:00
Artur Pilipenko	7b15254c8f	[X86] Fix chains update when lowering BUILD_VECTOR to a vector load The code which lowers BUILD_VECTOR of consecutive loads into a single vector load doesn't update chains properly. As a result the vector load can be reordered with the store to the same location. The current code in EltsFromConsecutiveLoads only updates the chain following the first load. The fix is to update the chains following all the loads comprising the vector. This is a fix for PR10114. Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D38547 llvm-svn: 314988	2017-10-05 16:28:21 +00:00
Simon Pilgrim	9edbe110e8	[X86][AVX] Improve (i8 bitcast (v8i1 x)) handling for v8i64/v8f64 512-bit vector compare results. AVX1/AVX2 targets were missing a chance to use vmovmskps for v8f32/v8i32 results for bool vector bitcasts llvm-svn: 314921	2017-10-04 18:00:42 +00:00
Hans Wennborg	2a6c9adb2f	Revert r314886 "[X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)" It broke the Chromium / SQLite build; see PR34830. > Summary: > 1/ Operand folding during complex pattern matching for LEAs has been > extended, such that it promotes Scale to accommodate similar operand > appearing in the DAG. > e.g. > T1 = A + B > T2 = T1 + 10 > T3 = T2 + A > For above DAG rooted at T3, X86AddressMode will no look like > Base = B , Index = A , Scale = 2 , Disp = 10 > > 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs > so that if there is an opportunity then complex LEAs (having 3 operands) > could be factored out. > e.g. > leal 1(%rax,%rcx,1), %rdx > leal 1(%rax,%rcx,2), %rcx > will be factored as following > leal 1(%rax,%rcx,1), %rdx > leal (%rdx,%rcx) , %edx > > 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, > thus avoiding creation of any complex LEAs within a loop. > > Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy > > Reviewed By: lsaba > > Subscribers: jmolloy, spatel, igorb, llvm-commits > > Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 314919	2017-10-04 17:54:06 +00:00
Simon Pilgrim	b47b3f2564	[X86][SSE] Add support for lowering v8i16 binary shuffles to PACKSS/PACKUS Missed in D38472 llvm-svn: 314916	2017-10-04 17:31:28 +00:00
Craig Topper	6fb55716e9	[X86] Redefine MOVSS/MOVSD instructions to take VR128 regclass as input instead of FR32/FR64 This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted. This should fix PR33079 Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results. Differential Revision: https://reviews.llvm.org/D38449 llvm-svn: 314914	2017-10-04 17:20:12 +00:00
Simon Pilgrim	bd5d2f0284	[X86][SSE] Add support for lowering unary shuffles to PACKSS/PACKUS Extension to D38472 llvm-svn: 314901	2017-10-04 13:12:08 +00:00
Jatin Bhateja	3c29bacd43	[X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.) Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG. e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will no look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out. e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy Reviewed By: lsaba Subscribers: jmolloy, spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 314886	2017-10-04 09:02:10 +00:00
Martin Storsjo	e14145dcb0	[X86] Fix using the SJLJ jump table on x86_64 The previous version didn't work if the jump table base address didn't fit in 32 bit, since it was encoded as an immediate offset. And in case the jump table is encoded as 32 bit label differences, we need to load and add them to the table base first. This solves the first half of the issues mentioned in PR34720. Also fix some of the errors pointed out by -verify-machineinstrs, by using GR32_NOSPRegClass. Differential Revision: https://reviews.llvm.org/D38333 llvm-svn: 314876	2017-10-04 05:12:10 +00:00
Simon Pilgrim	261a6c29ea	[X86] Add non-SSE tests for PR15215 as well llvm-svn: 314815	2017-10-03 17:04:36 +00:00
Geoff Berry	fabedbad11	Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" This reverts commit r314729. Another bug has been encountered in an out-of-tree target reported by Quentin. llvm-svn: 314814	2017-10-03 16:59:13 +00:00
Simon Pilgrim	46a804cfd9	[X86][SSE] Add bool vector extraction test cases from PR15215 llvm-svn: 314813	2017-10-03 16:56:57 +00:00
Simon Pilgrim	cf99d069c3	[X86][SSE] Add support for decoding PACKSS/PACKUS shuffles masks with UNDEF llvm-svn: 314792	2017-10-03 12:41:39 +00:00
Simon Pilgrim	f5f291d129	[X86][SSE] Add support for lowering shuffles to PACKSS/PACKUS If the upper bits of a truncation shuffle patterns have at least the minimum number of sign/zero bits on their inputs then we can safely use PACKSS/PACKUS as shuffles. Partial fix for https://bugs.llvm.org/show_bug.cgi?id=34773 Differential Revision: https://reviews.llvm.org/D38472 llvm-svn: 314788	2017-10-03 12:01:31 +00:00
Simon Pilgrim	640fbf5132	[X86][SSE] Add support for shuffle combining from PACKSS/PACKUS Mentioned in D38472 llvm-svn: 314777	2017-10-03 09:54:03 +00:00
Simon Pilgrim	19d535e75b	[X86][SSE] Add support for PACKSS/PACKUS constant folding Pulled out of D38472 llvm-svn: 314776	2017-10-03 09:41:00 +00:00
Martin Storsjo	1e54738676	[X86] Provide the LSDA pointer with RIP relative addressing if necessary This makes sure the LSDA pointer isn't truncated to 32 bit. Make LowerINTRINSIC_WO_CHAIN a member function instead of a static function, so that it can use the getGlobalWrapperKind method. This solves the second half of the issues mentioned in PR34720. Differential Revision: https://reviews.llvm.org/D38343 llvm-svn: 314767	2017-10-03 06:29:58 +00:00
Geoff Berry	bfc5fb4571	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues addressed since original review: - Avoid bug in regalloc greedy/machine verifier when forwarding to use in an instruction that re-defines the same virtual register. - Fixed bug when forwarding to use in EarlyClobber instruction slot. - Fixed incorrect forwarding to register definitions that showed up in explicit_uses() iterator (e.g. in INLINEASM). - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 314729	2017-10-02 22:01:37 +00:00
Simon Pilgrim	65bde8806f	[X86][SSE] Add PACKSS/PACKUS constant folding tests llvm-svn: 314682	2017-10-02 15:43:26 +00:00
Simon Pilgrim	ec528b2cb6	Regenerate test (missing broadcast constant comments). NFCI. Still avoiding the floating point comments to prevent linux/windows discrepancies. llvm-svn: 314681	2017-10-02 15:22:35 +00:00
Simon Pilgrim	050f60370c	Regenerate test (missing broadcast constant comments). NFCI. llvm-svn: 314680	2017-10-02 15:21:14 +00:00
Simon Pilgrim	037f6d10d5	Regenerate test. NFCI. llvm-svn: 314679	2017-10-02 15:16:30 +00:00
Michael Zuckerman	e4084f6bdb	[X86][LLVM]Expanding Supports lowerInterleaved{store\|load}() in X86InterleavedAccess (VF64 stride 3-4) I continue to support different VF interleaved and in this pass for this patch, I added the vf64 stride3 support for both load and store. I also added support fot the stride4 store. Reviewers: 1. zvi 2. dorit 3. igorb 4. guyblank Differential Revision: https://reviews.llvm.org/D37687 Change-Id: I3d238efedf217d1768b348d710de1efa2f19d27b llvm-svn: 314651	2017-10-02 07:35:25 +00:00
Craig Topper	c20b46da2f	[X86] Change register&memory TEST instructions from MRMSrcMem to MRMDstMem Summary: Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable. For the register&register form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem. I believe this supercedes D38025 which was trying to switch the register&register form back to pre-PR22995. Reviewers: aymanmus, RKSimon, zvi Reviewed By: aymanmus Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38120 llvm-svn: 314639	2017-10-01 23:53:53 +00:00
Simon Pilgrim	df23a2700d	[X86][SSE] Add faux shuffle combining support for PACKUS llvm-svn: 314631	2017-10-01 18:43:48 +00:00
Simon Pilgrim	4f255ad6a0	[X86][AVX2] Simplify PACKUS combine test Trying to use a AND mask is tricky as after legalization its nigh impossible for computeKnownBits to do anything with it llvm-svn: 314630	2017-10-01 18:17:39 +00:00
Simon Pilgrim	836fa6dcfd	[X86][SSE] Improve shuffle combining of PACKSS instructions. Support unary packing and fix the faux shuffle mask for vectors larger than 128 bits. llvm-svn: 314629	2017-10-01 17:54:55 +00:00
Simon Pilgrim	d25c200cd6	[X86][SSE] Add shuffle combining tests with PACKSS/PACKUS llvm-svn: 314628	2017-10-01 17:30:44 +00:00
Jina Nahias	98c7f91e54	pre-commit adding test for broadcastm pattern Differential Revision: https://reviews.llvm.org/D38312 Change-Id: Ifbc4189549f2f59995019a86f85f989c04e4d37d llvm-svn: 314626	2017-10-01 14:25:21 +00:00
Michael Zuckerman	1746895490	Adding test for interleved, case stride 4 vf64 store<NFC>. Change-Id: I9ea62aac81b763c83d26613dca6fcd846997a017 llvm-svn: 314621	2017-10-01 09:37:38 +00:00
Xin Tong	bffac0eb81	Fix typo. NFC llvm-svn: 314615	2017-10-01 00:10:52 +00:00
Xin Tong	c063c3f09d	Revert "Fix typo [NFC]" This reverts commit e60b5028619be1c81bd039d63a0627dac32d38f9. Incorrectly include changes that are not typo fix. llvm-svn: 314614	2017-10-01 00:09:53 +00:00
Xin Tong	efec219e1b	Fix typo [NFC] llvm-svn: 314613	2017-10-01 00:07:24 +00:00
Simon Pilgrim	1fffcc4580	Regenerate mul combine tests to update broadcast comment. llvm-svn: 314607	2017-09-30 22:27:46 +00:00
Simon Pilgrim	a8dd6f4f30	[X86][SSE] Fold (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1 Remove sign extend in register style pattern if the sign is already extended enough llvm-svn: 314599	2017-09-30 17:57:34 +00:00
Craig Topper	619569841a	[AVX-512] Add patterns to make fp compare instructions commutable during isel. llvm-svn: 314598	2017-09-30 17:02:39 +00:00
Simon Pilgrim	5bd43bce07	[X86][SSE] Add vector truncation cases inspired by PR34773 We should be using PACKSS/PACKUS more aggressively when we know the state of the upper bits llvm-svn: 314597	2017-09-30 16:14:59 +00:00
Michael Zuckerman	b92b6d424f	Code refactoring for the interleaved code <NFC> Change-Id: I7831c9febad8e14278a5bc87584a0053dc837be1 llvm-svn: 314596	2017-09-30 14:55:03 +00:00
Gadi Haber	c3b33f0f0d	[X86][SKX] Added codegen regression test for avx512 instructions scheduling.NFC. NFC. Added code gen regression tests for avx512 instructions scheduling called avx512-schedule.ll and avx512-shuffle-schedule.ll. This patch is in preparation of a larger patch of adding all SKX instruction scheduling and therefore the scheduling for the avx512 instructions are still missing. Reviewers: zvi, delena, RKSimon, igorb Differential Revision: https://reviews.llvm.org/D38035 Change-Id: I792762763127a921b9e13684b58af03646536533 llvm-svn: 314594	2017-09-30 14:30:23 +00:00
Craig Topper	d92ade96f4	[X86] Support v64i8 mulhu/mulhs Implemented by splitting into two v32i8 mulhu/mulhs and concatenating the results. Differential Revision: https://reviews.llvm.org/D38307 llvm-svn: 314584	2017-09-30 04:21:46 +00:00
Amara Emerson	7d6c55f8aa	[X86] Improve codegen for inverted overflow checking intrinsics. Adds a new combine for: xor(setcc cc, val), 1 --> setcc (invert(cc), val) Differential Revision: https://reviews.llvm.org/D38161 llvm-svn: 314514	2017-09-29 13:53:44 +00:00
Simon Pilgrim	2b96841d1d	[X86][SSE] Added more tests for vector multiplications as utility for D37896 Added additional tests for vector multiplications with multipliers that are: * powers of 2 displaced by 1, * product of a power of 2 displaced by one with another power of 2. Patch by @pacxx (Michael Haidl) Differential Revision: https://reviews.llvm.org/D38350 llvm-svn: 314504	2017-09-29 10:02:01 +00:00
Craig Topper	6255c7b675	[X86] Don't select (cmp (and, imm), 0) to testw Summary: X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters. I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs. Reviewers: RKSimon, zvi, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38273 llvm-svn: 314474	2017-09-28 23:35:36 +00:00
Sanjay Patel	4664d77316	[x86] add tests for possible insertelement to shuffle transform; NFC See PR34716 and D38316 for more discussion. llvm-svn: 314466	2017-09-28 22:27:25 +00:00
Craig Topper	ed19350293	[X86] Make use of vpmovwb when possible in LowerMULH If we have BWI, we can truncate in a much simpler way by using vpmovwb. This even works without VLX by using the wider zmm->ymm truncate with a subvector extract. Differential Revision: https://reviews.llvm.org/D38375 llvm-svn: 314457	2017-09-28 20:10:34 +00:00
Matthias Braun	5c3e8a450e	MIR: Serialize CaleeSavedInfo Restored flag llvm-svn: 314449	2017-09-28 18:52:14 +00:00
Craig Topper	56bfbfb117	[AVX512] Add avx512bw command lines to 128-bit idiv tests. The multiply lowering on some of the tests can take advantage of the vpmovwb to simplify the truncate. llvm-svn: 314448	2017-09-28 18:45:29 +00:00
Craig Topper	ceff6da6e9	[X86] Use BWI instructions to improve lowering of v32i8 MULHU/S Summary: If we have BWI instructions we can widen to v32i16 to do the multiply instead of splitting. Reviewers: RKSimon, spatel, zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38305 llvm-svn: 314432	2017-09-28 17:00:21 +00:00
Craig Topper	71a8cf9f99	[X86] Use correct subvector index when combining two insert subvectors featuring zero vectors. Previously we were using one of the subvector indices twice. The included test case causes an assert without this change. Thanks to Simon Pilgrim for catching this. llvm-svn: 314429	2017-09-28 16:53:16 +00:00
Amara Emerson	bb16282fb1	[X86] Add overflow intrinsic test in preparation for D38161. This commit adds the test file before codegen changes as requested in D38161 to make it easier to see the difference. llvm-svn: 314416	2017-09-28 13:43:48 +00:00
Jatin Bhateja	75001c9ed8	[X86] Adding more cases to horizontal [f]add/[f]sub for avx512. Reviewers: jbhateja Reviewed By: jbhateja Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38344 llvm-svn: 314385	2017-09-28 07:40:52 +00:00
Jessica Paquette	4cf187b5b4	[MachineOutliner] AArch64: Avoid saving + restoring LR if possible This commit allows the outliner to avoid saving and restoring the link register on AArch64 when it is dead within an entire class of candidates. This introduces changes to the way the outliner interfaces with the target. For example, the target now interfaces with the outliner using a MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and getOutliningFrameOverhead. This also improves several comments on the outliner's cost model. https://reviews.llvm.org/D36721 llvm-svn: 314341	2017-09-27 20:47:39 +00:00
Craig Topper	c16a472966	Revert r314249 "Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST.""" This caused PR34751 llvm-svn: 314339	2017-09-27 20:34:17 +00:00
Than McIntosh	dee2cf67ea	[CodeGen] Emit necessary .note sections for -fsplit-stack Summary: According to https://gcc.gnu.org/wiki/SplitStacks, the linker expects a zero-sized .note.GNU-split-stack section if split-stack is used (and also .note.GNU-no-split-stack section if it also contains non-split-stack functions), so it can handle the cases where a split-stack function calls non-split-stack function. This change adds the sections if needed. Fixes PR #34670. Reviewers: thanm, rnk, luqmana Reviewed By: rnk Subscribers: llvm-commits Patch by Cherry Zhang <cherryyz@google.com> Differential Revision: https://reviews.llvm.org/D38051 llvm-svn: 314335	2017-09-27 19:34:00 +00:00
Craig Topper	05f71dd036	[X86] In combineLoopSADPattern, pad result with zeros and use full size add instead of using a smaller add and inserting. In some cases the result psadbw is smaller than the type of the add that started the match. Currently in these cases we are using a smaller add and inserting the result. If we instead combine the psadbw with zeros and use the full size add we can take advantage of implicit zeroing we get if we emit a narrower move before the add. In a future patch, I want to make isel aware that the psadbw itself already zeroed the upper bits and remove the move entirely. Differential Revision: https://reviews.llvm.org/D37453 llvm-svn: 314331	2017-09-27 18:36:45 +00:00
Gadi Haber	87337a2bb9	[X86][SKX][KNL] Updated regression tests to use -mattr instead of -mcpu flag.NFC. NFC. Updated 8 regression tests to use -mattr instead of -mcpu flag as follows: -mcpu=knl --> -mattr=+avx512f -mcpu=skx --> -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq The updates are as part of the preparation of a large commit to add all instruction scheduling for the SKX target. Reviewers: delena, zvi, RKSimon Differential Revision: https://reviews.llvm.org/D38222 Change-Id: I2381c9b5bb75ecacfca017243c22d054f6eddd14 llvm-svn: 314306	2017-09-27 14:44:15 +00:00
Zvi Rackover	eb7a0bf847	X86 Tests: Unsigned saturation subtraction tests. NFC. Summary: Adding tests for D37534. Commit on behalf of julia.koval@intel.com Reviewers: n.bozhenov, zvi, spatel, DavidKreitzer Reviewed By: zvi Differential Revision: https://reviews.llvm.org/D37510 llvm-svn: 314305	2017-09-27 14:38:05 +00:00
Simon Pilgrim	3b0d9e789e	[X86][AVX] Improve (i4 bitcast (v4i1 x)) handling for 256-bit vector compare results. As commented on D37849 and rL313547, AVX1 targets were missing a chance to use vmovmskpd for v4f64/v4i64 results for bool vector bitcasts llvm-svn: 314293	2017-09-27 10:10:17 +00:00
Martin Storsjo	aa1533bf9b	[X86] Fix SJLJ struct offsets for x86_64 This is necessary, but not sufficient, for having working SJLJ exception handling on x86_64. Differential Revision: https://reviews.llvm.org/D38254 llvm-svn: 314277	2017-09-27 06:08:23 +00:00
Martin Storsjo	eccaf04e40	[X86] Remove erroneous callsite offsetting in SJLJ landing pads The callsite value is already stored indexed from 0 in the _Unwind_Context struct. When accessed via the functions _Unwind_GetIP and _Unwind_SetIP, the value is indexed from 1, but those functions handle the offseting. When reading directly from the struct here, we shouldn't subtract 1. This matches the code generated by the ARM target, where SJLJ exception handling is used by default on iOS. This makes clang-built object files for 32 bit x86 mingw work when linked with libgcc/libstdc++. Differential Revision: https://reviews.llvm.org/D38251 llvm-svn: 314276	2017-09-27 06:08:16 +00:00
Martin Storsjo	233349fe51	[X86] Correct byte offsets and data types in a comment. NFC. This matches the types of the struct members defined in lib/CodeGen/SjLjEHPrepare.cpp, and the definition of this struct in libgcc. Differential Revision: https://reviews.llvm.org/D38248 llvm-svn: 314275	2017-09-27 06:08:04 +00:00
Craig Topper	177a3923ce	[X86] Use extract128BitVector in LowerMULH so we can extract from constant build vectors. llvm-svn: 314274	2017-09-27 06:04:55 +00:00
Craig Topper	31ccd4727b	[X86] Add avx512bw command lines to the 256-bit vector idiv tests. Some of the operations are being sign extended to 512 bits with avx512bw. llvm-svn: 314272	2017-09-27 05:17:15 +00:00
Craig Topper	7f0eeb428b	Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST."" The late MOV8rr_NOREX that caused the crash has been removed. llvm-svn: 314249	2017-09-26 21:35:09 +00:00
Michael Zuckerman	645f777e40	[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF{8\|16\|32} stride 3) This patch expands the support of lowerInterleavedStore to {8\|16\|32}x8i stride 3. LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8\|16\|32}) . This patch is part two of two patches and it covers the store (interlevaed) side. The patch goal is to optimize the following sequence: a0 a1 a2 a3 a4 a5 a6 a7 b0 b1 b2 b3 b4 b5 b6 b7 c0 c1 c2 c3 c4 c5 c6 c7 into a0 b0 c0 a1 b1 c1 a2 b2 c2 a3 b3 c3 a4 b4 c4 a5 b5 c5 a6 b6 c6 a7 b7 c7 Reviewers: zvi guyblank dorit Ayal Differential Revision: https://reviews.llvm.org/D37117 Change-Id: I56ced8bcbea809a37654060771911ade20246ccc llvm-svn: 314234	2017-09-26 18:49:11 +00:00
Craig Topper	f51913155c	[X86] Add support for v16i32 UMUL_LOHI/SMUL_LOHI Summary: This patch extends the v8i32/v4i32 custom lowering to support v16i32 Reviewers: zvi, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38274 llvm-svn: 314221	2017-09-26 16:43:57 +00:00
Coby Tayree	f191fdc3fb	[x86] fix pr29061 https://bugs.llvm.org//show_bug.cgi?id=29061 Don't try referencing REX-needed regs when not on 64bit mode Aligns to GCC Differetial Revision: https://reviews.llvm.org/D37801 llvm-svn: 314203	2017-09-26 13:28:05 +00:00
Benjamin Kramer	4b2113a303	Revert "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST." Makes llc crash. This reverts commit r314151. llvm-svn: 314199	2017-09-26 10:25:27 +00:00
Uriel Korach	0ecc984b1b	[X86] Finishing broadcastf32x2 and broadcasti32x2 intrinsics lowering to IR. llvm side. Removing X86 broadcast(f/i)32x2 intrinsics from llvm. Adding autoUpgrade support. Moving matching tests from avx512dq-intrinsics.ll to avx512dq-intrinsics-upgrade.ll and from avx512dqvl-intrinsics.ll to avx512dqvl-intrinsics-upgrade.ll. Differential Revision: https://reviews.llvm.org/D38220 llvm-svn: 314195	2017-09-26 07:39:39 +00:00
Saleem Abdulrasool	2e0d72311b	X86: remove R12 from CSR on Windows x64 SwiftCC R12 is used for the SwiftError parameter. It is no longer a CSR as it is used for transfer the SwiftError, and the caller must preserve it if they need to. llvm-svn: 314165	2017-09-25 22:00:17 +00:00
Craig Topper	5124a14d9c	[X86] Don't select anyext GR32->GR64 to SUBREG_TO_REG. Use INSERT_SUBREG instead. As far as I know SUBREG_TO_REG is stating that the upper bits are 0. But if we are just converting the GR32 with no checks, then we have no reason to say the upper bits are 0. I don't really know how to test this today since I can't find anything that looks that closely at SUBREG_TO_REG. The test changes here seems to be some perturbance of register allocation. Differential Revision: https://reviews.llvm.org/D38001 llvm-svn: 314152	2017-09-25 21:14:59 +00:00
Craig Topper	d830f276c1	[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST. llvm-svn: 314151	2017-09-25 21:14:55 +00:00
Craig Topper	5bc10ede53	[SelectionDAG] Teach simplifyDemandedBits to handle shifts by constant splat vectors This teach simplifyDemandedBits to handle constant splat vector shifts. This required changing some uses of getZExtValue to getLimitedValue since we can't rely on legalization using getShiftAmountTy for the shift amount. I believe there may have been a bug in the ((X << C1) >>u ShAmt) handling where we didn't check if the inner shift was too large. I've fixed that here. I had to add new patterns to ARM because the zext/sext the patterns were trying to look for got turned into an any_extend with this patch. Happy to split that out too, but not sure how to test without this change. Differential Revision: https://reviews.llvm.org/D37665 llvm-svn: 314139	2017-09-25 19:26:08 +00:00
Reid Kleckner	8898cd8dcf	[DebugInfo] Sort the SDDbgValue list before assuming it is in IR order Summary: This code iterates the 'Orders' vector in parallel with the DbgValue list, emitting all DBG_VALUEs that occurred between the last IR order insertion point and the next insertion point. This assumes the SDDbgValue list is sorted in IR order, which it usually is. However, it is not sorted when a node with a debug value is replaced with another one. When this happens, TransferDbgValues is called, and the new value is added to the end of the list. The problem can be solved by stably sorting the list by IR order. Reviewers: aprantl, Ka-Ka Reviewed By: aprantl Subscribers: MatzeB, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D38197 llvm-svn: 314114	2017-09-25 16:14:53 +00:00
Michael Zuckerman	4a97df01c4	[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF8 stride 4): This patch expands the support of lowerInterleavedStore to 8x8i stride 4. LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=8) and we plan to include more patterns in the future. The patch goal is to optimize the following sequence: At the end of the computation, we have xmm2, xmm0, xmm12 and xmm3 holding each 8 chars: c0, c1, , c7 m0, m1, , m7 y0, y1, , y7 k0, k1, ., k7 And these need to be transposed/interleaved and stored like so: c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 .... Reviewers DavidKreitzer Farhana zvi igorb guyblank RKSimon Ayal Differential Revision: https://reviews.llvm.org/D36058 Change-Id: I3cc5c2ca5d6318901c192a4428493b99ef424c32 llvm-svn: 314109	2017-09-25 14:50:38 +00:00
Craig Topper	47e14ead54	[X86] Make IFMA instructions during isel so we can fold broadcast loads. This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us. llvm-svn: 314083	2017-09-24 19:30:55 +00:00
Craig Topper	4ffd90c504	[X86] Add tests to show missed opportunities to fold broadcast loads into IFMA instructions when the load is on operand1 of the instrinsic. We need to enable commuting during isel to catch this since the load folding tables can't handle broadcasts. llvm-svn: 314082	2017-09-24 19:30:54 +00:00
Craig Topper	23f1830748	[X86] Add IFMA instructions to the load folding tables and make them commutable for the multiply operands. llvm-svn: 314080	2017-09-24 17:28:14 +00:00
Simon Pilgrim	e1335b1c75	[X86][SSE] Add more tests for shuffle combining with extracted vector elements (PR22415) llvm-svn: 314077	2017-09-24 13:45:49 +00:00
Simon Pilgrim	a705db9a9e	[X86][SSE] Add support for extending bool vectors bitcasted from scalars This patch acts as a reverse to combineBitcastvxi1 - bitcasting a scalar integer to a boolean vector and extending it 'in place' to the requested legal type. Currently this doesn't handle AVX512 at all - but the current mask register approach is lacking for some cases. Differential Revision: https://reviews.llvm.org/D35320 llvm-svn: 314076	2017-09-24 13:42:31 +00:00
Craig Topper	eb5c411218	[AVX-512] Add pattern for selecting masked version of v8i32/v8f32 compare instructions when VLX isn't available. We use a v16i32/v16f32 compare instead and truncate the result. We already did this for the unmasked version, but were missing the version with 'and'. llvm-svn: 314072	2017-09-24 05:24:52 +00:00
Simon Pilgrim	026727f861	[X86] Regenerate i64 to v2f32 bitcast test llvm-svn: 314068	2017-09-23 19:18:29 +00:00
Sanjay Patel	fa8bad8a0f	[x86] reduce 64-bit mask constant to 32-bits by right shifting This is a follow-up from D38181 (r314023). We have to put 64-bit constants into a register using a separate instruction, so we should try harder to avoid that. From what I see, we're not likely to encounter this pattern in the DAG because the upstream setcc combines from this don't (usually?) produce this pattern. If we fix that, then this will become more relevant. Since the cost of handling this case is just loosening the predicate of the existing fold, we might as well do it now. llvm-svn: 314064	2017-09-23 14:32:07 +00:00
Sanjay Patel	5ca9f7a0cb	[x86] add an add+shift test for follow-up suggestion from D38181; NFC llvm-svn: 314063	2017-09-23 14:24:07 +00:00
Sanjay Patel	ac76201d4e	[x86] remove over-specified platform from test config llvm-svn: 314027	2017-09-22 21:07:13 +00:00
Sanjay Patel	3339954fa3	[x86] swap order of srl (and X, C1), C2 when it saves size The (non-)obvious win comes from saving 3 bytes by using the 0x83 'and' opcode variant instead of 0x81. There are also better improvements based on known-bits that allow us to eliminate the mask entirely. As noted, this could be extended. There are potentially other wins from always shifting first, but doing that reveals a tangle of problems in other pattern matching. We do this transform generically in instcombine, but we often have icmp IR that doesn't match that pattern, so we must account for this in the backend. Differential Revision: https://reviews.llvm.org/D38181 llvm-svn: 314023	2017-09-22 19:37:21 +00:00
Sanjay Patel	ae42181db4	[x86] remove unnecessary OS specifier from test llvm-svn: 313986	2017-09-22 14:38:57 +00:00
Sanjay Patel	04fd5b8cdc	[x86] auto-generate complete checks; NFC llvm-svn: 313985	2017-09-22 14:30:52 +00:00
Sanjay Patel	8dca7080b0	[x86] update test to use FileCheck; NFC llvm-svn: 313984	2017-09-22 14:29:47 +00:00
Alexander Ivchenko	34498ba052	[X86] Combining CMOVs with [ANY,SIGN,ZERO]_EXTEND for cases where CMOV has constant arguments Combine CMOV[i16]<-[SIGN,ZERO,ANY]_EXTEND to [i32,i64] into CMOV[i32,i64]. One example of where it is useful is: before (20 bytes) <foo>: test $0x1,%dil mov $0x307e,%ax mov $0xffff,%cx cmovne %ax,%cx movzwl %cx,%eax retq after (18 bytes) <foo>: test $0x1,%dil mov $0x307e,%ecx mov $0xffff,%eax cmovne %ecx,%eax retq Reviewers: craig.topper, aaboud, spatel, RKSimon, zvi Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36711 llvm-svn: 313982	2017-09-22 13:21:39 +00:00
Jatin Bhateja	c034d36024	[X86] Updating the test case for FMF propagation. Differential Revision: https://reviews.llvm.org/D38163 llvm-svn: 313964	2017-09-22 05:48:20 +00:00
Sanjay Patel	58f02afecd	[x86] add more tests for node-level FMF; NFC llvm-svn: 313893	2017-09-21 17:40:58 +00:00
Simon Pilgrim	1efe0c7224	[X86][SSE] Add PSHUFLW/PSHUFHW tests inspired by PR34686 llvm-svn: 313883	2017-09-21 15:11:51 +00:00
Jatin Bhateja	1a86c382d4	[X86] Adding a testpoint for fast-math flags propagation. Reviewers: jbhateja Reviewed By: jbhateja Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38127 llvm-svn: 313869	2017-09-21 09:53:21 +00:00
Saleem Abdulrasool	aff96d907b	X86: treat SwiftCC as Win64_CC on Win64 The Swift CC is identical to Win64 CC with the exception of swift error being passed in r12 which is a CSR. However, since this calling convention is only used in swift -> swift code, it does not impact interoperability and can be treated entirely as Win64 CC. We would previously incorrectly lower the frame setup as we did not treat the frame as conforming to Win64 specifications. llvm-svn: 313813	2017-09-20 21:00:40 +00:00
Saleem Abdulrasool	432b88e5f4	CodeGen: support SwiftError SwiftCC on Windows x64 Add support for passing SwiftError through a register on the Windows x64 calling convention. This allows the use of swifterror attributes on parameters which is used by the swift front end for the `Error` parameter. This partially enables building the swift standard library for Windows x86_64. llvm-svn: 313791	2017-09-20 18:40:59 +00:00
Simon Pilgrim	d202ad15c1	[X86][SSE] Add PR22415 test case llvm-svn: 313755	2017-09-20 13:49:52 +00:00
Florian Hahn	ceb4494786	Recommit [MachineCombiner] Update instruction depths incrementally for large BBs. This version of the patch fixes an off-by-one error causing PR34596. We do not need to use std::next(BlockIter) when calling updateDepths, as BlockIter already points to the next element. Original commit message: > For large basic blocks with lots of combinable instructions, the > MachineTraceMetrics computations in MachineCombiner can dominate the compile > time, as computing the trace information is quadratic in the number of > instructions in a BB and it's relevant successors/predecessors. > In most cases, knowing the instruction depth should be enough to make > combination decisions. As we already iterate over all instructions in a basic > block, the instruction depth can be computed incrementally. This reduces the > cost of machine-combine drastically in cases where lots of instructions > are combined. The major drawback is that AFAIK, computing the critical path > length cannot be done incrementally. Therefore we only compute > instruction depths incrementally, for basic blocks with more > instructions than inc_threshold. The -machine-combiner-inc-threshold > option can be used to set the threshold and allows for easier > experimenting and checking if using incremental updates for all basic > blocks has any impact on the performance. > > Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn > > Reviewed By: fhahn > > Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits > > Differential Revision: https://reviews.llvm.org/D36619 llvm-svn: 313751	2017-09-20 11:54:37 +00:00
Simon Pilgrim	d5e2878252	[X86][SSE] Add 'redundant pand' test case from PR34620 llvm-svn: 313632	2017-09-19 14:02:16 +00:00
Sanjay Patel	bd7958d7ca	[x86] regenerate checks; NFC llvm-svn: 313631	2017-09-19 13:43:09 +00:00
Daniel Sanders	28887fe548	[globalisel] Add support for intrinsic_w_chain. This maps directly to G_INTRINSIC_W_SIDE_EFFECTS. llvm-svn: 313627	2017-09-19 12:56:36 +00:00
Jina Nahias	ccfb8d4fe8	[x86] Lowering Mask Set1 intrinsics to LLVM IR This patch, together with a matching clang patch (https://reviews.llvm.org/D37668), implements the lowering of X86 mask set1 intrinsics to IR. Differential Revision: https://reviews.llvm.org/D37669 llvm-svn: 313625	2017-09-19 11:03:06 +00:00
Andrei Elovikov	142516b456	Test commit. llvm-svn: 313617	2017-09-19 07:56:20 +00:00
Gadi Haber	6f8fbf4b86	[X86][Skylake] Adding the scheduling information for the SkylakeClient target This patch adds the instruction scheduling information for the SkylakeClient (SKL) architecture target by adding the file X86SchedSkylakeClient.td located under the X86 Target. We used the scheduling information retrieved from the Skylake architects in order to create the file. The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction. The patch continues the scheduling replacement and insertion effort started with the SNB target in r307529 and r310792 and for HSW in r311879. Please expect some performance fluctuations due to code alignment effects. Reviewers: craig.topper, zvi, chandlerc, igorb, aymanmus, RKSimon, delena Differential Revision: https://reviews.llvm.org/D37294 llvm-svn: 313613	2017-09-19 06:19:27 +00:00
Craig Topper	a80949feb5	[X86] Add VPERMPD/VPERMQ and VPERMPS/VPERMD to the execution domain fixing table. llvm-svn: 313610	2017-09-19 04:39:55 +00:00
Sanjay Patel	f31b1a00ea	[DAGCombiner] fold assertzexts separated by trunc If we have an AssertZext of a truncated value that has already been AssertZext'ed, we can assert on the wider source op to improve the zext-y knowledge: assert (trunc (assert X, i8) to iN), i1 --> trunc (assert X, i1) to iN This moves a fold from being Mips-specific to general combining, and x86 shows improvements. Differential Revision: https://reviews.llvm.org/D37017 llvm-svn: 313577	2017-09-18 22:05:35 +00:00
Sanjay Patel	7765c93be2	[DAG, x86] allow store merging before and after legalization (PR34217) rL310710 allowed store merging to occur after legalization to catch stores that are created late, but this exposes a logic hole seen in PR34217: https://bugs.llvm.org/show_bug.cgi?id=34217 We will miss merging stores if the target lowers vector extracts into target-specific operations. This patch allows store merging to occur both before and after legalization if the target chooses to get maximum merging. I don't think the potential regressions in the other tests are relevant. The tests are for correctness of weird IR constructs rather than perf tests, and I think those are still correct. Differential Revision: https://reviews.llvm.org/D37987 llvm-svn: 313564	2017-09-18 20:54:26 +00:00
Craig Topper	39cdb84560	[X86] Make sure we still emit zext for GR32 to GR64 when the source of the zext is AssertZext The AssertZext we might see in this case is only giving information about the lower 32 bits. It isn't providing information about the upper 32 bits. So we should emit a zext. This fixes PR28540. Differential Revision: https://reviews.llvm.org/D37729 llvm-svn: 313563	2017-09-18 20:49:13 +00:00
Sanjay Patel	74d12b5697	[x86] add tests for PR34217; NFC llvm-svn: 313548	2017-09-18 18:07:50 +00:00
Simon Pilgrim	4aa28b9730	[X86][AVX] Improve (i8 bitcast (v8i1 x)) handling for 256-bit vector compare results. As commented on D37849, AVX1 targets were missing a chance to use vmovmskps for v8f32/v8i32 results for bool vector bitcasts llvm-svn: 313547	2017-09-18 17:58:31 +00:00
Sanjay Patel	078d5d978c	[x86] regenerate checks; NFC llvm-svn: 313545	2017-09-18 17:33:47 +00:00
Simon Pilgrim	0b21ef1fa3	[SelectionDAG] Add BITCAST handling to ComputeNumSignBits for splatted sign bits. For cases where we are BITCASTing to vectors of smaller elements, then if the entire source was a splatted sign (src's NumSignBits == SrcBitWidth) we can say that the dst's NumSignBit == DstBitWidth, as we're just splitting those sign bits across multiple elements. We could generalize this but at the moment the only use case I have is to peek through bitcasts to vector comparison results. Differential Revision: https://reviews.llvm.org/D37849 llvm-svn: 313543	2017-09-18 16:45:05 +00:00
Craig Topper	77d7f331dd	[X86] Fix two more places to prefer VPERMQ/PD over VPERM2X128 when AVX2 is enabled The shuffle combining and lowerVectorShuffleAsLanePermuteAndBlend were both still trying to use VPERM2XF128 for unary shuffles when AVX2 is enabled. VPERM2X128 takes two inputs meaning when we use it for a unary shuffle one of those inputs is left undefined creating a false dependency on whatever register gets allocated there. If we have VPERMQ/PD we should prefer those since they only have a single input. Differential Revision: https://reviews.llvm.org/D37947 llvm-svn: 313542	2017-09-18 16:39:49 +00:00
Simon Pilgrim	00161c9961	[X86][SSE] Improve support for vselect(Cond, 0, X) -> ANDN(Cond, X) As discussed on PR28925 and D37849. Differential Revision: https://reviews.llvm.org/D37975 llvm-svn: 313532	2017-09-18 14:23:23 +00:00
Simon Pilgrim	360629d170	[X86][SSE] Add vselect with zero tests (PR28925) llvm-svn: 313529	2017-09-18 13:32:33 +00:00
Nikolai Bozhenov	84af99b3b1	[X86FixupBWInsts] More precise register liveness if no <imp-use> on MOVs. Summary: Subregister liveness tracking is not implemented for X86 backend, so sometimes the whole super register is said to be live, when only a subregister is really live. That might happen if the def and the use are located in different MBBs, see added fixup-bw-isnt.mir test. However, using knowledge of the specific instructions handled by the bw-fixup-pass we can get more precise liveness information which this change does. Reviewers: MatzeB, DavidKreitzer, ab, andrew.w.kaylor, craig.topper Reviewed By: craig.topper Subscribers: n.bozhenov, myatsina, llvm-commits, hiraditya Patch by Andrei Elovikov <andrei.elovikov@intel.com> Differential Revision: https://reviews.llvm.org/D37559 llvm-svn: 313524	2017-09-18 10:17:59 +00:00
Mohammed Agabaria	77cb080c2d	[X86][Codegen] adding masked gathers tests for avx2 related to patch: https://reviews.llvm.org/D35772 adding llvm gathers test before gathers codegen support. Differential Revision: https://reviews.llvm.org/D37800 llvm-svn: 313516	2017-09-18 06:49:54 +00:00
Craig Topper	a6054328e8	[X86] Teach the execution domain fixing tables to use movlhps inplace of unpcklpd for the packed single domain. MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter. llvm-svn: 313509	2017-09-18 04:40:58 +00:00
Craig Topper	87f7381edf	[X86] Teach execution domain fixing to convert between FP and int unpack instructions. llvm-svn: 313508	2017-09-18 03:29:54 +00:00
Craig Topper	d4341920d5	[X86] Teach execution domain fixing to convert between VPERMILPS and VPSHUFD. llvm-svn: 313507	2017-09-18 03:29:47 +00:00
Craig Topper	ee6646d7de	[X86] Teach shuffle lowering to use MOVLHPS/MOVHLPS for lowering v4f32 unary shuffles with SSE1 only. llvm-svn: 313504	2017-09-17 22:36:41 +00:00
Craig Topper	6c221690a3	[X86] Add a couple more unary shuffles to the sse1 shuffle test. These can be implemented with movlhps and movhlps. llvm-svn: 313503	2017-09-17 22:36:39 +00:00
Jatin Bhateja	356e3e2c1d	Adding test cases for PR34629 & PR34634. Differential Revision: https://reviews.llvm.org/D37962 llvm-svn: 313490	2017-09-17 18:16:26 +00:00
Igor Breger	f1d388a5c5	[GlobalISel][X86] Legalize i1 G_ADD/G_SUB/G_MUL/G_XOR/G_OR/G_AND instructions. llvm-svn: 313483	2017-09-17 11:34:17 +00:00
Igor Breger	0f382ccb68	[GlobalISel][X86] Use correct physical register in mir tests.NFC. llvm-svn: 313479	2017-09-17 08:30:42 +00:00
Igor Breger	21200ed7af	[GlobalISel][X86] G_FCONSTANT support. Summary: G_FCONSTANT support, port the implementation from X86FastIsel. Reviewers: zvi, delena, guyblank Reviewed By: delena Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D37734 llvm-svn: 313478	2017-09-17 08:08:13 +00:00
Sanjay Patel	65d6780703	[x86] enable storeOfVectorConstantIsCheap() target hook This allows vector-sized store merging of constants in DAGCombiner using the existing code in MergeConsecutiveStores(). All of the twisted logic that decides exactly what vector operations are legal and fast for each particular CPU are handled separately in there using the appropriate hooks. For the motivating tests in merge-store-constants.ll, we already produce the same vector code in IR via the SLP vectorizer. So this is just providing a backend backstop for code that doesn't go through that pass (-O1). More details in PR24449: https://bugs.llvm.org/show_bug.cgi?id=24449 (this change should be the last step to resolve that bug) Differential Revision: https://reviews.llvm.org/D37451 llvm-svn: 313458	2017-09-16 13:29:12 +00:00
Craig Topper	23f78c1662	[X86] Add isel patterns to be able to fold loads into VPERM2F128 even when the load is on the first input to the SDNode. We just need to toggle bits 1 and 5 of the immediate and swap the sources. The peephole pass could trigger commuting/folding for this later, but its easy enough to fix in isel. Disable the peephole pass on the main vperm2x128 test so we know we're doing this through isel. llvm-svn: 313455	2017-09-16 09:16:48 +00:00
Craig Topper	0d1b519f78	[X86] Remove unused check lines that got left behind when I moved tests to the instrinsic upgrade file and regenerated. llvm-svn: 313454	2017-09-16 09:16:46 +00:00
Craig Topper	8374ffde08	[X86] Remove the vperm2f128 test file I just added in r313450. I missed the we already had a pretty thorough test file for these instructions. llvm-svn: 313451	2017-09-16 07:51:01 +00:00
Craig Topper	f264fcc704	[X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native shuffles. I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate so I've preserved the resulting shuffle mask for each of those immediates. llvm-svn: 313450	2017-09-16 07:36:14 +00:00
Craig Topper	aa499c1cb2	[X86] Fix some FileCheck lines that use the wrong prefix. Assume they were moved during autoupgrading and not changed. llvm-svn: 313448	2017-09-16 07:13:39 +00:00
Craig Topper	950b19515a	[X86] Don't set reserved bits in the immediate in the test cases for vperm2f128. I'm going to autoupgrade these intrinsics in a future commit. This bit will never be set in the resulting output so pre-removing the bit. llvm-svn: 313434	2017-09-16 02:11:21 +00:00
Craig Topper	9313df747d	[X86] Remove slash in front of a CHECK line in a test. llvm-svn: 313433	2017-09-16 01:43:21 +00:00
Craig Topper	d02179cd9c	[X86] Remove usages of vperm2f intrinsics from fast isel tests to match what clang generates after r313418. llvm-svn: 313424	2017-09-15 23:53:43 +00:00
Hans Wennborg	534bfbd3ba	Revert r313343 "[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs." This caused PR34629: asserts firing when building Chromium. It also broke some buildbots building test-suite as reported on the commit thread. > Summary: > 1/ Operand folding during complex pattern matching for LEAs has been > extended, such that it promotes Scale to accommodate similar operand > appearing in the DAG. > e.g. > T1 = A + B > T2 = T1 + 10 > T3 = T2 + A > For above DAG rooted at T3, X86AddressMode will no look like > Base = B , Index = A , Scale = 2 , Disp = 10 > > 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs > so that if there is an opportunity then complex LEAs (having 3 operands) > could be factored out. > e.g. > leal 1(%rax,%rcx,1), %rdx > leal 1(%rax,%rcx,2), %rcx > will be factored as following > leal 1(%rax,%rcx,1), %rdx > leal (%rdx,%rcx) , %edx > > 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, > thus avoiding creation of any complex LEAs within a loop. > > Reviewers: lsaba, RKSimon, craig.topper, qcolombet > > Reviewed By: lsaba > > Subscribers: spatel, igorb, llvm-commits > > Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313376	2017-09-15 18:40:26 +00:00
Craig Topper	7a183e2760	[X86] Prefer VPERMQ over VPERM2F128 for any unary shuffle, not just the ones that can be done with a insertf128 The early out for AVX2 in lowerV2X128VectorShuffle is positioned in a weird spot below some shuffle mask equivalency checks. But I think we want to allow VPERMQ for any unary shuffle. Differential Revision: https://reviews.llvm.org/D37893 llvm-svn: 313373	2017-09-15 18:11:13 +00:00
Craig Topper	e0d724cf51	[X86] Don't create i64 constants on 32-bit targets when lowering v64i1 constant build vectors When handling a v64i1 build vector of constants on 32-bit targets we were creating an illegal i64 constant that we then bitcasted back to v64i1. We need to instead create two 32-bit constants, bitcast them to v32i1 and concat the result. We should also take care to handle the halves being all zeros/ones after the split. This patch splits the build vector and then recursively lowers the two pieces. This allows us to handle the all ones and all zeros cases with minimal effort. Ideally we'd just do the split and concat, and let lowering get called again on the new nodes, but getNode has special handling for CONCAT_VECTORS that reassembles the pieces back into a single BUILD_VECTOR. Hopefully the two temporary BUILD_VECTORS we had to create to do this that don't get returned don't cause any issues. Fixes PR34605. Differential Revision: https://reviews.llvm.org/D37858 llvm-svn: 313366	2017-09-15 17:09:03 +00:00
Craig Topper	143797eb89	[X86] Add isel pattern infrastructure to begin recognizing when we're inserting 0s into the upper portions of a vector register and the producing instruction as already produced the zeros. Currently if we're inserting 0s into the upper elements of a vector register we insert an explicit move of the smaller register to implicitly zero the upper bits. But if we can prove that they are already zero we can skip that. This is based on a similar idea of what we do to avoid emitting explicit zero extends for GR32->GR64. Unfortunately, this is harder for vector registers because there are several opcodes that don't have VEX equivalent instructions, but can write to XMM registers. Among these are SHA instructions and a MMX->XMM move. Bitcasts can also get in the way. So for now I'm starting with explicitly allowing only VPMADDWD because we emit zeros in combineLoopMAddPattern. So that is placing extra instruction into the reduction loop. I'd like to allow PSADBW as well after D37453, but that's currently blocked by a bitcast. We either need to peek through bitcasts or canonicalize insert_subvectors with zeros to remove bitcasts on the value being inserted. Longer term we should probably have a cleanup pass that removes superfluous zeroing moves even when the producer is in another basic block which is something these isel tricks can't do. See PR32544. Differential Revision: https://reviews.llvm.org/D37653 llvm-svn: 313365	2017-09-15 17:09:00 +00:00
Simon Pilgrim	905e79c4dc	[X86][SSE] Add test cases vector for integer multiplies Mainly inspired by PR34474 / D37896 llvm-svn: 313353	2017-09-15 11:17:42 +00:00
Jatin Bhateja	908c8b37c2	[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs. Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG. e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will no look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out. e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. Reviewers: lsaba, RKSimon, craig.topper, qcolombet Reviewed By: lsaba Subscribers: spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313343	2017-09-15 05:29:51 +00:00
Simon Pilgrim	0b220c7524	[X86] Regenerate test. NFCI. llvm-svn: 313259	2017-09-14 13:00:27 +00:00
Simon Pilgrim	47d8f62472	Regenerate test (broadcast comment). NFCI. llvm-svn: 313258	2017-09-14 12:41:19 +00:00
Ayman Musa	ab68449c53	[X86] When applying the shuffle-to-zero-extend transformation on floating point, bitcast to integer first. Fix issue described in PR34577. Differential Revision: https://reviews.llvm.org/D37803 llvm-svn: 313256	2017-09-14 12:06:38 +00:00
Simon Pilgrim	8bd2d8780a	[DAGCombine] (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2) We already have a combine for this pattern when the input to shl is add, so we just need to enable the transformation when the input is or. Original patch by @tstellar Differential Revision: https://reviews.llvm.org/D19325 llvm-svn: 313251	2017-09-14 10:38:30 +00:00
Simon Pilgrim	11e2969a35	Fix line endings. NFCI. llvm-svn: 313246	2017-09-14 10:30:22 +00:00
Dean Michael Berris	01fd7c8bd4	[XRay][CodeGen] Use the current function symbol as the associated symbol for the instrumentation map Summary: XRay had been assuming that the previous section is the "text" section of the function when lowering the instrumentation map. Unfortunately this is not a safe assumption, because we may be coming from lowering debug type information for the function being lowered. This fixes an issue with combining -gsplit-dwarf, -generate-type-units, -debug-compile and -fxray-instrument for sole member functions. When the split dwarf section is stripped, we're left with references from the xray_instr_map to the debug section. The change now uses the function's symbol instead of the previous section's start symbol. We found the bug while attempting to strip the split debug sections off an XRay-instrumented object file, which had a peculiar edge-case for single-function classes where the single function is being lowered. Because XRay had assocaited the instrumentation map for a function to the debug types section instead of the function's section, the objcopy call will fail due to the misplaced reference from the xray_instr_map section. Reviewers: pcc, dblaikie, echristo Subscribers: llvm-commits, aprantl Differential Revision: https://reviews.llvm.org/D37791 llvm-svn: 313233	2017-09-14 07:08:23 +00:00
NAKAMURA Takumi	38fac5905e	Move llvm/test/CodeGen/X86/clear-liverange-spillreg.mir to SystemZ. It was in wrong place. llvm-svn: 313218	2017-09-14 00:03:23 +00:00
Hans Wennborg	06e2a384c2	Revert r312719 "[MachineCombiner] Update instruction depths incrementally for large BBs." This caused PR34596. > [MachineCombiner] Update instruction depths incrementally for large BBs. > > Summary: > For large basic blocks with lots of combinable instructions, the > MachineTraceMetrics computations in MachineCombiner can dominate the compile > time, as computing the trace information is quadratic in the number of > instructions in a BB and it's relevant successors/predecessors. > > In most cases, knowing the instruction depth should be enough to make > combination decisions. As we already iterate over all instructions in a basic > block, the instruction depth can be computed incrementally. This reduces the > cost of machine-combine drastically in cases where lots of instructions > are combined. The major drawback is that AFAIK, computing the critical path > length cannot be done incrementally. Therefore we only compute > instruction depths incrementally, for basic blocks with more > instructions than inc_threshold. The -machine-combiner-inc-threshold > option can be used to set the threshold and allows for easier > experimenting and checking if using incremental updates for all basic > blocks has any impact on the performance. > > Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn > > Reviewed By: fhahn > > Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits > > Differential Revision: https://reviews.llvm.org/D36619 llvm-svn: 313213	2017-09-13 23:23:09 +00:00
Wei Mi	a2a135a01c	Add a comment for the test. NFC. llvm-svn: 313199	2017-09-13 21:47:13 +00:00
Wei Mi	c0d066468e	[RegAlloc] Keep a copy of live interval for the spilled vregs in HoistSpillHelper. This is to fix PR34502. After rL311401, the live range of spilled vreg will be cleared. HoistSpill need to use the live range of the original vreg before splitting to know the moving range of the spills. The patch saves a copy of live interval for the spilled vreg inside of HoistSpillHelper. Differential Revision: https://reviews.llvm.org/D37578 llvm-svn: 313197	2017-09-13 21:41:30 +00:00
Gadi Haber	35f4d7ca46	[X86][Skylake] Replacing -mcpu=skx by -mattr in a codegen test. NFC. NFC. Replacing -mcpu=skx by -mattr in the run command of the codegen test: avx512-gather-scatter-intrin.ll. Reviewers: delena Revision: https://reviews.llvm.org/D37799 llvm-svn: 313144	2017-09-13 12:39:18 +00:00
Simon Pilgrim	f613a45bf3	[X86][FMA4] Test FMA4 commutation with repeated ops as well as FMA3 llvm-svn: 313143	2017-09-13 11:21:38 +00:00
Simon Pilgrim	322fc53725	[X86][FMA] Added *213 fma instructions to scheduling tests Annoyingly the 132/231 variants are pretty tricky to create when you need to due to weak FMA commutation patterns. llvm-svn: 313142	2017-09-13 11:12:56 +00:00
Gadi Haber	a753080d1e	[X86][Skylake][KNL] Updating code gen regression test to use the KNL and SKYLAKE prefixes. NFC. NFC. Updating the code gen regression test bmi2-schedule.ll to use the KNL and SKYLAKE prefixes for the run commands that use the knl and Skylake mcpu options. The fix is in preparation for a large patch of adding all SKL scheduling information. Reviewers: delena, zvi, RKSimon Revision: https://reviews.llvm.org/D37796 llvm-svn: 313138	2017-09-13 09:28:25 +00:00
Gadi Haber	04de4ce9e2	[X86][Skylake][KNL] Updating code gen regression test to use the KNL and SKYLAKE prefixes. NFC. NFC. Updating the code gen regression test bmi2-schedule.ll to use the KNL and SKYLAKE prefixes for the run commands that use the knl and Skylake mcpu options. The fix is in preparation for a large patch of adding all SKL scheduling information. Reviewers: delena, zvi Revision: https://reviews.llvm.org/D37796 llvm-svn: 313137	2017-09-13 09:28:18 +00:00
Gadi Haber	fb47ab7cdd	NFC. Updating codegen test bmi2-schedule.ll to use the SKYLAKE and KNL prefix as preparatipn for an upcoming patch to add all SKL scheduling information. llvm-svn: 313136	2017-09-13 09:27:39 +00:00
Igor Breger	5c721199dd	[GlobalISel][X86] support G_FPEXT operation. Summary: Support G_FPEXT operation. Selection done via TableGen'erated code. Reviewers: zvi, guyblank, aymanmus, m_zuckerman Reviewed By: zvi Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34816 llvm-svn: 313135	2017-09-13 09:05:23 +00:00
Uriel Korach	5d5da5f531	[X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (llvm) This patch, together with a matching clang patch (https://reviews.llvm.org/D37694), implements the lowering of X86 ABS intrinsics to IR. differential revision: https://reviews.llvm.org/D37693. llvm-svn: 313134	2017-09-13 09:02:36 +00:00
Uriel Korach	53872a2d89	[X86] Add explicit mc-encoding checks to X86/viabs.ll. NFC. Add explicit mc-encoding checks showing that the AVX512VL ABS intrinsics are actually mapped to EVEX encoding. This is a pre-commit for a soon to come patch which will lower x86 target specific ABS intrinsics to IR. Differential Revision: https://reviews.llvm.org/D37688 llvm-svn: 313131	2017-09-13 08:33:55 +00:00
Craig Topper	2b6bfda561	[X86] Make sure we emit a SUBREG_TO_REG after the MOV32ri when creating a BEXTR64rr instruction from a shift/and pair. Fixes PR34589. llvm-svn: 313126	2017-09-13 07:53:21 +00:00
Elena Demikhovsky	6cab129464	[X86 CodeGen] Optimization of ZeroExtendLoad for v2i8 vector Load with zero-extend and sign-extend from v2i8 to v2i32 is "Legal" since SSE4.1 and may be performed using PMOVZXBD , PMOVSXBD instructions. llvm-svn: 313121	2017-09-13 06:40:26 +00:00
Sanjay Patel	659279450e	[x86] eliminate unnecessary vector compare for AVX masked store The masked store instruction only cares about the sign-bit of each mask element, so the compare s<0 isn't needed. As noted in PR11210: https://bugs.llvm.org/show_bug.cgi?id=11210 ...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR. (Although more testing will be needed to confirm that.) I filed a bug to track improvements for AVX512: https://bugs.llvm.org/show_bug.cgi?id=34584 Differential Revision: https://reviews.llvm.org/D37446 llvm-svn: 313089	2017-09-12 23:24:05 +00:00
Craig Topper	958106d0f1	[X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so its as late as possible. But it can't be done with table based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp. This does break a couple tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we dont' recognize yet. We already had this problem for several other TBM patterns so I think this fine and we can address of them together. I've also fixed a bug where the combine to BEXTR was preventing us from using a trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before got to isel. I think we should probably also support matching BEXTR from (srl/srl (and mask << C), C). But that should be a different patch. Differential Revision: https://reviews.llvm.org/D37592 llvm-svn: 313054	2017-09-12 17:40:25 +00:00
Elena Demikhovsky	18ff5c1374	Added "zext" from v2i8 to v2i32. In the next patch I'll optimize the sequence. llvm-svn: 313052	2017-09-12 17:27:53 +00:00
Simon Pilgrim	76418aae74	[X86][AVX2] Add gather/movntdqa/pmaskmov/pmovmskb/pslldq/psrldq instructions to scheduling tests llvm-svn: 313039	2017-09-12 15:52:01 +00:00
Simon Pilgrim	0af5a772e0	[X86][AVX2] Add further instructions to scheduling tests llvm-svn: 313032	2017-09-12 15:01:20 +00:00
Simon Pilgrim	d2d2b37cc9	[X86][AVX2] Add integer broadcast scheduling tests llvm-svn: 313026	2017-09-12 12:59:20 +00:00
Simon Pilgrim	5a931c641e	[X86][AVX2] Add additional fp-broadcast/subvector/shuffle scheduling tests llvm-svn: 313022	2017-09-12 11:17:01 +00:00
Simon Pilgrim	ef9a9d709a	[X86][AVX] Add vperm2f128 scheduling test llvm-svn: 313021	2017-09-12 11:10:59 +00:00
Simon Pilgrim	f336d9ce3c	[X86][AVX2] Remove old (unused) intrinsic declarations llvm-svn: 313020	2017-09-12 11:09:30 +00:00
Yael Tsafrir	47668b5e03	[X86] Lower _mm[256\|512]_[mask[z]]_avg_epu[8\|16] intrinsics to native llvm IR Differential Revision: https://reviews.llvm.org/D37560 llvm-svn: 313013	2017-09-12 07:50:35 +00:00
Craig Topper	afdc36ed74	[X86] Add an extra instruction to TruncAssertSext.ll to prevent the 'or' from being narrowed so that the movl is really required to avoid a miscompile. If we allow the OR to be narrowed then the upper bits really are zero and we can't tell if the zeroing movl was removed on purpose. While here regenerate the test with update_llc_test_checks.py llvm-svn: 312995	2017-09-12 03:50:44 +00:00
Craig Topper	66e4ace1c8	[X86] Rename TruncAssertZext.ll test to TruncAssertSext.ll. Since its testing AssertSext. llvm-svn: 312991	2017-09-12 01:30:10 +00:00
Adrian Prantl	16aa4cf7ef	llvm-dwarfdump: Make -brief the default and add a -verbose option instead. Differential Revision: https://reviews.llvm.org/D37717 llvm-svn: 312972	2017-09-11 23:05:20 +00:00
Adrian Prantl	7bc1b28291	llvm-dwarfdump: Replace -debug-dump=sect option with individual options. As discussed on llvm-dev in http://lists.llvm.org/pipermail/llvm-dev/2017-September/117301.html this changes the command line interface of llvm-dwarfdump to match the one used by the dwarfdump utility shipping on macOS. In addition to being shorter to type this format also has the advantage of allowing more than one section to be specified at the same time. In a nutshell, with this change $ llvm-dwarfdump --debug-dump=info $ llvm-dwarfdump --debug-dump=apple-objc becomes $ dwarfdump --debug-info --apple-objc Differential Revision: https://reviews.llvm.org/D37714 llvm-svn: 312970	2017-09-11 22:59:45 +00:00
Zvi Rackover	255488a1e0	X86 Tests: More AVX512 conversions tests. NFC Adding more tests for AVX512 fp<->int conversions that were missing. llvm-svn: 312921	2017-09-11 15:54:38 +00:00
Simon Pilgrim	b092bd321a	[X86][SSE] Add support for X86ISD::PACKSS to ComputeNumSignBitsForTargetNode Helps improve combineLogicBlendIntoPBLENDV support by allowing us to peek into through PACKSS truncations of vector comparison results. Differential Revision: https://reviews.llvm.org/D37680 llvm-svn: 312916	2017-09-11 14:03:47 +00:00
Simon Pilgrim	d0ff65b50e	[X86][SSE] Add further test cases showing failure to compute sign bits through PACKSS Suggested in D37680 Note: had to drop AVX512VL tests as there is an infinite loop in the new tests that needs further investigation (not relevant to D37680). llvm-svn: 312910	2017-09-11 12:18:43 +00:00
Gadi Haber	3ddffced43	[X86][SKX][KNL] Updating several CodeGen tests to use the attr flag instead of mcpu flag NFC. Updated 3 Codegen regression tests to use the -mattr flag instead of the -mcpu flags as follows: Instead of -mcpu=skx use -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq Instead of -mcpu=knl use -mattr=+avx512f Reviewers: delena Revision: https://reviews.llvm.org/D37674 llvm-svn: 312909	2017-09-11 11:26:20 +00:00
Michael Zuckerman	9707ba0957	[Interleved][Stride 3]Adding test for case the VF=64 target with AVX512. llvm-svn: 312907	2017-09-11 10:57:15 +00:00
Simon Pilgrim	f6fa1d0369	[X86][SSE] Add test showing failure to compute sign bits through PACKSS Prevents combineLogicBlendIntoPBLENDV from merging to PBLENDV llvm-svn: 312906	2017-09-11 10:50:03 +00:00
Igor Breger	1f14364d64	[GlobalISel][X86] G_ANYEXT support. Summary: G_ANYEXT support Reviewers: zvi, delena Reviewed By: delena Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D37675 llvm-svn: 312903	2017-09-11 09:41:13 +00:00
Elena Demikhovsky	cc477bbcea	Fixed a bug in splitting Scatter operation in the Type Legalizer. After the split of the Scatter operation, the order of the new instructions is well defined - Lo goes before Hi. Otherwise the semantic of Scatter (from LSB to MSB) is broken. I'm chaining 2 nodes to prevent reordering. Differential Revision https://reviews.llvm.org/D37670 llvm-svn: 312894	2017-09-11 06:18:15 +00:00
Elena Demikhovsky	9afc3d7b82	Added a test that demonstrates a ug in Scatter scheduling. The bug is going to be fixed in an upcomming patch. llvm-svn: 312883	2017-09-10 13:20:42 +00:00
Simon Pilgrim	ed27bea373	[X86] Add v2i4 store test case (PR20012) llvm-svn: 312874	2017-09-09 20:28:50 +00:00
Simon Pilgrim	e932c7fafa	[X86] Add v2i2 test case (PR20011) llvm-svn: 312873	2017-09-09 20:22:35 +00:00
Simon Pilgrim	da41ca5a25	[X86][FMA] Regenerate FMA tests llvm-svn: 312871	2017-09-09 19:25:59 +00:00
Simon Pilgrim	97a56866a2	[X86][SSE] i32 vector multiplications test cases from PR6399 llvm-svn: 312868	2017-09-09 18:18:17 +00:00
Simon Pilgrim	a866a190d6	[X86][MOVBE] Fix typo in MOVBE scheduling test names Copy+paste is not your friend llvm-svn: 312867	2017-09-09 17:52:44 +00:00
Craig Topper	3be1db82b6	[X86] Don't disable slow INC/DEC if optimizing for size Summary: Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size. This appears to match gcc behavior. Reviewers: chandlerc, zvi, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37177 llvm-svn: 312866	2017-09-09 17:11:59 +00:00
Craig Topper	56af2cad89	[X86] Simplify the slow-incdec test and add test cases with optsize. I think we want to consider using inc/dec with optsize. llvm-svn: 312804	2017-09-08 17:33:54 +00:00
Simon Pilgrim	2e4fb24173	[X86] Added PR31045 test case Reduced version of 'addr-calc-crash.ll' that was included in D27044, that had been fixed already by D31286/rL298633 llvm-svn: 312786	2017-09-08 10:49:11 +00:00
Jatin Bhateja	a251312719	[X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and maxnum' Differential Revision: https://reviews.llvm.org/D37614 llvm-svn: 312778	2017-09-08 09:15:36 +00:00
Chandler Carruth	acbcf06f03	[x86] Flesh out the custom ISel for RMW aritmetic ops with used flags to cover the bitwise operators. Nothing really exciting here, this just stamps out the rest of the core operations that can RMW memory and set flags. Still not implemented here: ADC, SBB. Those will require more interesting logic to channel the flags in, and I'm not currently planning to try to tackle that. It might be interesting for someone who wants to improve our code generation for bignum implementations. Differential Revision: https://reviews.llvm.org/D37141 llvm-svn: 312768	2017-09-08 00:17:12 +00:00
Chandler Carruth	52a31bf268	[x86] Extend the manual ISel of `add` and `sub` with both RMW memory operands and used flags to support matching immediate operands. This is a bit trickier than register operands, and we still want to fall back on a register operands even for things that appear to be "immediates" when they won't actually select into the operation's immediate operand. This also requires us to handle things like selecting `sub` vs. `add` to minimize the number of bits needed to represent the immediate, and picking the shortest immediate encoding. In order to that, we in turn need to scan to make sure that CF isn't used as it will get inverted. The end result seems very nice though, and we're now generating optimal instruction sequences for these patterns IMO. A follow-up patch will further expand this to other operations with RMW memory operands. But handing `add` and `sub` are useful starting points to flesh out the machinery and make sure interesting and complex cases can be handled. Thanks to Craig Topper who provided a few fixes and improvements to this patch in addition to the review! Differential Revision: https://reviews.llvm.org/D37139 llvm-svn: 312764	2017-09-07 23:54:24 +00:00
Paul Robinson	bb92137080	[DWARF] Line 0 should not have a discriminator. It's meaningless and takes up extra space in the line table. Differential Revision: https://reviews.llvm.org/D37364 llvm-svn: 312751	2017-09-07 22:15:44 +00:00
Michael Zuckerman	5a385940d3	[X86][LLVM]Expanding Supports lowerInterleavedLoad() in X86InterleavedAccess (VF{8\|16\|32} stride 3). This patch expands the support of lowerInterleavedload to {8\|16\|32}x8i stride 3. LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8\|16\|32}) and we plan to include the store (deinterleved side). The patch goal is to optimize the following sequence: a0 b0 c0 a1 b1 c1 a2 b2 c2 a3 b3 c3 a4 b4 c4 a5 b5 c5 a6 b6 c6 a7 b7 c7 into a0 a1 a2 a3 a4 a5 a6 a7 b0 b1 b2 b3 b4 b5 b6 b7 c0 c1 c2 c3 c4 c5 c6 c7 Reviewers 1. zvi 2. igor 3. guyblank 4. dorit 5. Ayal llvm-svn: 312722	2017-09-07 14:02:13 +00:00
Florian Hahn	d39b8a3533	[MachineCombiner] Update instruction depths incrementally for large BBs. Summary: For large basic blocks with lots of combinable instructions, the MachineTraceMetrics computations in MachineCombiner can dominate the compile time, as computing the trace information is quadratic in the number of instructions in a BB and it's relevant successors/predecessors. In most cases, knowing the instruction depth should be enough to make combination decisions. As we already iterate over all instructions in a basic block, the instruction depth can be computed incrementally. This reduces the cost of machine-combine drastically in cases where lots of instructions are combined. The major drawback is that AFAIK, computing the critical path length cannot be done incrementally. Therefore we only compute instruction depths incrementally, for basic blocks with more instructions than inc_threshold. The -machine-combiner-inc-threshold option can be used to set the threshold and allows for easier experimenting and checking if using incremental updates for all basic blocks has any impact on the performance. Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn Reviewed By: fhahn Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D36619 llvm-svn: 312719	2017-09-07 12:49:39 +00:00
Alexander Ivchenko	f3a3cd198e	[x86] Update to cmov promotion tests for D36711; NFC Adding i8 -> [i16, i32, i64] and i32 -> i64 cases. This way we can see what the current codegen looks like. llvm-svn: 312707	2017-09-07 08:59:05 +00:00
Zvi Rackover	25799d93f0	X86: Improve AVX512 fptoui lowering Summary: Add patterns for fptoui <16 x float> to <16 x i8> fptoui <16 x float> to <16 x i16> Reviewers: igorb, delena, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37505 llvm-svn: 312704	2017-09-07 07:40:34 +00:00
Sanjay Patel	e96f875deb	[x86] fix triple and regenerate checks for psubus; NFC Patch by Yulia Koval! Differential Revision: https://reviews.llvm.org/D37523 llvm-svn: 312662	2017-09-06 19:05:20 +00:00
Wei Mi	818d50a93d	[TailCall] Allow llvm.memcpy/memset/memmove to be tail calls when parent function return the intrinsics's first argument. llvm.memcpy/memset/memmove return void but they will return the first argument after they are expanded as libcalls. Now if the parent function has any return value, llvm.memcpy cannot be turned into tail call after expansion. The patch is to handle that case in SelectionDAGBuilder so when caller function return the same value as the first argument of llvm.memcpy, tail call is allowed. Differential Revision: https://reviews.llvm.org/D37406 llvm-svn: 312641	2017-09-06 16:05:17 +00:00
Chandler Carruth	585bfc8443	[x86] Fix PR34377 by disabling cmov conversion when we relied on it performing a zext of a register. On the PR there is discussion of how to more effectively handle this, but this patch prevents us from miscompiling code. Differential Revision: https://reviews.llvm.org/D37504 llvm-svn: 312620	2017-09-06 06:28:08 +00:00
Zvi Rackover	5ebe94a84d	X86 Tests: Tidy up AVX512 conversion tests. NFC. Rename functions to a consistent format to make it easier to track coverage. llvm-svn: 312619	2017-09-06 05:33:04 +00:00
Jatin Bhateja	80b5e38c4e	Updating a test reference for rL312608. Differential Revision: https://reviews.llvm.org/D37501 llvm-svn: 312614	2017-09-06 03:58:14 +00:00
Jatin Bhateja	2c139f77c7	[X86] Allow cross-lane permutations for sub targets supporting AVX2. Summary: Most instructions in AVX work “in-lane”, that is, each source element is applied only to other elements of the same lane, thus a cross lane permutation is costly and needs more than one instrution. AVX2 includes instructions to perform any-to-any permutation of words over a 256-bit register and vectorized table lookup. This should also Fix PR34369 Differential Revision: https://reviews.llvm.org/D37388 llvm-svn: 312608	2017-09-06 02:58:47 +00:00
Reid Kleckner	e33c94f1b0	Add llvm.codeview.annotation to implement MSVC __annotation Summary: This intrinsic represents a label with a list of associated metadata strings. It is modelled as reading and writing inaccessible memory so that it won't be removed as dead code. I think the intention is that the annotation strings should appear at most once in the debug info, so I marked it noduplicate. We are allowed to inline code with annotations as long as we strip the annotation, but that can be done later. Reviewers: majnemer Subscribers: eraman, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D36904 llvm-svn: 312569	2017-09-05 20:14:58 +00:00
Craig Topper	784fa8a4e3	[X86] Remove unnecessary (v4f32 (X86vzmovl (v4f32 (scalar_to_vector FR32X)))) patterns We had already disabled the pattern for SSE4.1 and SSE4.2. But it got re-enabled for AVX and AVX512. With SSE41 we rely on a separate (v4f32 (X86vzmovl VR128)) pattern to select blendps with a xorps to create zeroess. And a separate (v4f32 (scalar_to_vector FR32X)) to select a COPY_TO_REG_CLASS to move FR32 to VR128 The same thing can happen for AVX with vblendps and those separate patterns already exist. For AVX512, (v4f32 (X86vzmov VR128)) will select a VMOVSS instruction instead of VBLENDPS due to their not being a EVEX VBLENDPS. This is what we were getting out of the larger pattern anyway. So the larger pattern is unneeded for AVX512 too. For SSE1-SSSE3 we can rely on (v4f32 (X86vzmov VR128)) selecting a MOVSS similar to AVX512. Again this is what the larger pattern did too. So the only real change here is that AVX1/2 now properly outputs a VBLENDPS during isel instead of a VMOVSS to match SSE41. Most tests didn't notice because the two address instruction pass knows how to turn VMOVSS into VBLENDPS to get an independent destination register. llvm-svn: 312564	2017-09-05 19:09:02 +00:00
Zvi Rackover	2096893f34	X86 Tests: Adding missing AVX512 fptoui coverage tests. NFC. Some of the cases show missing pattern i intend to fix shortly. llvm-svn: 312560	2017-09-05 18:24:39 +00:00

... 3 4 5 6 7 ...

10598 Commits