Commit Graph

10598 Commits

Author SHA1 Message Date
Craig Topper 526b70a089 [X86] Use fsub in the movddup scheduling tests to prevent a future patch from folding movddup as a broadcast load.
llvm-svn: 315767
2017-10-13 21:56:45 +00:00
Craig Topper 5d692917f4 [X86] Add initial skeleton support for knm cpu
This adds Intel's Knights Mill CPU to valid CPU names for the backend. For now its an alias of "knl", but ultimately we need to support AVX5124FMAPS and AVX5124VNNIW instruction sets for it.

Differential Revision: https://reviews.llvm.org/D38811

llvm-svn: 315722
2017-10-13 18:10:17 +00:00
Simon Pilgrim c4977fa9a1 [X86] Test scalar integer absolutes on 32-bit targets with/without CMOV
llvm-svn: 315711
2017-10-13 17:09:20 +00:00
Reid Kleckner be3724b5e1 Not all buildbots seem to dump the nuw flag in SDAG
llvm-svn: 315710
2017-10-13 17:00:49 +00:00
Simon Pilgrim df9611e178 [X86] Updated scalar integer absolute tests to cover i8/i16/i32/i64
llvm-svn: 315706
2017-10-13 16:53:07 +00:00
Reid Kleckner c687a34870 Update test to expect nuw flag in SDAG dump, fixes test after r315690
llvm-svn: 315698
2017-10-13 16:13:23 +00:00
Craig Topper bf0de9d3b6 [X86] Remove patterns that select unmasked vbroadcastf2x32/vbroadcasti2x32. Prefer vbroadcastsd/vpbroadcastq instead.
There's no advantage to using these instructions when they aren't masked. This enables some additional execution domain switching without needing to update the table.

llvm-svn: 315674
2017-10-13 06:07:10 +00:00
Craig Topper 11655b22dc [X86] Add the test case for r315613 that I forgot to 'git add'.
llvm-svn: 315649
2017-10-13 00:20:47 +00:00
Craig Topper 3dc37cc592 [X86] Add a bunch of -mcpu strings to the cpus.ll test.
We were missing most of the "core" aliases as well as skylake, cannonlake, and knights landing.

llvm-svn: 315606
2017-10-12 18:55:57 +00:00
Wei Mi 1736efd16a Revert r307036 because of PR34919.
llvm-svn: 315540
2017-10-12 00:24:52 +00:00
Sanjay Patel 6c0aef77aa [x86] avoid infinite loop from SoftenFloatOperand (PR34866)
Legalization of fp128 assumes things that we should have asserts for,
so that's another potential improvement.

Differential Revision: https://reviews.llvm.org/D38771

llvm-svn: 315485
2017-10-11 18:24:21 +00:00
Sanjay Patel 34fd5eaaf0 [DAGCombiner] convert insertelement of bitcasted vector into shuffle
Eg:
insert v4i32 V, (v2i16 X), 2 --> shuffle v8i16 V', X', {0,1,2,3,8,9,6,7}

This is a generalization of the IR fold in D38316 to handle insertion into a non-undef vector. 
We may want to abandon that one if we can't find value in squashing the more specific pattern sooner.

We're using the existing legal shuffle target hook to avoid AVX512 horror with vXi1 shuffles.

There may be room for improvement in the shuffle lowering here, but that would be follow-up work.

Differential Revision: https://reviews.llvm.org/D38388

llvm-svn: 315460
2017-10-11 14:12:16 +00:00
Uriel Korach 782f28bf2f [X86] Added tests for TESTM and TESTNM (NFC)
Adding this test files now so after another commit that will add a new pattern for
TESTM and TESTNM instructions will show the improvemnts that have been done.

Change-Id: If3908b7f91897d764053312365a2bc1de78b291d
llvm-svn: 315443
2017-10-11 08:39:25 +00:00
Craig Topper 6ce20bd184 [X86] Add 128-bit version of vbroadcasti32x2 to shuffle comment decoding.
llvm-svn: 315395
2017-10-11 00:11:53 +00:00
Craig Topper bb0e316dc7 [X86] Add broadcast patterns that allow a scalar_to_vector between the broadcast and the load.
We already have these patterns for AVX512VL, but not AVX1 or 2.

llvm-svn: 315382
2017-10-10 22:40:31 +00:00
Sanjay Patel b74063d21f [x86] fix prefix typos for CHECK lines; NFC
llvm-svn: 315368
2017-10-10 21:12:47 +00:00
Adrian Prantl 16b8b47152 Debug Info: Fix the SDLoc propagation for a DAGCombiner rule
This patch ensures that the rule:
  fold (zext (load x)) -> (zext (truncate (zextload x)))
propagates the SDLoc of the load to the zextload.

<rdar://problem/33755881>

llvm-svn: 315340
2017-10-10 18:08:32 +00:00
Simon Pilgrim 053a299a9b [X86][AVX512] Regenerate element insertion/extraction tests
llvm-svn: 315322
2017-10-10 15:58:54 +00:00
Sanjay Patel 7d52c7ca74 [x86] add tests for insertelement; NFC
llvm-svn: 315312
2017-10-10 13:45:25 +00:00
Gadi Haber 2b132eb4f8 [X86][SKYLAKE] Update regression test to differentiate between HASWELL and SKYLAKE scheduling.<NFC>
NFC.
Updated 6 regression tests to differentiate between HASWELL and SKYLAKE scheduling information.

The fix is in preparation of a patch to update the information of the Skylake Client scheduling to include the appropriate load and store latencies.

Reviewers: zvi, RKSimon
Differential Revision: https://reviews.llvm.org/D38685

Change-Id: Ifc6b98d9eaf266913698f24c766fd994fc977555
llvm-svn: 315291
2017-10-10 09:53:18 +00:00
Craig Topper a88306e6fb [AVX512] Add patterns to commute integer comparison instructions during isel.
This enables broadcast loads to be commuted and allows normal loads to be folded without the peephole pass.

llvm-svn: 315274
2017-10-10 06:36:46 +00:00
Reid Kleckner ab23dace56 [MC] Suppress .Lcfi labels when emitting textual assembly
Summary:
This suppresses the generation of .Lcfi labels in our textual assembler.
It was annoying that this generated cascading .Lcfi labels:
  llc foo.ll -o - | llvm-mc | llvm-mc

After three trips through MCAsmStreamer, we'd have three labels in the
output when none are necessary. We should only bother creating the
labels and frame data when making a real object file.

This supercedes D38605, which moved the entire .seh_ implementation into
MCObjectStreamer.

This has the advantage that we do more checking when emitting textual
assembly, as a minor efficiency cost. Outputting textual assembly is not
performance critical, so this shouldn't matter.

Reviewers: majnemer, MatzeB

Subscribers: qcolombet, nemanjai, javed.absar, eraman, hiraditya, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D38638

llvm-svn: 315259
2017-10-10 00:57:36 +00:00
Aditya Nandakumar c3bfc81a1f [GISel]: Fix generation of illegal COPYs during CallLowering
We end up creating COPY's that are either truncating/extending and this
should be illegal.

https://reviews.llvm.org/D37640

Patch for X86 and ARM by igorb, rovka

llvm-svn: 315240
2017-10-09 20:07:43 +00:00
Zvi Rackover c1d5955684 [X86] Unsigned saturation subtraction canonicalization [the backend part]
Summary:
On behalf of julia.koval@intel.com

The patch transforms canonical version of unsigned saturation, which is sub(max(a,b),a) or sub(a,min(a,b)) to special psubus insturuction on targets, which support it(8bit and 16bit uints).
umax(a,b) - b -> subus(a,b)
a - umin(a,b) -> subus(a,b)

There is also extra case handled, when right part of sub is 32 bit and can be truncated, using UMIN(this transformation was discussed in https://reviews.llvm.org/D25987).

The example of special case code:

```
void foo(unsigned short *p, int max, int n) {

  int i;
  unsigned m;
  for (i = 0; i < n; i++) {
    m = *--p;
    *p = (unsigned short)(m >= max ? m-max : 0);
  }
}
```
Max in this example is truncated to max_short value, if it is greater than m, or just truncated to 16 bit, if it is not. It is vaid transformation, because if max > max_short, result of the expression will be zero.

Here is the table of types, I try to support, special case items are bold:

| Size | 128 | 256 | 512
| -----  | -----  | -----   | -----
| i8 | v16i8 | v32i8 | v64i8
| i16 | v8i16 | v16i16 | v32i16
| i32 | | **v8i32** | **v16i32**
| i64 | | | **v8i64**

Reviewers: zvi, spatel, DavidKreitzer, RKSimon

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37534

llvm-svn: 315237
2017-10-09 20:01:10 +00:00
Sanjay Patel 2a61a821a0 [DAG] combine assertsexts around a trunc
This was a suggested follow-up to:
D37017 / https://reviews.llvm.org/rL313577

llvm-svn: 315206
2017-10-09 15:22:20 +00:00
Sanjay Patel 8557e29408 [x86] regenerate test checks; NFC
llvm-svn: 315204
2017-10-09 15:01:58 +00:00
Craig Topper 4f8656a7af [X86] Enable extended comparison predicate support for SETUEQ/SETONE when targeting AVX instructions.
We believe that despite AMD's documentation, that they really do support all 32 comparision predicates under AVX.

Differential Revision: https://reviews.llvm.org/D38609

llvm-svn: 315201
2017-10-09 01:05:15 +00:00
Simon Pilgrim 135a2639f4 [X86][SSE] Add test case for PR27708
llvm-svn: 315186
2017-10-08 19:18:10 +00:00
Craig Topper 977c546b0c [X86] Regenerate fast-isel-select-pseudo-cmov.ll to prepare for D38609.
llvm-svn: 315184
2017-10-08 17:54:50 +00:00
Simon Pilgrim dc32c844f9 [X86] getTargetConstantBitsFromNode - add support for decoding scalar constants
llvm-svn: 315182
2017-10-08 17:21:18 +00:00
Craig Topper c97775c03c [X86] Prefer MOVSS/SD over BLENDI during legalization. Remove BLENDI versions of scalar arithmetic patterns
Summary:
We currently disable some converting of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later during shuffle combining we go back to prefering MOVSS/MOVSD.

Additionally we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe due to the combining using MOVSS/MOVSD these are unnecessary.

Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. Turns out machine CSE commutes them to blend, and then commuting those blends back into blends that are equivalent to the original movss/movsd.

This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special casing VPBLENDW based on the immediate to widen the element type.

The rest of the patch is removing all the excess scalar patterns.

Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint.

Reviewers: RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38023

llvm-svn: 315181
2017-10-08 16:57:23 +00:00
Simon Pilgrim 6410ca70aa [X86][XOP] Add XOP oddshuffles tests
XOP codegen is often different to generic AVX - thank you vpperm!

llvm-svn: 315176
2017-10-08 12:58:15 +00:00
Gadi Haber 684944b822 [X86][SKX] Adding the scheduling information for the SKX target.
Adding the scheduling information for the SkylakeServer (SKX) target.

This patch adds the instruction scheduling information for the SkylakeServer (SKX) architecture target by adding the file X86SchedSkylakeServer.td located under the X86 Target.
We used the scheduling information retrieved from the Skylake architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction.

The patch continues the scheduling replacement and insertion effort started with the SNB target in r310792, the HSW target in r311879 and the SkylakeClient (SKL) target in rL313613.

Please expect some performance fluctuations due to code alignment effects.

Reviewers: zvi, RKSimon, craig.topper, chandlerc, aymanmu
Differential Revision: https://reviews.llvm.org/D38443

Change-Id: I5c228fcc09e9e5a99b6116e62b356c4f9b971185
llvm-svn: 315175
2017-10-08 12:52:54 +00:00
Craig Topper bbca2f2978 [X86] Stop LowerSIGN_EXTEND_AVX512 from creating v8i16/v16i16/v16i8 vselects with a v8i1/v16i1 condition when BWI is not available.
Some of the tests in vector-shuffle-v1.ll would get into an infinite loop without this.

llvm-svn: 315172
2017-10-08 08:50:59 +00:00
Craig Topper 27170fee8d [X86] If we see an insert of a bitcast into zero vector, canonicalize it to move the bitcast to the other side of the insert.
This improves detection of zeroing of upper bits during isel.

llvm-svn: 315161
2017-10-08 01:33:41 +00:00
Simon Pilgrim 9508fe7924 [X86][SSE] Match bitcasted BUILD_VECTOR of constants for v2i64 shifts on 64-bit targets (PR34855)
Extension to rL315155, generate constant shifts on 64-bits as well as 32-bits.

llvm-svn: 315156
2017-10-07 17:57:22 +00:00
Simon Pilgrim 70e1db78db [X86][SSE] Match bitcasted v4i32 BUILD_VECTORS for v2i64 shifts on 64-bit targets (PR34855)
We were already doing this for 32-bit targets, but we can generate these on 64-bits as well.

llvm-svn: 315155
2017-10-07 17:42:17 +00:00
Craig Topper 2f60295364 [X86] Add X86ISD::CMOV to computeKnownBitsForTargetNode and ComputeNumSignBitsForTargetNode.
Summary: Implementations based on ISD::SELECT.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38663

llvm-svn: 315153
2017-10-07 16:51:19 +00:00
Simon Pilgrim 73f143e774 [X86][SSE] Improve shuffling combining with horizontal operations
Recognise cases when we can merge the shuffles with their horizontal (HADD/HSUB/PACK) instruction inputs.

Replaces an older implementation which performed some of this during lowering, expanding an existing target shuffle combine stage instead.

Differential Revision: https://reviews.llvm.org/D38506

llvm-svn: 315150
2017-10-07 12:42:23 +00:00
Cameron McInally 9d64101fe8 [AVX512] Fix TERNLOG when folding broadcast
Patch to fix ternlog instructions with a folded
broadcast. The broadcast decorator, e.g. {1toX}, was missing.

Differential Revision: https://reviews.llvm.org/D38649

llvm-svn: 315122
2017-10-06 22:31:29 +00:00
Simon Pilgrim a29dbdf2ca [X86][SSE] Add SKX cpu tests to SSE/AVX scheduling tests (D38443)
llvm-svn: 315061
2017-10-06 13:40:29 +00:00
Artur Pilipenko 7b15254c8f [X86] Fix chains update when lowering BUILD_VECTOR to a vector load
The code which lowers BUILD_VECTOR of consecutive loads into a single vector
load doesn't update chains properly. As a result the vector load can be
reordered with the store to the same location.

The current code in EltsFromConsecutiveLoads only updates the chain following
the first load. The fix is to update the chains following all the loads
comprising the vector.

This is a fix for PR10114.

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D38547

llvm-svn: 314988
2017-10-05 16:28:21 +00:00
Simon Pilgrim 9edbe110e8 [X86][AVX] Improve (i8 bitcast (v8i1 x)) handling for v8i64/v8f64 512-bit vector compare results.
AVX1/AVX2 targets were missing a chance to use vmovmskps for v8f32/v8i32 results for bool vector bitcasts

llvm-svn: 314921
2017-10-04 18:00:42 +00:00
Hans Wennborg 2a6c9adb2f Revert r314886 "[X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)"
It broke the Chromium / SQLite build; see PR34830.

> Summary:
>    1/  Operand folding during complex pattern matching for LEAs has been
>        extended, such that it promotes Scale to accommodate similar operand
>        appearing in the DAG.
>        e.g.
>          T1 = A + B
>          T2 = T1 + 10
>          T3 = T2 + A
>        For above DAG rooted at T3, X86AddressMode will no look like
>          Base = B , Index = A , Scale = 2 , Disp = 10
>
>    2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
>        so that if there is an opportunity then complex LEAs (having 3 operands)
>        could be factored out.
>        e.g.
>          leal 1(%rax,%rcx,1), %rdx
>          leal 1(%rax,%rcx,2), %rcx
>        will be factored as following
>          leal 1(%rax,%rcx,1), %rdx
>          leal (%rdx,%rcx)   , %edx
>
>    3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
>       thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
>
> Reviewed By: lsaba
>
> Subscribers: jmolloy, spatel, igorb, llvm-commits
>
>     Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 314919
2017-10-04 17:54:06 +00:00
Simon Pilgrim b47b3f2564 [X86][SSE] Add support for lowering v8i16 binary shuffles to PACKSS/PACKUS
Missed in D38472

llvm-svn: 314916
2017-10-04 17:31:28 +00:00
Craig Topper 6fb55716e9 [X86] Redefine MOVSS/MOVSD instructions to take VR128 regclass as input instead of FR32/FR64
This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted.

This should fix PR33079

Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results.

Differential Revision: https://reviews.llvm.org/D38449

llvm-svn: 314914
2017-10-04 17:20:12 +00:00
Simon Pilgrim bd5d2f0284 [X86][SSE] Add support for lowering unary shuffles to PACKSS/PACKUS
Extension to D38472

llvm-svn: 314901
2017-10-04 13:12:08 +00:00
Jatin Bhateja 3c29bacd43 [X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)
Summary:
   1/  Operand folding during complex pattern matching for LEAs has been
       extended, such that it promotes Scale to accommodate similar operand
       appearing in the DAG.
       e.g.
         T1 = A + B
         T2 = T1 + 10
         T3 = T2 + A
       For above DAG rooted at T3, X86AddressMode will no look like
         Base = B , Index = A , Scale = 2 , Disp = 10

   2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
       so that if there is an opportunity then complex LEAs (having 3 operands)
       could be factored out.
       e.g.
         leal 1(%rax,%rcx,1), %rdx
         leal 1(%rax,%rcx,2), %rcx
       will be factored as following
         leal 1(%rax,%rcx,1), %rdx
         leal (%rdx,%rcx)   , %edx

   3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
      thus avoiding creation of any complex LEAs within a loop.

Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy

Reviewed By: lsaba

Subscribers: jmolloy, spatel, igorb, llvm-commits

    Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 314886
2017-10-04 09:02:10 +00:00
Martin Storsjo e14145dcb0 [X86] Fix using the SJLJ jump table on x86_64
The previous version didn't work if the jump table base address didn't
fit in 32 bit, since it was encoded as an immediate offset. And in case
the jump table is encoded as 32 bit label differences, we need to
load and add them to the table base first.

This solves the first half of the issues mentioned in PR34720.

Also fix some of the errors pointed out by -verify-machineinstrs, by
using GR32_NOSPRegClass.

Differential Revision: https://reviews.llvm.org/D38333

llvm-svn: 314876
2017-10-04 05:12:10 +00:00
Simon Pilgrim 261a6c29ea [X86] Add non-SSE tests for PR15215 as well
llvm-svn: 314815
2017-10-03 17:04:36 +00:00
Geoff Berry fabedbad11 Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding""
This reverts commit r314729.

Another bug has been encountered in an out-of-tree target reported by Quentin.

llvm-svn: 314814
2017-10-03 16:59:13 +00:00
Simon Pilgrim 46a804cfd9 [X86][SSE] Add bool vector extraction test cases from PR15215
llvm-svn: 314813
2017-10-03 16:56:57 +00:00
Simon Pilgrim cf99d069c3 [X86][SSE] Add support for decoding PACKSS/PACKUS shuffles masks with UNDEF
llvm-svn: 314792
2017-10-03 12:41:39 +00:00
Simon Pilgrim f5f291d129 [X86][SSE] Add support for lowering shuffles to PACKSS/PACKUS
If the upper bits of a truncation shuffle patterns have at least the minimum number of sign/zero bits on their inputs then we can safely use PACKSS/PACKUS as shuffles.

Partial fix for https://bugs.llvm.org/show_bug.cgi?id=34773

Differential Revision: https://reviews.llvm.org/D38472

llvm-svn: 314788
2017-10-03 12:01:31 +00:00
Simon Pilgrim 640fbf5132 [X86][SSE] Add support for shuffle combining from PACKSS/PACKUS
Mentioned in D38472

llvm-svn: 314777
2017-10-03 09:54:03 +00:00
Simon Pilgrim 19d535e75b [X86][SSE] Add support for PACKSS/PACKUS constant folding
Pulled out of D38472

llvm-svn: 314776
2017-10-03 09:41:00 +00:00
Martin Storsjo 1e54738676 [X86] Provide the LSDA pointer with RIP relative addressing if necessary
This makes sure the LSDA pointer isn't truncated to 32 bit.

Make LowerINTRINSIC_WO_CHAIN a member function instead of a static
function, so that it can use the getGlobalWrapperKind method.

This solves the second half of the issues mentioned in PR34720.

Differential Revision: https://reviews.llvm.org/D38343

llvm-svn: 314767
2017-10-03 06:29:58 +00:00
Geoff Berry bfc5fb4571 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues addressed since original review:
- Avoid bug in regalloc greedy/machine verifier when forwarding to use
  in an instruction that re-defines the same virtual register.
- Fixed bug when forwarding to use in EarlyClobber instruction slot.
- Fixed incorrect forwarding to register definitions that showed up in
  explicit_uses() iterator (e.g. in INLINEASM).
- Moved removal of dead instructions found by
  LiveIntervals::shrinkToUses() outside of loop iterating over
  instructions to avoid instructions being deleted while pointed to by
  iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
  doing so can break code that implicitly relies on the physical
  register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
  can break the machine verifier by creating LiveRanges that don't
  end on a use (since the undef operand is not considered a use).

  [MachineCopyPropagation] Extend pass to do COPY source forwarding

  This change extends MachineCopyPropagation to do COPY source forwarding.

  This change also extends the MachineCopyPropagation pass to be able to
  be run during register allocation, after physical registers have been
  assigned, but before the virtual registers have been re-written, which
  allows it to remove virtual register COPY LiveIntervals that become dead
  through the forwarding of all of their uses.

llvm-svn: 314729
2017-10-02 22:01:37 +00:00
Simon Pilgrim 65bde8806f [X86][SSE] Add PACKSS/PACKUS constant folding tests
llvm-svn: 314682
2017-10-02 15:43:26 +00:00
Simon Pilgrim ec528b2cb6 Regenerate test (missing broadcast constant comments). NFCI.
Still avoiding the floating point comments to prevent linux/windows discrepancies.

llvm-svn: 314681
2017-10-02 15:22:35 +00:00
Simon Pilgrim 050f60370c Regenerate test (missing broadcast constant comments). NFCI.
llvm-svn: 314680
2017-10-02 15:21:14 +00:00
Simon Pilgrim 037f6d10d5 Regenerate test. NFCI.
llvm-svn: 314679
2017-10-02 15:16:30 +00:00
Michael Zuckerman e4084f6bdb [X86][LLVM]Expanding Supports lowerInterleaved{store|load}() in X86InterleavedAccess (VF64 stride 3-4)
I continue to support different VF interleaved and in this pass for this patch,
I added the vf64 stride3 support for both load and store.
I also added support fot the stride4 store.

Reviewers:
1. zvi
2. dorit
3. igorb
4. guyblank

Differential Revision: https://reviews.llvm.org/D37687

Change-Id: I3d238efedf217d1768b348d710de1efa2f19d27b
llvm-svn: 314651
2017-10-02 07:35:25 +00:00
Craig Topper c20b46da2f [X86] Change register&memory TEST instructions from MRMSrcMem to MRMDstMem
Summary:
Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable.

For the register&register form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem.

I believe this supercedes D38025 which was trying to switch the register&register form back to pre-PR22995.

Reviewers: aymanmus, RKSimon, zvi

Reviewed By: aymanmus

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38120

llvm-svn: 314639
2017-10-01 23:53:53 +00:00
Simon Pilgrim df23a2700d [X86][SSE] Add faux shuffle combining support for PACKUS
llvm-svn: 314631
2017-10-01 18:43:48 +00:00
Simon Pilgrim 4f255ad6a0 [X86][AVX2] Simplify PACKUS combine test
Trying to use a AND mask is tricky as after legalization its nigh impossible for computeKnownBits to do anything with it

llvm-svn: 314630
2017-10-01 18:17:39 +00:00
Simon Pilgrim 836fa6dcfd [X86][SSE] Improve shuffle combining of PACKSS instructions.
Support unary packing and fix the faux shuffle mask for vectors larger than 128 bits.

llvm-svn: 314629
2017-10-01 17:54:55 +00:00
Simon Pilgrim d25c200cd6 [X86][SSE] Add shuffle combining tests with PACKSS/PACKUS
llvm-svn: 314628
2017-10-01 17:30:44 +00:00
Jina Nahias 98c7f91e54 pre-commit adding test for broadcastm pattern
Differential Revision: https://reviews.llvm.org/D38312

Change-Id: Ifbc4189549f2f59995019a86f85f989c04e4d37d
llvm-svn: 314626
2017-10-01 14:25:21 +00:00
Michael Zuckerman 1746895490 Adding test for interleved, case stride 4 vf64 store<NFC>.
Change-Id: I9ea62aac81b763c83d26613dca6fcd846997a017
llvm-svn: 314621
2017-10-01 09:37:38 +00:00
Xin Tong bffac0eb81 Fix typo. NFC
llvm-svn: 314615
2017-10-01 00:10:52 +00:00
Xin Tong c063c3f09d Revert "Fix typo [NFC]"
This reverts commit e60b5028619be1c81bd039d63a0627dac32d38f9.

Incorrectly include changes that are not typo fix.

llvm-svn: 314614
2017-10-01 00:09:53 +00:00
Xin Tong efec219e1b Fix typo [NFC]
llvm-svn: 314613
2017-10-01 00:07:24 +00:00
Simon Pilgrim 1fffcc4580 Regenerate mul combine tests to update broadcast comment.
llvm-svn: 314607
2017-09-30 22:27:46 +00:00
Simon Pilgrim a8dd6f4f30 [X86][SSE] Fold (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1
Remove sign extend in register style pattern if the sign is already extended enough

llvm-svn: 314599
2017-09-30 17:57:34 +00:00
Craig Topper 619569841a [AVX-512] Add patterns to make fp compare instructions commutable during isel.
llvm-svn: 314598
2017-09-30 17:02:39 +00:00
Simon Pilgrim 5bd43bce07 [X86][SSE] Add vector truncation cases inspired by PR34773
We should be using PACKSS/PACKUS more aggressively when we know the state of the upper bits

llvm-svn: 314597
2017-09-30 16:14:59 +00:00
Michael Zuckerman b92b6d424f Code refactoring for the interleaved code <NFC>
Change-Id: I7831c9febad8e14278a5bc87584a0053dc837be1
llvm-svn: 314596
2017-09-30 14:55:03 +00:00
Gadi Haber c3b33f0f0d [X86][SKX] Added codegen regression test for avx512 instructions scheduling.NFC.
NFC.
 Added code gen regression tests for avx512 instructions scheduling called avx512-schedule.ll and
 avx512-shuffle-schedule.ll.
 This patch is in preparation of a larger patch of adding all SKX instruction scheduling and therefore
 the scheduling for the avx512 instructions are still missing.

Reviewers: zvi, delena, RKSimon, igorb
Differential Revision: https://reviews.llvm.org/D38035

Change-Id: I792762763127a921b9e13684b58af03646536533
llvm-svn: 314594
2017-09-30 14:30:23 +00:00
Craig Topper d92ade96f4 [X86] Support v64i8 mulhu/mulhs
Implemented by splitting into two v32i8 mulhu/mulhs and concatenating the results.

Differential Revision: https://reviews.llvm.org/D38307

llvm-svn: 314584
2017-09-30 04:21:46 +00:00
Amara Emerson 7d6c55f8aa [X86] Improve codegen for inverted overflow checking intrinsics.
Adds a new combine for: xor(setcc cc, val), 1 --> setcc (invert(cc), val)

Differential Revision: https://reviews.llvm.org/D38161

llvm-svn: 314514
2017-09-29 13:53:44 +00:00
Simon Pilgrim 2b96841d1d [X86][SSE] Added more tests for vector multiplications as utility for D37896
Added additional tests for vector multiplications with multipliers that are:
 * powers of 2 displaced by 1,
 * product of a power of 2 displaced by one with another power of 2.

Patch by @pacxx (Michael Haidl)

Differential Revision: https://reviews.llvm.org/D38350

llvm-svn: 314504
2017-09-29 10:02:01 +00:00
Craig Topper 6255c7b675 [X86] Don't select (cmp (and, imm), 0) to testw
Summary:
X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters.

I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs.

Reviewers: RKSimon, zvi, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38273

llvm-svn: 314474
2017-09-28 23:35:36 +00:00
Sanjay Patel 4664d77316 [x86] add tests for possible insertelement to shuffle transform; NFC
See PR34716 and D38316 for more discussion.

llvm-svn: 314466
2017-09-28 22:27:25 +00:00
Craig Topper ed19350293 [X86] Make use of vpmovwb when possible in LowerMULH
If we have BWI, we can truncate in a much simpler way by using vpmovwb. This even works without VLX by using the wider zmm->ymm truncate with a subvector extract.

Differential Revision: https://reviews.llvm.org/D38375

llvm-svn: 314457
2017-09-28 20:10:34 +00:00
Matthias Braun 5c3e8a450e MIR: Serialize CaleeSavedInfo Restored flag
llvm-svn: 314449
2017-09-28 18:52:14 +00:00
Craig Topper 56bfbfb117 [AVX512] Add avx512bw command lines to 128-bit idiv tests.
The multiply lowering on some of the tests can take advantage of the vpmovwb to simplify the truncate.

llvm-svn: 314448
2017-09-28 18:45:29 +00:00
Craig Topper ceff6da6e9 [X86] Use BWI instructions to improve lowering of v32i8 MULHU/S
Summary: If we have BWI instructions we can widen to v32i16 to do the multiply instead of splitting.

Reviewers: RKSimon, spatel, zvi

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38305

llvm-svn: 314432
2017-09-28 17:00:21 +00:00
Craig Topper 71a8cf9f99 [X86] Use correct subvector index when combining two insert subvectors featuring zero vectors.
Previously we were using one of the subvector indices twice. The included test case causes an assert without this change.

Thanks to Simon Pilgrim for catching this.

llvm-svn: 314429
2017-09-28 16:53:16 +00:00
Amara Emerson bb16282fb1 [X86] Add overflow intrinsic test in preparation for D38161.
This commit adds the test file before codegen changes as requested in
D38161 to make it easier to see the difference.

llvm-svn: 314416
2017-09-28 13:43:48 +00:00
Jatin Bhateja 75001c9ed8 [X86] Adding more cases to horizontal [f]add/[f]sub for avx512.
Reviewers: jbhateja

Reviewed By: jbhateja

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38344

llvm-svn: 314385
2017-09-28 07:40:52 +00:00
Jessica Paquette 4cf187b5b4 [MachineOutliner] AArch64: Avoid saving + restoring LR if possible
This commit allows the outliner to avoid saving and restoring the link register
on AArch64 when it is dead within an entire class of candidates.

This introduces changes to the way the outliner interfaces with the target.
For example, the target now interfaces with the outliner using a
MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and
getOutliningFrameOverhead.

This also improves several comments on the outliner's cost model.

https://reviews.llvm.org/D36721

llvm-svn: 314341
2017-09-27 20:47:39 +00:00
Craig Topper c16a472966 Revert r314249 "Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST."""
This caused PR34751

llvm-svn: 314339
2017-09-27 20:34:17 +00:00
Than McIntosh dee2cf67ea [CodeGen] Emit necessary .note sections for -fsplit-stack
Summary:
According to https://gcc.gnu.org/wiki/SplitStacks, the linker expects a zero-sized .note.GNU-split-stack section if split-stack is used (and also .note.GNU-no-split-stack section if it also contains non-split-stack functions), so it can handle the cases where a split-stack function calls non-split-stack function.

This change adds the sections if needed.

Fixes PR #34670.

Reviewers: thanm, rnk, luqmana

Reviewed By: rnk

Subscribers: llvm-commits

Patch by Cherry Zhang <cherryyz@google.com>

Differential Revision: https://reviews.llvm.org/D38051

llvm-svn: 314335
2017-09-27 19:34:00 +00:00
Craig Topper 05f71dd036 [X86] In combineLoopSADPattern, pad result with zeros and use full size add instead of using a smaller add and inserting.
In some cases the result psadbw is smaller than the type of the add that started the match. Currently in these cases we are using a smaller add and inserting the result.

If we instead combine the psadbw with zeros and use the full size add we can take advantage of implicit zeroing we get if we emit a narrower move before the add.

In a future patch, I want to make isel aware that the psadbw itself already zeroed the upper bits and remove the move entirely.

Differential Revision: https://reviews.llvm.org/D37453

llvm-svn: 314331
2017-09-27 18:36:45 +00:00
Gadi Haber 87337a2bb9 [X86][SKX][KNL] Updated regression tests to use -mattr instead of -mcpu flag.NFC.
NFC.
 Updated 8 regression tests to use -mattr instead of -mcpu flag as follows:
 -mcpu=knl --> -mattr=+avx512f
 -mcpu=skx --> -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq

The updates are as part of the preparation of a large commit to add all instruction scheduling for the SKX target.

Reviewers: delena, zvi, RKSimon
Differential Revision: https://reviews.llvm.org/D38222

Change-Id: I2381c9b5bb75ecacfca017243c22d054f6eddd14
llvm-svn: 314306
2017-09-27 14:44:15 +00:00
Zvi Rackover eb7a0bf847 X86 Tests: Unsigned saturation subtraction tests. NFC.
Summary:
Adding tests for D37534.

Commit on behalf of julia.koval@intel.com

Reviewers: n.bozhenov, zvi, spatel, DavidKreitzer

Reviewed By: zvi

Differential Revision: https://reviews.llvm.org/D37510

llvm-svn: 314305
2017-09-27 14:38:05 +00:00
Simon Pilgrim 3b0d9e789e [X86][AVX] Improve (i4 bitcast (v4i1 x)) handling for 256-bit vector compare results.
As commented on D37849 and rL313547, AVX1 targets were missing a chance to use vmovmskpd for v4f64/v4i64 results for bool vector bitcasts

llvm-svn: 314293
2017-09-27 10:10:17 +00:00
Martin Storsjo aa1533bf9b [X86] Fix SJLJ struct offsets for x86_64
This is necessary, but not sufficient, for having working SJLJ exception
handling on x86_64.

Differential Revision: https://reviews.llvm.org/D38254

llvm-svn: 314277
2017-09-27 06:08:23 +00:00
Martin Storsjo eccaf04e40 [X86] Remove erroneous callsite offsetting in SJLJ landing pads
The callsite value is already stored indexed from 0 in
the _Unwind_Context struct. When accessed via the functions
_Unwind_GetIP and _Unwind_SetIP, the value is indexed from 1,
but those functions handle the offseting. When reading directly
from the struct here, we shouldn't subtract 1.

This matches the code generated by the ARM target, where SJLJ
exception handling is used by default on iOS.

This makes clang-built object files for 32 bit x86 mingw work when
linked with libgcc/libstdc++.

Differential Revision: https://reviews.llvm.org/D38251

llvm-svn: 314276
2017-09-27 06:08:16 +00:00
Martin Storsjo 233349fe51 [X86] Correct byte offsets and data types in a comment. NFC.
This matches the types of the struct members defined in
lib/CodeGen/SjLjEHPrepare.cpp, and the definition of this struct in libgcc.

Differential Revision: https://reviews.llvm.org/D38248

llvm-svn: 314275
2017-09-27 06:08:04 +00:00
Craig Topper 177a3923ce [X86] Use extract128BitVector in LowerMULH so we can extract from constant build vectors.
llvm-svn: 314274
2017-09-27 06:04:55 +00:00
Craig Topper 31ccd4727b [X86] Add avx512bw command lines to the 256-bit vector idiv tests.
Some of the operations are being sign extended to 512 bits with avx512bw.

llvm-svn: 314272
2017-09-27 05:17:15 +00:00
Craig Topper 7f0eeb428b Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST.""
The late MOV8rr_NOREX that caused the crash has been removed.

llvm-svn: 314249
2017-09-26 21:35:09 +00:00
Michael Zuckerman 645f777e40 [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF{8|16|32} stride 3)
This patch expands the support of lowerInterleavedStore to {8|16|32}x8i stride 3.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) .
This patch is part two of two patches and it covers the store (interlevaed) side.

The patch goal is to optimize the following sequence:
a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7

into
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7

Reviewers:
zvi
guyblank
dorit
Ayal

Differential Revision: https://reviews.llvm.org/D37117

Change-Id: I56ced8bcbea809a37654060771911ade20246ccc
llvm-svn: 314234
2017-09-26 18:49:11 +00:00
Craig Topper f51913155c [X86] Add support for v16i32 UMUL_LOHI/SMUL_LOHI
Summary: This patch extends the v8i32/v4i32 custom lowering to support v16i32

Reviewers: zvi, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38274

llvm-svn: 314221
2017-09-26 16:43:57 +00:00
Coby Tayree f191fdc3fb [x86] fix pr29061
https://bugs.llvm.org//show_bug.cgi?id=29061
Don't try referencing REX-needed regs when not on 64bit mode
Aligns to GCC

Differetial Revision: https://reviews.llvm.org/D37801

llvm-svn: 314203
2017-09-26 13:28:05 +00:00
Benjamin Kramer 4b2113a303 Revert "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST."
Makes llc crash. This reverts commit r314151.

llvm-svn: 314199
2017-09-26 10:25:27 +00:00
Uriel Korach 0ecc984b1b [X86] Finishing broadcastf32x2 and broadcasti32x2 intrinsics lowering to IR. llvm side.
Removing X86 broadcast(f/i)32x2 intrinsics from llvm.
Adding autoUpgrade support.
Moving matching tests from avx512dq-intrinsics.ll to avx512dq-intrinsics-upgrade.ll and from avx512dqvl-intrinsics.ll to avx512dqvl-intrinsics-upgrade.ll.

Differential Revision: https://reviews.llvm.org/D38220

llvm-svn: 314195
2017-09-26 07:39:39 +00:00
Saleem Abdulrasool 2e0d72311b X86: remove R12 from CSR on Windows x64 SwiftCC
R12 is used for the SwiftError parameter.  It is no longer a CSR as it
is used for transfer the SwiftError, and the caller must preserve it if
they need to.

llvm-svn: 314165
2017-09-25 22:00:17 +00:00
Craig Topper 5124a14d9c [X86] Don't select anyext GR32->GR64 to SUBREG_TO_REG. Use INSERT_SUBREG instead.
As far as I know SUBREG_TO_REG is stating that the upper bits are 0. But if we are just converting the GR32 with no checks, then we have no reason to say the upper bits are 0.

I don't really know how to test this today since I can't find anything that looks that closely at SUBREG_TO_REG. The test changes here seems to be some perturbance of register allocation.

Differential Revision: https://reviews.llvm.org/D38001

llvm-svn: 314152
2017-09-25 21:14:59 +00:00
Craig Topper d830f276c1 [X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST.
llvm-svn: 314151
2017-09-25 21:14:55 +00:00
Craig Topper 5bc10ede53 [SelectionDAG] Teach simplifyDemandedBits to handle shifts by constant splat vectors
This teach simplifyDemandedBits to handle constant splat vector shifts.

This required changing some uses of getZExtValue to getLimitedValue since we can't rely on legalization using getShiftAmountTy for the shift amount.

I believe there may have been a bug in the ((X << C1) >>u ShAmt) handling where we didn't check if the inner shift was too large. I've fixed that here.

I had to add new patterns to ARM because the zext/sext the patterns were trying to look for got turned into an any_extend with this patch. Happy to split that out too, but not sure how to test without this change.

Differential Revision: https://reviews.llvm.org/D37665

llvm-svn: 314139
2017-09-25 19:26:08 +00:00
Reid Kleckner 8898cd8dcf [DebugInfo] Sort the SDDbgValue list before assuming it is in IR order
Summary:
This code iterates the 'Orders' vector in parallel with the DbgValue
list, emitting all DBG_VALUEs that occurred between the last IR order
insertion point and the next insertion point. This assumes the
SDDbgValue list is sorted in IR order, which it usually is. However, it
is not sorted when a node with a debug value is replaced with another
one. When this happens, TransferDbgValues is called, and the new value
is added to the end of the list.

The problem can be solved by stably sorting the list by IR order.

Reviewers: aprantl, Ka-Ka

Reviewed By: aprantl

Subscribers: MatzeB, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D38197

llvm-svn: 314114
2017-09-25 16:14:53 +00:00
Michael Zuckerman 4a97df01c4 [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF8 stride 4):
This patch expands the support of lowerInterleavedStore to 8x8i stride 4.

LLVM creates suboptimal shuffle code-gen for AVX2.
In overall, this patch is a specific fix for the pattern (Strid=4 VF=8) and we plan to include more patterns in the future.

The patch goal is to optimize the following sequence:
At the end of the computation, we have xmm2, xmm0, xmm12 and xmm3 holding
each 8 chars:

c0, c1, , c7
m0, m1, , m7
y0, y1, , y7
k0, k1, ., k7

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....

Reviewers
DavidKreitzer
Farhana
zvi
igorb
guyblank
RKSimon
Ayal

Differential Revision: https://reviews.llvm.org/D36058

Change-Id: I3cc5c2ca5d6318901c192a4428493b99ef424c32
llvm-svn: 314109
2017-09-25 14:50:38 +00:00
Craig Topper 47e14ead54 [X86] Make IFMA instructions during isel so we can fold broadcast loads.
This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us.

llvm-svn: 314083
2017-09-24 19:30:55 +00:00
Craig Topper 4ffd90c504 [X86] Add tests to show missed opportunities to fold broadcast loads into IFMA instructions when the load is on operand1 of the instrinsic.
We need to enable commuting during isel to catch this since the load folding tables can't handle broadcasts.

llvm-svn: 314082
2017-09-24 19:30:54 +00:00
Craig Topper 23f1830748 [X86] Add IFMA instructions to the load folding tables and make them commutable for the multiply operands.
llvm-svn: 314080
2017-09-24 17:28:14 +00:00
Simon Pilgrim e1335b1c75 [X86][SSE] Add more tests for shuffle combining with extracted vector elements (PR22415)
llvm-svn: 314077
2017-09-24 13:45:49 +00:00
Simon Pilgrim a705db9a9e [X86][SSE] Add support for extending bool vectors bitcasted from scalars
This patch acts as a reverse to combineBitcastvxi1 - bitcasting a scalar integer to a boolean vector and extending it 'in place' to the requested legal type.

Currently this doesn't handle AVX512 at all - but the current mask register approach is lacking for some cases.

Differential Revision: https://reviews.llvm.org/D35320

llvm-svn: 314076
2017-09-24 13:42:31 +00:00
Craig Topper eb5c411218 [AVX-512] Add pattern for selecting masked version of v8i32/v8f32 compare instructions when VLX isn't available.
We use a v16i32/v16f32 compare instead and truncate the result. We already did this for the unmasked version, but were missing the version with 'and'.

llvm-svn: 314072
2017-09-24 05:24:52 +00:00
Simon Pilgrim 026727f861 [X86] Regenerate i64 to v2f32 bitcast test
llvm-svn: 314068
2017-09-23 19:18:29 +00:00
Sanjay Patel fa8bad8a0f [x86] reduce 64-bit mask constant to 32-bits by right shifting
This is a follow-up from D38181 (r314023). We have to put 64-bit
constants into a register using a separate instruction, so we
should try harder to avoid that.

From what I see, we're not likely to encounter this pattern in the 
DAG because the upstream setcc combines from this don't (usually?) 
produce this pattern. If we fix that, then this will become more 
relevant. Since the cost of handling this case is just loosening 
the predicate of the existing fold, we might as well do it now.

llvm-svn: 314064
2017-09-23 14:32:07 +00:00
Sanjay Patel 5ca9f7a0cb [x86] add an add+shift test for follow-up suggestion from D38181; NFC
llvm-svn: 314063
2017-09-23 14:24:07 +00:00
Sanjay Patel ac76201d4e [x86] remove over-specified platform from test config
llvm-svn: 314027
2017-09-22 21:07:13 +00:00
Sanjay Patel 3339954fa3 [x86] swap order of srl (and X, C1), C2 when it saves size
The (non-)obvious win comes from saving 3 bytes by using the 0x83 'and' opcode variant instead of 0x81. 
There are also better improvements based on known-bits that allow us to eliminate the mask entirely.

As noted, this could be extended. There are potentially other wins from always shifting first, but doing
that reveals a tangle of problems in other pattern matching. We do this transform generically in 
instcombine, but we often have icmp IR that doesn't match that pattern, so we must account for this
in the backend.

Differential Revision: https://reviews.llvm.org/D38181

llvm-svn: 314023
2017-09-22 19:37:21 +00:00
Sanjay Patel ae42181db4 [x86] remove unnecessary OS specifier from test
llvm-svn: 313986
2017-09-22 14:38:57 +00:00
Sanjay Patel 04fd5b8cdc [x86] auto-generate complete checks; NFC
llvm-svn: 313985
2017-09-22 14:30:52 +00:00
Sanjay Patel 8dca7080b0 [x86] update test to use FileCheck; NFC
llvm-svn: 313984
2017-09-22 14:29:47 +00:00
Alexander Ivchenko 34498ba052 [X86] Combining CMOVs with [ANY,SIGN,ZERO]_EXTEND for cases where CMOV has constant arguments
Combine CMOV[i16]<-[SIGN,ZERO,ANY]_EXTEND to [i32,i64] into CMOV[i32,i64].
One example of where it is useful is:

before (20 bytes)
    <foo>:
    test $0x1,%dil
    mov $0x307e,%ax
    mov $0xffff,%cx
    cmovne %ax,%cx
    movzwl %cx,%eax
    retq

after (18 bytes)
    <foo>:
    test $0x1,%dil
    mov $0x307e,%ecx
    mov $0xffff,%eax
    cmovne %ecx,%eax
    retq

Reviewers: craig.topper, aaboud, spatel, RKSimon, zvi

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36711

llvm-svn: 313982
2017-09-22 13:21:39 +00:00
Jatin Bhateja c034d36024 [X86] Updating the test case for FMF propagation.
Differential Revision: https://reviews.llvm.org/D38163

llvm-svn: 313964
2017-09-22 05:48:20 +00:00
Sanjay Patel 58f02afecd [x86] add more tests for node-level FMF; NFC
llvm-svn: 313893
2017-09-21 17:40:58 +00:00
Simon Pilgrim 1efe0c7224 [X86][SSE] Add PSHUFLW/PSHUFHW tests inspired by PR34686
llvm-svn: 313883
2017-09-21 15:11:51 +00:00
Jatin Bhateja 1a86c382d4 [X86] Adding a testpoint for fast-math flags propagation.
Reviewers: jbhateja

Reviewed By: jbhateja

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38127

llvm-svn: 313869
2017-09-21 09:53:21 +00:00
Saleem Abdulrasool aff96d907b X86: treat SwiftCC as Win64_CC on Win64
The Swift CC is identical to Win64 CC with the exception of swift error
being passed in r12 which is a CSR.  However, since this calling
convention is only used in swift -> swift code, it does not impact
interoperability and can be treated entirely as Win64 CC.  We would
previously incorrectly lower the frame setup as we did not treat the
frame as conforming to Win64 specifications.

llvm-svn: 313813
2017-09-20 21:00:40 +00:00
Saleem Abdulrasool 432b88e5f4 CodeGen: support SwiftError SwiftCC on Windows x64
Add support for passing SwiftError through a register on the Windows x64
calling convention.  This allows the use of swifterror attributes on
parameters which is used by the swift front end for the `Error`
parameter.  This partially enables building the swift standard library
for Windows x86_64.

llvm-svn: 313791
2017-09-20 18:40:59 +00:00
Simon Pilgrim d202ad15c1 [X86][SSE] Add PR22415 test case
llvm-svn: 313755
2017-09-20 13:49:52 +00:00
Florian Hahn ceb4494786 Recommit [MachineCombiner] Update instruction depths incrementally for large BBs.
This version of the patch fixes an off-by-one error causing PR34596. We
do not need to use std::next(BlockIter) when calling updateDepths, as
BlockIter already points to the next element.

Original commit message:
> For large basic blocks with lots of combinable instructions, the
> MachineTraceMetrics computations in MachineCombiner can dominate the compile
> time, as computing the trace information is quadratic in the number of
> instructions in a BB and it's relevant successors/predecessors.

> In most cases, knowing the instruction depth should be enough to make
> combination decisions. As we already iterate over all instructions in a basic
> block, the instruction depth can be computed incrementally. This reduces the
> cost of machine-combine drastically in cases where lots of instructions
> are combined. The major drawback is that AFAIK, computing the critical path
> length cannot be done incrementally. Therefore we only compute
> instruction depths incrementally, for basic blocks with more
> instructions than inc_threshold. The -machine-combiner-inc-threshold
> option can be used to set the threshold and allows for easier
> experimenting and checking if using incremental updates for all basic
> blocks has any impact on the performance.
>
> Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
>
> Reviewed By: fhahn
>
> Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D36619

llvm-svn: 313751
2017-09-20 11:54:37 +00:00
Simon Pilgrim d5e2878252 [X86][SSE] Add 'redundant pand' test case from PR34620
llvm-svn: 313632
2017-09-19 14:02:16 +00:00
Sanjay Patel bd7958d7ca [x86] regenerate checks; NFC
llvm-svn: 313631
2017-09-19 13:43:09 +00:00
Daniel Sanders 28887fe548 [globalisel] Add support for intrinsic_w_chain.
This maps directly to G_INTRINSIC_W_SIDE_EFFECTS.

llvm-svn: 313627
2017-09-19 12:56:36 +00:00
Jina Nahias ccfb8d4fe8 [x86] Lowering Mask Set1 intrinsics to LLVM IR
This patch, together with a matching clang patch (https://reviews.llvm.org/D37668), implements the lowering of X86 mask set1 intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D37669

llvm-svn: 313625
2017-09-19 11:03:06 +00:00
Andrei Elovikov 142516b456 Test commit.
llvm-svn: 313617
2017-09-19 07:56:20 +00:00
Gadi Haber 6f8fbf4b86 [X86][Skylake] Adding the scheduling information for the SkylakeClient target
This patch adds the instruction scheduling information for the SkylakeClient (SKL) architecture target by adding the file X86SchedSkylakeClient.td located under the X86 Target.
We used the scheduling information retrieved from the Skylake architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction.
The patch continues the scheduling replacement and insertion effort started with the SNB target in r307529 and r310792 and for HSW in r311879.

Please expect some performance fluctuations due to code alignment effects.

Reviewers: craig.topper, zvi, chandlerc, igorb, aymanmus, RKSimon, delena
Differential Revision: https://reviews.llvm.org/D37294

llvm-svn: 313613
2017-09-19 06:19:27 +00:00
Craig Topper a80949feb5 [X86] Add VPERMPD/VPERMQ and VPERMPS/VPERMD to the execution domain fixing table.
llvm-svn: 313610
2017-09-19 04:39:55 +00:00
Sanjay Patel f31b1a00ea [DAGCombiner] fold assertzexts separated by trunc
If we have an AssertZext of a truncated value that has already been AssertZext'ed, 
we can assert on the wider source op to improve the zext-y knowledge:
 assert (trunc (assert X, i8) to iN), i1 --> trunc (assert X, i1) to iN

This moves a fold from being Mips-specific to general combining, and x86 shows
improvements.

Differential Revision: https://reviews.llvm.org/D37017

llvm-svn: 313577
2017-09-18 22:05:35 +00:00
Sanjay Patel 7765c93be2 [DAG, x86] allow store merging before and after legalization (PR34217)
rL310710 allowed store merging to occur after legalization to catch stores that are created late,
but this exposes a logic hole seen in PR34217:
https://bugs.llvm.org/show_bug.cgi?id=34217

We will miss merging stores if the target lowers vector extracts into target-specific operations.
This patch allows store merging to occur both before and after legalization if the target chooses
to get maximum merging.

I don't think the potential regressions in the other tests are relevant. The tests are for
correctness of weird IR constructs rather than perf tests, and I think those are still correct.

Differential Revision: https://reviews.llvm.org/D37987

llvm-svn: 313564
2017-09-18 20:54:26 +00:00
Craig Topper 39cdb84560 [X86] Make sure we still emit zext for GR32 to GR64 when the source of the zext is AssertZext
The AssertZext we might see in this case is only giving information about the lower 32 bits. It isn't providing information about the upper 32 bits. So we should emit a zext.

This fixes PR28540.

Differential Revision: https://reviews.llvm.org/D37729

llvm-svn: 313563
2017-09-18 20:49:13 +00:00
Sanjay Patel 74d12b5697 [x86] add tests for PR34217; NFC
llvm-svn: 313548
2017-09-18 18:07:50 +00:00
Simon Pilgrim 4aa28b9730 [X86][AVX] Improve (i8 bitcast (v8i1 x)) handling for 256-bit vector compare results.
As commented on D37849, AVX1 targets were missing a chance to use vmovmskps for v8f32/v8i32 results for bool vector bitcasts

llvm-svn: 313547
2017-09-18 17:58:31 +00:00
Sanjay Patel 078d5d978c [x86] regenerate checks; NFC
llvm-svn: 313545
2017-09-18 17:33:47 +00:00
Simon Pilgrim 0b21ef1fa3 [SelectionDAG] Add BITCAST handling to ComputeNumSignBits for splatted sign bits.
For cases where we are BITCASTing to vectors of smaller elements, then if the entire source was a splatted sign (src's NumSignBits == SrcBitWidth) we can say that the dst's NumSignBit == DstBitWidth, as we're just splitting those sign bits across multiple elements.

We could generalize this but at the moment the only use case I have is to peek through bitcasts to vector comparison results.

Differential Revision: https://reviews.llvm.org/D37849

llvm-svn: 313543
2017-09-18 16:45:05 +00:00
Craig Topper 77d7f331dd [X86] Fix two more places to prefer VPERMQ/PD over VPERM2X128 when AVX2 is enabled
The shuffle combining and lowerVectorShuffleAsLanePermuteAndBlend were both still trying to use VPERM2XF128 for unary shuffles when AVX2 is enabled. VPERM2X128 takes two inputs meaning when we use it for a unary shuffle one of those inputs is left undefined creating a false dependency on whatever register gets allocated there.

If we have VPERMQ/PD we should prefer those since they only have a single input.

Differential Revision: https://reviews.llvm.org/D37947

llvm-svn: 313542
2017-09-18 16:39:49 +00:00
Simon Pilgrim 00161c9961 [X86][SSE] Improve support for vselect(Cond, 0, X) -> ANDN(Cond, X)
As discussed on PR28925 and D37849.

Differential Revision: https://reviews.llvm.org/D37975

llvm-svn: 313532
2017-09-18 14:23:23 +00:00
Simon Pilgrim 360629d170 [X86][SSE] Add vselect with zero tests (PR28925)
llvm-svn: 313529
2017-09-18 13:32:33 +00:00
Nikolai Bozhenov 84af99b3b1 [X86FixupBWInsts] More precise register liveness if no <imp-use> on MOVs.
Summary:
Subregister liveness tracking is not implemented for X86 backend, so
sometimes the whole super register is said to be live, when only a
subregister is really live. That might happen if the def and the use
are located in different MBBs, see added fixup-bw-isnt.mir test.

However, using knowledge of the specific instructions handled by the
bw-fixup-pass we can get more precise liveness information which this
change does.

Reviewers: MatzeB, DavidKreitzer, ab, andrew.w.kaylor, craig.topper

Reviewed By: craig.topper

Subscribers: n.bozhenov, myatsina, llvm-commits, hiraditya

Patch by Andrei Elovikov <andrei.elovikov@intel.com>

Differential Revision: https://reviews.llvm.org/D37559

llvm-svn: 313524
2017-09-18 10:17:59 +00:00
Mohammed Agabaria 77cb080c2d [X86][Codegen] adding masked gathers tests for avx2
related to patch: https://reviews.llvm.org/D35772
adding llvm gathers test before gathers codegen support.

Differential Revision: https://reviews.llvm.org/D37800

llvm-svn: 313516
2017-09-18 06:49:54 +00:00
Craig Topper a6054328e8 [X86] Teach the execution domain fixing tables to use movlhps inplace of unpcklpd for the packed single domain.
MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter.

llvm-svn: 313509
2017-09-18 04:40:58 +00:00
Craig Topper 87f7381edf [X86] Teach execution domain fixing to convert between FP and int unpack instructions.
llvm-svn: 313508
2017-09-18 03:29:54 +00:00
Craig Topper d4341920d5 [X86] Teach execution domain fixing to convert between VPERMILPS and VPSHUFD.
llvm-svn: 313507
2017-09-18 03:29:47 +00:00
Craig Topper ee6646d7de [X86] Teach shuffle lowering to use MOVLHPS/MOVHLPS for lowering v4f32 unary shuffles with SSE1 only.
llvm-svn: 313504
2017-09-17 22:36:41 +00:00
Craig Topper 6c221690a3 [X86] Add a couple more unary shuffles to the sse1 shuffle test.
These can be implemented with movlhps and movhlps.

llvm-svn: 313503
2017-09-17 22:36:39 +00:00
Jatin Bhateja 356e3e2c1d Adding test cases for PR34629 & PR34634.
Differential Revision: https://reviews.llvm.org/D37962

llvm-svn: 313490
2017-09-17 18:16:26 +00:00
Igor Breger f1d388a5c5 [GlobalISel][X86] Legalize i1 G_ADD/G_SUB/G_MUL/G_XOR/G_OR/G_AND instructions.
llvm-svn: 313483
2017-09-17 11:34:17 +00:00
Igor Breger 0f382ccb68 [GlobalISel][X86] Use correct physical register in mir tests.NFC.
llvm-svn: 313479
2017-09-17 08:30:42 +00:00
Igor Breger 21200ed7af [GlobalISel][X86] G_FCONSTANT support.
Summary: G_FCONSTANT support, port the implementation from X86FastIsel.

Reviewers: zvi, delena, guyblank

Reviewed By: delena

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D37734

llvm-svn: 313478
2017-09-17 08:08:13 +00:00
Sanjay Patel 65d6780703 [x86] enable storeOfVectorConstantIsCheap() target hook
This allows vector-sized store merging of constants in DAGCombiner using the existing code in MergeConsecutiveStores(). 
All of the twisted logic that decides exactly what vector operations are legal and fast for each particular CPU are 
handled separately in there using the appropriate hooks.

For the motivating tests in merge-store-constants.ll, we already produce the same vector code in IR via the SLP vectorizer. 
So this is just providing a backend backstop for code that doesn't go through that pass (-O1). More details in PR24449:
https://bugs.llvm.org/show_bug.cgi?id=24449 (this change should be the last step to resolve that bug)

Differential Revision: https://reviews.llvm.org/D37451

llvm-svn: 313458
2017-09-16 13:29:12 +00:00
Craig Topper 23f78c1662 [X86] Add isel patterns to be able to fold loads into VPERM2F128 even when the load is on the first input to the SDNode.
We just need to toggle bits 1 and 5 of the immediate and swap the sources. The peephole pass could trigger commuting/folding for this later, but its easy enough to fix in isel.

Disable the peephole pass on the main vperm2x128 test so we know we're doing this through isel.

llvm-svn: 313455
2017-09-16 09:16:48 +00:00
Craig Topper 0d1b519f78 [X86] Remove unused check lines that got left behind when I moved tests to the instrinsic upgrade file and regenerated.
llvm-svn: 313454
2017-09-16 09:16:46 +00:00
Craig Topper 8374ffde08 [X86] Remove the vperm2f128 test file I just added in r313450.
I missed the we already had a pretty thorough test file for these instructions.

llvm-svn: 313451
2017-09-16 07:51:01 +00:00
Craig Topper f264fcc704 [X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native shuffles.
I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate so I've preserved the resulting shuffle mask for each of those immediates.

llvm-svn: 313450
2017-09-16 07:36:14 +00:00
Craig Topper aa499c1cb2 [X86] Fix some FileCheck lines that use the wrong prefix.
Assume they were moved during autoupgrading and not changed.

llvm-svn: 313448
2017-09-16 07:13:39 +00:00
Craig Topper 950b19515a [X86] Don't set reserved bits in the immediate in the test cases for vperm2f128.
I'm going to autoupgrade these intrinsics in a future commit. This bit will never be set in the resulting output so pre-removing the bit.

llvm-svn: 313434
2017-09-16 02:11:21 +00:00
Craig Topper 9313df747d [X86] Remove slash in front of a CHECK line in a test.
llvm-svn: 313433
2017-09-16 01:43:21 +00:00
Craig Topper d02179cd9c [X86] Remove usages of vperm2f intrinsics from fast isel tests to match what clang generates after r313418.
llvm-svn: 313424
2017-09-15 23:53:43 +00:00
Hans Wennborg 534bfbd3ba Revert r313343 "[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs."
This caused PR34629: asserts firing when building Chromium. It also broke some
buildbots building test-suite as reported on the commit thread.

> Summary:
>    1/  Operand folding during complex pattern matching for LEAs has been
>        extended, such that it promotes Scale to accommodate similar operand
>        appearing in the DAG.
>        e.g.
>           T1 = A + B
>           T2 = T1 + 10
>           T3 = T2 + A
>        For above DAG rooted at T3, X86AddressMode will no look like
>           Base = B , Index = A , Scale = 2 , Disp = 10
>
>    2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
>        so that if there is an opportunity then complex LEAs (having 3 operands)
>        could be factored out.
>        e.g.
>           leal 1(%rax,%rcx,1), %rdx
>           leal 1(%rax,%rcx,2), %rcx
>        will be factored as following
>           leal 1(%rax,%rcx,1), %rdx
>           leal (%rdx,%rcx)   , %edx
>
>    3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
>       thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet
>
> Reviewed By: lsaba
>
> Subscribers: spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 313376
2017-09-15 18:40:26 +00:00
Craig Topper 7a183e2760 [X86] Prefer VPERMQ over VPERM2F128 for any unary shuffle, not just the ones that can be done with a insertf128
The early out for AVX2 in lowerV2X128VectorShuffle is positioned in a weird spot below some shuffle mask equivalency checks.

But I think we want to allow VPERMQ for any unary shuffle.

Differential Revision: https://reviews.llvm.org/D37893

llvm-svn: 313373
2017-09-15 18:11:13 +00:00
Craig Topper e0d724cf51 [X86] Don't create i64 constants on 32-bit targets when lowering v64i1 constant build vectors
When handling a v64i1 build vector of constants on 32-bit targets we were creating an illegal i64 constant that we then bitcasted back to v64i1. We need to instead create two 32-bit constants, bitcast them to v32i1 and concat the result. We should also take care to handle the halves being all zeros/ones after the split.

This patch splits the build vector and then recursively lowers the two pieces. This allows us to handle the all ones and all zeros cases with minimal effort. Ideally we'd just do the split and concat, and let lowering get called again on the new nodes, but getNode has special handling for CONCAT_VECTORS that reassembles the pieces back into a single BUILD_VECTOR. Hopefully the two temporary BUILD_VECTORS we had to create to do this that don't get returned don't cause any issues.

Fixes PR34605.

Differential Revision: https://reviews.llvm.org/D37858

llvm-svn: 313366
2017-09-15 17:09:03 +00:00
Craig Topper 143797eb89 [X86] Add isel pattern infrastructure to begin recognizing when we're inserting 0s into the upper portions of a vector register and the producing instruction as already produced the zeros.
Currently if we're inserting 0s into the upper elements of a vector register we insert an explicit move of the smaller register to implicitly zero the upper bits. But if we can prove that they are already zero we can skip that. This is based on a similar idea of what we do to avoid emitting explicit zero extends for GR32->GR64.

Unfortunately, this is harder for vector registers because there are several opcodes that don't have VEX equivalent instructions, but can write to XMM registers. Among these are SHA instructions and a MMX->XMM move. Bitcasts can also get in the way.

So for now I'm starting with explicitly allowing only VPMADDWD because we emit zeros in combineLoopMAddPattern. So that is placing extra instruction into the reduction loop.

I'd like to allow PSADBW as well after D37453, but that's currently blocked by a bitcast. We either need to peek through bitcasts or canonicalize insert_subvectors with zeros to remove bitcasts on the value being inserted.

Longer term we should probably have a cleanup pass that removes superfluous zeroing moves even when the producer is in another basic block which is something these isel tricks can't do. See PR32544.

Differential Revision: https://reviews.llvm.org/D37653

llvm-svn: 313365
2017-09-15 17:09:00 +00:00
Simon Pilgrim 905e79c4dc [X86][SSE] Add test cases vector for integer multiplies
Mainly inspired by PR34474 / D37896

llvm-svn: 313353
2017-09-15 11:17:42 +00:00
Jatin Bhateja 908c8b37c2 [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs.
Summary:
   1/  Operand folding during complex pattern matching for LEAs has been
       extended, such that it promotes Scale to accommodate similar operand
       appearing in the DAG.
       e.g.
          T1 = A + B
          T2 = T1 + 10
          T3 = T2 + A
       For above DAG rooted at T3, X86AddressMode will no look like
          Base = B , Index = A , Scale = 2 , Disp = 10

   2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
       so that if there is an opportunity then complex LEAs (having 3 operands)
       could be factored out.
       e.g.
          leal 1(%rax,%rcx,1), %rdx
          leal 1(%rax,%rcx,2), %rcx
       will be factored as following
          leal 1(%rax,%rcx,1), %rdx
          leal (%rdx,%rcx)   , %edx

   3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
      thus avoiding creation of any complex LEAs within a loop.

Reviewers: lsaba, RKSimon, craig.topper, qcolombet

Reviewed By: lsaba

Subscribers: spatel, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 313343
2017-09-15 05:29:51 +00:00
Simon Pilgrim 0b220c7524 [X86] Regenerate test. NFCI.
llvm-svn: 313259
2017-09-14 13:00:27 +00:00
Simon Pilgrim 47d8f62472 Regenerate test (broadcast comment). NFCI.
llvm-svn: 313258
2017-09-14 12:41:19 +00:00
Ayman Musa ab68449c53 [X86] When applying the shuffle-to-zero-extend transformation on floating point, bitcast to integer first.
Fix issue described in PR34577.

Differential Revision: https://reviews.llvm.org/D37803

llvm-svn: 313256
2017-09-14 12:06:38 +00:00
Simon Pilgrim 8bd2d8780a [DAGCombine] (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)
We already have a combine for this pattern when the input to shl is add, so we just need to enable the transformation when the input is or.

Original patch by @tstellar

Differential Revision: https://reviews.llvm.org/D19325

llvm-svn: 313251
2017-09-14 10:38:30 +00:00
Simon Pilgrim 11e2969a35 Fix line endings. NFCI.
llvm-svn: 313246
2017-09-14 10:30:22 +00:00
Dean Michael Berris 01fd7c8bd4 [XRay][CodeGen] Use the current function symbol as the associated symbol for the instrumentation map
Summary:
XRay had been assuming that the previous section is the "text" section
of the function when lowering the instrumentation map. Unfortunately
this is not a safe assumption, because we may be coming from lowering
debug type information for the function being lowered.

This fixes an issue with combining -gsplit-dwarf, -generate-type-units,
-debug-compile and -fxray-instrument for sole member functions. When the
split dwarf section is stripped, we're left with references from the
xray_instr_map to the debug section. The change now uses the function's
symbol instead of the previous section's start symbol.

We found the bug while attempting to strip the split debug sections off
an XRay-instrumented object file, which had a peculiar edge-case for
single-function classes where the single function is being lowered.
Because XRay had assocaited the instrumentation map for a function to
the debug types section instead of the function's section, the objcopy
call will fail due to the misplaced reference from the xray_instr_map
section.

Reviewers: pcc, dblaikie, echristo

Subscribers: llvm-commits, aprantl

Differential Revision: https://reviews.llvm.org/D37791

llvm-svn: 313233
2017-09-14 07:08:23 +00:00
NAKAMURA Takumi 38fac5905e Move llvm/test/CodeGen/X86/clear-liverange-spillreg.mir to SystemZ. It was in wrong place.
llvm-svn: 313218
2017-09-14 00:03:23 +00:00
Hans Wennborg 06e2a384c2 Revert r312719 "[MachineCombiner] Update instruction depths incrementally for large BBs."
This caused PR34596.

> [MachineCombiner] Update instruction depths incrementally for large BBs.
>
> Summary:
> For large basic blocks with lots of combinable instructions, the
> MachineTraceMetrics computations in MachineCombiner can dominate the compile
> time, as computing the trace information is quadratic in the number of
> instructions in a BB and it's relevant successors/predecessors.
>
> In most cases, knowing the instruction depth should be enough to make
> combination decisions. As we already iterate over all instructions in a basic
> block, the instruction depth can be computed incrementally. This reduces the
> cost of machine-combine drastically in cases where lots of instructions
> are combined. The major drawback is that AFAIK, computing the critical path
> length cannot be done incrementally. Therefore we only compute
> instruction depths incrementally, for basic blocks with more
> instructions than inc_threshold. The -machine-combiner-inc-threshold
> option can be used to set the threshold and allows for easier
> experimenting and checking if using incremental updates for all basic
> blocks has any impact on the performance.
>
> Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
>
> Reviewed By: fhahn
>
> Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D36619

llvm-svn: 313213
2017-09-13 23:23:09 +00:00
Wei Mi a2a135a01c Add a comment for the test. NFC.
llvm-svn: 313199
2017-09-13 21:47:13 +00:00
Wei Mi c0d066468e [RegAlloc] Keep a copy of live interval for the spilled vregs in HoistSpillHelper.
This is to fix PR34502. After rL311401, the live range of spilled vreg will be
cleared. HoistSpill need to use the live range of the original vreg before splitting
to know the moving range of the spills. The patch saves a copy of live interval for
the spilled vreg inside of HoistSpillHelper.

Differential Revision: https://reviews.llvm.org/D37578

llvm-svn: 313197
2017-09-13 21:41:30 +00:00
Gadi Haber 35f4d7ca46 [X86][Skylake] Replacing -mcpu=skx by -mattr in a codegen test. NFC.
NFC.
Replacing -mcpu=skx by -mattr in the run command of the codegen test: avx512-gather-scatter-intrin.ll.

Reviewers: delena
Revision: https://reviews.llvm.org/D37799
llvm-svn: 313144
2017-09-13 12:39:18 +00:00
Simon Pilgrim f613a45bf3 [X86][FMA4] Test FMA4 commutation with repeated ops as well as FMA3
llvm-svn: 313143
2017-09-13 11:21:38 +00:00
Simon Pilgrim 322fc53725 [X86][FMA] Added *213 fma instructions to scheduling tests
Annoyingly the 132/231 variants are pretty tricky to create when you need to due to weak FMA commutation patterns.

llvm-svn: 313142
2017-09-13 11:12:56 +00:00
Gadi Haber a753080d1e [X86][Skylake][KNL] Updating code gen regression test to use the KNL and SKYLAKE prefixes. NFC.
NFC.
Updating the code gen regression test bmi2-schedule.ll to use the KNL and SKYLAKE prefixes for the run commands that use the knl and Skylake mcpu options.
The fix is in preparation for a large patch of adding all SKL scheduling information.

Reviewers: delena, zvi, RKSimon
Revision: https://reviews.llvm.org/D37796
llvm-svn: 313138
2017-09-13 09:28:25 +00:00
Gadi Haber 04de4ce9e2 [X86][Skylake][KNL] Updating code gen regression test to use the KNL and SKYLAKE prefixes. NFC.
NFC.
Updating the code gen regression test bmi2-schedule.ll to use the KNL and SKYLAKE prefixes for the run commands that use the knl and Skylake mcpu options.
The fix is in preparation for a large patch of adding all SKL scheduling information.

Reviewers: delena, zvi
Revision: https://reviews.llvm.org/D37796
llvm-svn: 313137
2017-09-13 09:28:18 +00:00
Gadi Haber fb47ab7cdd NFC.
Updating codegen test bmi2-schedule.ll to use the SKYLAKE and KNL prefix as preparatipn for an upcoming patch to add all SKL scheduling information.

llvm-svn: 313136
2017-09-13 09:27:39 +00:00
Igor Breger 5c721199dd [GlobalISel][X86] support G_FPEXT operation.
Summary: Support G_FPEXT operation. Selection done via TableGen'erated code.

Reviewers: zvi, guyblank, aymanmus, m_zuckerman

Reviewed By: zvi

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34816

llvm-svn: 313135
2017-09-13 09:05:23 +00:00
Uriel Korach 5d5da5f531 [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (llvm)
This patch, together with a matching clang patch (https://reviews.llvm.org/D37694), implements the lowering of X86 ABS intrinsics to IR.

differential revision: https://reviews.llvm.org/D37693.

llvm-svn: 313134
2017-09-13 09:02:36 +00:00
Uriel Korach 53872a2d89 [X86] Add explicit mc-encoding checks to X86/viabs.ll. NFC.
Add explicit mc-encoding checks showing that the AVX512VL ABS intrinsics are actually mapped to EVEX encoding.
This is a pre-commit for a soon to come patch which will lower x86 target specific ABS intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D37688

llvm-svn: 313131
2017-09-13 08:33:55 +00:00
Craig Topper 2b6bfda561 [X86] Make sure we emit a SUBREG_TO_REG after the MOV32ri when creating a BEXTR64rr instruction from a shift/and pair.
Fixes PR34589.

llvm-svn: 313126
2017-09-13 07:53:21 +00:00
Elena Demikhovsky 6cab129464 [X86 CodeGen] Optimization of ZeroExtendLoad for v2i8 vector
Load with zero-extend and sign-extend from v2i8 to v2i32 is "Legal" since SSE4.1 and may be performed using PMOVZXBD , PMOVSXBD instructions.

llvm-svn: 313121
2017-09-13 06:40:26 +00:00
Sanjay Patel 659279450e [x86] eliminate unnecessary vector compare for AVX masked store
The masked store instruction only cares about the sign-bit of each mask element,
so the compare s<0 isn't needed.

As noted in PR11210:
https://bugs.llvm.org/show_bug.cgi?id=11210
...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR.
(Although more testing will be needed to confirm that.)

I filed a bug to track improvements for AVX512:
https://bugs.llvm.org/show_bug.cgi?id=34584

Differential Revision: https://reviews.llvm.org/D37446

llvm-svn: 313089
2017-09-12 23:24:05 +00:00
Craig Topper 958106d0f1 [X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel
Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so its as late as possible. But it can't be done with table based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp.

This does break a couple tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we dont' recognize yet. We already had this problem for several other TBM patterns so I think this fine and we can address of them together.

I've also fixed a bug where the combine to BEXTR was preventing us from using a trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before got to isel.

I think we should probably also support matching BEXTR from (srl/srl (and mask << C), C). But that should be a different patch.

Differential Revision: https://reviews.llvm.org/D37592

llvm-svn: 313054
2017-09-12 17:40:25 +00:00
Elena Demikhovsky 18ff5c1374 Added "zext" from v2i8 to v2i32. In the next patch I'll optimize the sequence.
llvm-svn: 313052
2017-09-12 17:27:53 +00:00
Simon Pilgrim 76418aae74 [X86][AVX2] Add gather/movntdqa/pmaskmov/pmovmskb/pslldq/psrldq instructions to scheduling tests
llvm-svn: 313039
2017-09-12 15:52:01 +00:00
Simon Pilgrim 0af5a772e0 [X86][AVX2] Add further instructions to scheduling tests
llvm-svn: 313032
2017-09-12 15:01:20 +00:00
Simon Pilgrim d2d2b37cc9 [X86][AVX2] Add integer broadcast scheduling tests
llvm-svn: 313026
2017-09-12 12:59:20 +00:00
Simon Pilgrim 5a931c641e [X86][AVX2] Add additional fp-broadcast/subvector/shuffle scheduling tests
llvm-svn: 313022
2017-09-12 11:17:01 +00:00
Simon Pilgrim ef9a9d709a [X86][AVX] Add vperm2f128 scheduling test
llvm-svn: 313021
2017-09-12 11:10:59 +00:00
Simon Pilgrim f336d9ce3c [X86][AVX2] Remove old (unused) intrinsic declarations
llvm-svn: 313020
2017-09-12 11:09:30 +00:00
Yael Tsafrir 47668b5e03 [X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR
Differential Revision: https://reviews.llvm.org/D37560

llvm-svn: 313013
2017-09-12 07:50:35 +00:00
Craig Topper afdc36ed74 [X86] Add an extra instruction to TruncAssertSext.ll to prevent the 'or' from being narrowed so that the movl is really required to avoid a miscompile.
If we allow the OR to be narrowed then the upper bits really are zero and we can't tell if the zeroing movl was removed on purpose.

While here regenerate the test with update_llc_test_checks.py

llvm-svn: 312995
2017-09-12 03:50:44 +00:00
Craig Topper 66e4ace1c8 [X86] Rename TruncAssertZext.ll test to TruncAssertSext.ll. Since its testing AssertSext.
llvm-svn: 312991
2017-09-12 01:30:10 +00:00
Adrian Prantl 16aa4cf7ef llvm-dwarfdump: Make -brief the default and add a -verbose option instead.
Differential Revision: https://reviews.llvm.org/D37717

llvm-svn: 312972
2017-09-11 23:05:20 +00:00
Adrian Prantl 7bc1b28291 llvm-dwarfdump: Replace -debug-dump=sect option with individual options.
As discussed on llvm-dev in
http://lists.llvm.org/pipermail/llvm-dev/2017-September/117301.html
this changes the command line interface of llvm-dwarfdump to match the
one used by the dwarfdump utility shipping on macOS. In addition to
being shorter to type this format also has the advantage of allowing
more than one section to be specified at the same time.

In a nutshell, with this change

  $ llvm-dwarfdump --debug-dump=info
  $ llvm-dwarfdump --debug-dump=apple-objc

becomes

  $ dwarfdump --debug-info --apple-objc

Differential Revision: https://reviews.llvm.org/D37714

llvm-svn: 312970
2017-09-11 22:59:45 +00:00
Zvi Rackover 255488a1e0 X86 Tests: More AVX512 conversions tests. NFC
Adding more tests for AVX512 fp<->int conversions that were missing.

llvm-svn: 312921
2017-09-11 15:54:38 +00:00
Simon Pilgrim b092bd321a [X86][SSE] Add support for X86ISD::PACKSS to ComputeNumSignBitsForTargetNode
Helps improve combineLogicBlendIntoPBLENDV support by allowing us to peek into through PACKSS truncations of vector comparison results.

Differential Revision: https://reviews.llvm.org/D37680

llvm-svn: 312916
2017-09-11 14:03:47 +00:00
Simon Pilgrim d0ff65b50e [X86][SSE] Add further test cases showing failure to compute sign bits through PACKSS
Suggested in D37680

Note: had to drop AVX512VL tests as there is an infinite loop in the new tests that needs further investigation (not relevant to D37680).
llvm-svn: 312910
2017-09-11 12:18:43 +00:00
Gadi Haber 3ddffced43 [X86][SKX][KNL] Updating several CodeGen tests to use the attr flag instead of mcpu flag
NFC.
 Updated 3 Codegen regression tests to use the -mattr flag instead of the -mcpu flags as follows:
 Instead of -mcpu=skx use -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq
 Instead of -mcpu=knl use -mattr=+avx512f

Reviewers: delena
Revision: https://reviews.llvm.org/D37674
llvm-svn: 312909
2017-09-11 11:26:20 +00:00
Michael Zuckerman 9707ba0957 [Interleved][Stride 3]Adding test for case the VF=64 target with AVX512.
llvm-svn: 312907
2017-09-11 10:57:15 +00:00
Simon Pilgrim f6fa1d0369 [X86][SSE] Add test showing failure to compute sign bits through PACKSS
Prevents combineLogicBlendIntoPBLENDV from merging to PBLENDV

llvm-svn: 312906
2017-09-11 10:50:03 +00:00
Igor Breger 1f14364d64 [GlobalISel][X86] G_ANYEXT support.
Summary: G_ANYEXT support

Reviewers: zvi, delena

Reviewed By: delena

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D37675

llvm-svn: 312903
2017-09-11 09:41:13 +00:00
Elena Demikhovsky cc477bbcea Fixed a bug in splitting Scatter operation in the Type Legalizer.
After the split of the Scatter operation, the order of the new instructions is well defined - Lo goes before Hi. Otherwise the semantic of Scatter (from LSB to MSB) is broken.
I'm chaining 2 nodes to prevent reordering.

Differential Revision https://reviews.llvm.org/D37670

llvm-svn: 312894
2017-09-11 06:18:15 +00:00
Elena Demikhovsky 9afc3d7b82 Added a test that demonstrates a ug in Scatter scheduling.
The bug is going to be fixed in an upcomming patch.
 

llvm-svn: 312883
2017-09-10 13:20:42 +00:00
Simon Pilgrim ed27bea373 [X86] Add v2i4 store test case (PR20012)
llvm-svn: 312874
2017-09-09 20:28:50 +00:00
Simon Pilgrim e932c7fafa [X86] Add v2i2 test case (PR20011)
llvm-svn: 312873
2017-09-09 20:22:35 +00:00
Simon Pilgrim da41ca5a25 [X86][FMA] Regenerate FMA tests
llvm-svn: 312871
2017-09-09 19:25:59 +00:00
Simon Pilgrim 97a56866a2 [X86][SSE] i32 vector multiplications test cases from PR6399
llvm-svn: 312868
2017-09-09 18:18:17 +00:00
Simon Pilgrim a866a190d6 [X86][MOVBE] Fix typo in MOVBE scheduling test names
Copy+paste is not your friend

llvm-svn: 312867
2017-09-09 17:52:44 +00:00
Craig Topper 3be1db82b6 [X86] Don't disable slow INC/DEC if optimizing for size
Summary:
Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size.

This appears to match gcc behavior.

Reviewers: chandlerc, zvi, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37177

llvm-svn: 312866
2017-09-09 17:11:59 +00:00
Craig Topper 56af2cad89 [X86] Simplify the slow-incdec test and add test cases with optsize.
I think we want to consider using inc/dec with optsize.

llvm-svn: 312804
2017-09-08 17:33:54 +00:00
Simon Pilgrim 2e4fb24173 [X86] Added PR31045 test case
Reduced version of 'addr-calc-crash.ll' that was included in D27044, that had been fixed already by D31286/rL298633

llvm-svn: 312786
2017-09-08 10:49:11 +00:00
Jatin Bhateja a251312719 [X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and maxnum'
Differential Revision: https://reviews.llvm.org/D37614

llvm-svn: 312778
2017-09-08 09:15:36 +00:00
Chandler Carruth acbcf06f03 [x86] Flesh out the custom ISel for RMW aritmetic ops with used flags to
cover the bitwise operators.

Nothing really exciting here, this just stamps out the rest of the core
operations that can RMW memory and set flags.

Still not implemented here: ADC, SBB. Those will require more
interesting logic to channel the flags *in*, and I'm not currently
planning to try to tackle that. It might be interesting for someone who
wants to improve our code generation for bignum implementations.

Differential Revision: https://reviews.llvm.org/D37141

llvm-svn: 312768
2017-09-08 00:17:12 +00:00
Chandler Carruth 52a31bf268 [x86] Extend the manual ISel of `add` and `sub` with both RMW memory
operands and used flags to support matching immediate operands.

This is a bit trickier than register operands, and we still want to fall
back on a register operands even for things that appear to be
"immediates" when they won't actually select into the operation's
immediate operand. This also requires us to handle things like selecting
`sub` vs. `add` to minimize the number of bits needed to represent the
immediate, and picking the shortest immediate encoding. In order to
that, we in turn need to scan to make sure that CF isn't used as it will
get inverted.

The end result seems very nice though, and we're now generating
optimal instruction sequences for these patterns IMO.

A follow-up patch will further expand this to other operations with RMW
memory operands. But handing `add` and `sub` are useful starting points
to flesh out the machinery and make sure interesting and complex cases
can be handled.

Thanks to Craig Topper who provided a few fixes and improvements to this
patch in addition to the review!

Differential Revision: https://reviews.llvm.org/D37139

llvm-svn: 312764
2017-09-07 23:54:24 +00:00
Paul Robinson bb92137080 [DWARF] Line 0 should not have a discriminator.
It's meaningless and takes up extra space in the line table.

Differential Revision: https://reviews.llvm.org/D37364

llvm-svn: 312751
2017-09-07 22:15:44 +00:00
Michael Zuckerman 5a385940d3 [X86][LLVM]Expanding Supports lowerInterleavedLoad() in X86InterleavedAccess (VF{8|16|32} stride 3).
This patch expands the support of lowerInterleavedload to {8|16|32}x8i stride 3.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) and we plan to include the store (deinterleved side).

The patch goal is to optimize the following sequence:
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7

into

a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7

Reviewers
1. zvi
2. igor
3. guyblank
4. dorit
5. Ayal

llvm-svn: 312722
2017-09-07 14:02:13 +00:00
Florian Hahn d39b8a3533 [MachineCombiner] Update instruction depths incrementally for large BBs.
Summary:
For large basic blocks with lots of combinable instructions, the
MachineTraceMetrics computations in MachineCombiner can dominate the compile
time, as computing the trace information is quadratic in the number of
instructions in a BB and it's relevant successors/predecessors.

In most cases, knowing the instruction depth should be enough to make
combination decisions. As we already iterate over all instructions in a basic
block, the instruction depth can be computed incrementally. This reduces the
cost of machine-combine drastically in cases where lots of instructions
are combined. The major drawback is that AFAIK, computing the critical path
length cannot be done incrementally. Therefore we only compute
instruction depths incrementally, for basic blocks with more
instructions than inc_threshold. The -machine-combiner-inc-threshold
option can be used to set the threshold and allows for easier
experimenting and checking if using incremental updates for all basic
blocks has any impact on the performance.

Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn

Reviewed By: fhahn

Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D36619

llvm-svn: 312719
2017-09-07 12:49:39 +00:00
Alexander Ivchenko f3a3cd198e [x86] Update to cmov promotion tests for D36711; NFC
Adding i8 -> [i16, i32, i64] and i32 -> i64 cases.
This way we can see what the current codegen looks like.

llvm-svn: 312707
2017-09-07 08:59:05 +00:00
Zvi Rackover 25799d93f0 X86: Improve AVX512 fptoui lowering
Summary:
Add patterns for
  fptoui <16 x float> to <16 x i8>
  fptoui <16 x float> to <16 x i16>

Reviewers: igorb, delena, craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37505

llvm-svn: 312704
2017-09-07 07:40:34 +00:00
Sanjay Patel e96f875deb [x86] fix triple and regenerate checks for psubus; NFC
Patch by Yulia Koval!

Differential Revision: https://reviews.llvm.org/D37523

llvm-svn: 312662
2017-09-06 19:05:20 +00:00
Wei Mi 818d50a93d [TailCall] Allow llvm.memcpy/memset/memmove to be tail calls when parent
function return the intrinsics's first argument.

llvm.memcpy/memset/memmove return void but they will return the first
argument after they are expanded as libcalls. Now if the parent function
has any return value, llvm.memcpy cannot be turned into tail call after
expansion.

The patch is to handle that case in SelectionDAGBuilder so when caller
function return the same value as the first argument of llvm.memcpy,
tail call is allowed.

Differential Revision: https://reviews.llvm.org/D37406

llvm-svn: 312641
2017-09-06 16:05:17 +00:00
Chandler Carruth 585bfc8443 [x86] Fix PR34377 by disabling cmov conversion when we relied on it
performing a zext of a register.

On the PR there is discussion of how to more effectively handle this,
but this patch prevents us from miscompiling code.

Differential Revision: https://reviews.llvm.org/D37504

llvm-svn: 312620
2017-09-06 06:28:08 +00:00
Zvi Rackover 5ebe94a84d X86 Tests: Tidy up AVX512 conversion tests. NFC.
Rename functions to a consistent format to make it easier to track coverage.

llvm-svn: 312619
2017-09-06 05:33:04 +00:00
Jatin Bhateja 80b5e38c4e Updating a test reference for rL312608.
Differential Revision: https://reviews.llvm.org/D37501

llvm-svn: 312614
2017-09-06 03:58:14 +00:00
Jatin Bhateja 2c139f77c7 [X86] Allow cross-lane permutations for sub targets supporting AVX2.
Summary:
Most instructions in AVX work “in-lane”, that is, each source element is applied only to other
elements of the same lane, thus a cross lane permutation is costly and needs more than one instrution.
AVX2 includes instructions to perform any-to-any permutation of words over a 256-bit register
and vectorized table lookup.

This should also Fix PR34369

Differential Revision: https://reviews.llvm.org/D37388

llvm-svn: 312608
2017-09-06 02:58:47 +00:00
Reid Kleckner e33c94f1b0 Add llvm.codeview.annotation to implement MSVC __annotation
Summary:
This intrinsic represents a label with a list of associated metadata
strings. It is modelled as reading and writing inaccessible memory so
that it won't be removed as dead code. I think the intention is that the
annotation strings should appear at most once in the debug info, so I
marked it noduplicate. We are allowed to inline code with annotations as
long as we strip the annotation, but that can be done later.

Reviewers: majnemer

Subscribers: eraman, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D36904

llvm-svn: 312569
2017-09-05 20:14:58 +00:00
Craig Topper 784fa8a4e3 [X86] Remove unnecessary (v4f32 (X86vzmovl (v4f32 (scalar_to_vector FR32X)))) patterns
We had already disabled the pattern for SSE4.1 and SSE4.2. But it got re-enabled for AVX and AVX512.

With SSE41 we rely on a separate (v4f32 (X86vzmovl VR128)) pattern to select blendps with a xorps to create zeroess. And a separate (v4f32 (scalar_to_vector FR32X)) to select a COPY_TO_REG_CLASS to move FR32 to VR128

The same thing can happen for AVX with vblendps and those separate patterns already exist.

For AVX512, (v4f32 (X86vzmov VR128)) will select a VMOVSS instruction instead of VBLENDPS due to their not being a EVEX VBLENDPS. This is what we were getting out of the larger pattern anyway. So the larger pattern is unneeded for AVX512 too.

For SSE1-SSSE3 we can rely on (v4f32 (X86vzmov VR128)) selecting a MOVSS similar to AVX512. Again this is what the larger pattern did too.

So the only real change here is that AVX1/2 now properly outputs a VBLENDPS during isel instead of a VMOVSS to match SSE41. Most tests didn't notice because the two address instruction pass knows how to turn VMOVSS into VBLENDPS to get an independent destination register.

llvm-svn: 312564
2017-09-05 19:09:02 +00:00
Zvi Rackover 2096893f34 X86 Tests: Adding missing AVX512 fptoui coverage tests. NFC.
Some of the cases show missing pattern i intend to fix shortly.

llvm-svn: 312560
2017-09-05 18:24:39 +00:00