Commit Graph

15577 Commits

Author SHA1 Message Date
Sanjay Patel 6c0aef77aa [x86] avoid infinite loop from SoftenFloatOperand (PR34866)
Legalization of fp128 assumes things that we should have asserts for,
so that's another potential improvement.

Differential Revision: https://reviews.llvm.org/D38771

llvm-svn: 315485
2017-10-11 18:24:21 +00:00
Simon Pilgrim 7db366630c Spelling mistake in comment. NFCI.
llvm-svn: 315471
2017-10-11 16:10:05 +00:00
Craig Topper 3dc22bba47 [X86] Remove MVT::i1 handling code from LowerTRUNCATE
Summary: I don't think this is necessary with i1 being illegal now.

Reviewers: RKSimon, zvi, guyblank

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38784

llvm-svn: 315469
2017-10-11 16:05:05 +00:00
Oliver Stannard 4191b9eaea [Asm] Add debug tracing in table-generated assembly matcher
This adds debug tracing to the table-generated assembly instruction matcher,
enabled by the -debug-only=asm-matcher option.

The changes in the target AsmParsers are to add an MCInstrInfo reference under
a consistent name, so that we can use it from table-generated code. This was
already being used this way for targets that use deprecation warnings, but 5
targets did not have it, and Hexagon had it under a different name to the other
backends.

llvm-svn: 315445
2017-10-11 09:17:43 +00:00
Lang Hames 02d330548d [MC] Have MCObjectStreamer take its MCAsmBackend argument via unique_ptr.
MCObjectStreamer owns its MCAsmBackend -- this fixes the types to reflect that,
and allows us to remove another instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.

llvm-svn: 315410
2017-10-11 01:57:21 +00:00
Craig Topper 85b1da1dc4 [X86] Remove temporary std::string creation from shuffle comment printing. We can just write directly to the raw_ostream.
llvm-svn: 315399
2017-10-11 00:46:09 +00:00
Craig Topper 6ce20bd184 [X86] Add 128-bit version of vbroadcasti32x2 to shuffle comment decoding.
llvm-svn: 315395
2017-10-11 00:11:53 +00:00
Craig Topper bb0e316dc7 [X86] Add broadcast patterns that allow a scalar_to_vector between the broadcast and the load.
We already have these patterns for AVX512VL, but not AVX1 or 2.

llvm-svn: 315382
2017-10-10 22:40:31 +00:00
Craig Topper ad3d03193a [X86] Fix some patterns that select VLX instructions, but were incorrectly also checking presence of BWI instructions.
The EVEX->VEX pass probably obscures this.

llvm-svn: 315365
2017-10-10 21:07:14 +00:00
Lang Hames 60fbc7cc38 [MC] Thread unique_ptr<MCObjectWriter> through the create.*ObjectWriter
functions.

This makes the ownership of the resulting MCObjectWriter clear, and allows us
to remove one instance of MCObjectStreamer's bizarre "holding ownership via
someone else's reference" trick.

llvm-svn: 315327
2017-10-10 16:28:07 +00:00
Uriel Korach 059e211aa1 after fixing the i386 case
Change-Id: If6fe0b6ec01f111115fb734fe31c0e152dbc165f
llvm-svn: 315311
2017-10-10 13:43:09 +00:00
Craig Topper a88306e6fb [AVX512] Add patterns to commute integer comparison instructions during isel.
This enables broadcast loads to be commuted and allows normal loads to be folded without the peephole pass.

llvm-svn: 315274
2017-10-10 06:36:46 +00:00
Reid Kleckner e52d1e6787 [SEH] Use reportError instead of report_fatal_error for bad directives
This makes the .seh_ directives slightly more usable from standalone
assembly files.

This removes a large number of report_fatal_errors and recovers from the
error by ignoring the directive.

llvm-svn: 315262
2017-10-10 01:26:25 +00:00
Lang Hames 77dff39cb4 [MC] Plumb unique_ptr<MCWinCOFFObjectTargetWriter> through
createWinCOFFObjectWriter to WinCOFFObjectWriter's constructor.

Fixes the same ownership issue for COFF that r315245 did for MachO:
WinCOFFObjectWriter takes ownership of its MCWinCOFFObjectTargetWriter, so we
want to pass this through to the constructor via a unique_ptr, rather than a
raw ptr.

llvm-svn: 315257
2017-10-10 00:50:29 +00:00
Lang Hames dcb312bdb9 [MC] Plumb unique_ptr<MCELFObjectTargetWriter> through createELFObjectWriter to
ELFObjectWriter's constructor.

Fixes the same ownership issue for ELF that r315245 did for MachO:
ELFObjectWriter takes ownership of its MCELFObjectTargetWriter, so we want to
pass this through to the constructor via a unique_ptr, rather than a raw ptr.

llvm-svn: 315254
2017-10-09 23:53:15 +00:00
Lang Hames 9b206a7d60 [MC] Plumb unique_ptr<MCMachObjectTargetWriter> through createMachObjectWriter
to MCObjectWriter's constructor.

MCObjectWriter takes ownership of its MCMachObjectTargetWriter argument -- this
patch plumbs that ownership relationship through the constructor (which
previously took raw MCMachObjectTargetWriter*) and the createMachObjectWriter
function.

llvm-svn: 315245
2017-10-09 22:38:13 +00:00
Aditya Nandakumar c3bfc81a1f [GISel]: Fix generation of illegal COPYs during CallLowering
We end up creating COPY's that are either truncating/extending and this
should be illegal.

https://reviews.llvm.org/D37640

Patch for X86 and ARM by igorb, rovka

llvm-svn: 315240
2017-10-09 20:07:43 +00:00
Zvi Rackover c1d5955684 [X86] Unsigned saturation subtraction canonicalization [the backend part]
Summary:
On behalf of julia.koval@intel.com

The patch transforms canonical version of unsigned saturation, which is sub(max(a,b),a) or sub(a,min(a,b)) to special psubus insturuction on targets, which support it(8bit and 16bit uints).
umax(a,b) - b -> subus(a,b)
a - umin(a,b) -> subus(a,b)

There is also extra case handled, when right part of sub is 32 bit and can be truncated, using UMIN(this transformation was discussed in https://reviews.llvm.org/D25987).

The example of special case code:

```
void foo(unsigned short *p, int max, int n) {

  int i;
  unsigned m;
  for (i = 0; i < n; i++) {
    m = *--p;
    *p = (unsigned short)(m >= max ? m-max : 0);
  }
}
```
Max in this example is truncated to max_short value, if it is greater than m, or just truncated to 16 bit, if it is not. It is vaid transformation, because if max > max_short, result of the expression will be zero.

Here is the table of types, I try to support, special case items are bold:

| Size | 128 | 256 | 512
| -----  | -----  | -----   | -----
| i8 | v16i8 | v32i8 | v64i8
| i16 | v8i16 | v16i16 | v32i16
| i32 | | **v8i32** | **v16i32**
| i64 | | | **v8i64**

Reviewers: zvi, spatel, DavidKreitzer, RKSimon

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37534

llvm-svn: 315237
2017-10-09 20:01:10 +00:00
Craig Topper c88883b07d [X86] Remove a setLoadExtAction from the AVX512 section that uses an AVX512BW type and is alraedy present in the AVX512BW section.
llvm-svn: 315202
2017-10-09 01:05:16 +00:00
Craig Topper 4f8656a7af [X86] Enable extended comparison predicate support for SETUEQ/SETONE when targeting AVX instructions.
We believe that despite AMD's documentation, that they really do support all 32 comparision predicates under AVX.

Differential Revision: https://reviews.llvm.org/D38609

llvm-svn: 315201
2017-10-09 01:05:15 +00:00
Simon Pilgrim 2c742f919a [X86][SSE] Don't call combineTo inside combineX86ShufflesRecursively. NFCI.
Return the combined shuffle from combineX86ShufflesRecursively and perform the combineTo in the caller.

Makes it easier for future patches to use this in functions that aren't actually shuffles themselves.

llvm-svn: 315195
2017-10-08 20:58:14 +00:00
Simon Pilgrim 6abbd33ec0 Tidyup with clang-format. NFCI.
llvm-svn: 315187
2017-10-08 19:24:30 +00:00
Simon Pilgrim dc32c844f9 [X86] getTargetConstantBitsFromNode - add support for decoding scalar constants
llvm-svn: 315182
2017-10-08 17:21:18 +00:00
Craig Topper c97775c03c [X86] Prefer MOVSS/SD over BLENDI during legalization. Remove BLENDI versions of scalar arithmetic patterns
Summary:
We currently disable some converting of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later during shuffle combining we go back to prefering MOVSS/MOVSD.

Additionally we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe due to the combining using MOVSS/MOVSD these are unnecessary.

Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. Turns out machine CSE commutes them to blend, and then commuting those blends back into blends that are equivalent to the original movss/movsd.

This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special casing VPBLENDW based on the immediate to widen the element type.

The rest of the patch is removing all the excess scalar patterns.

Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint.

Reviewers: RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38023

llvm-svn: 315181
2017-10-08 16:57:23 +00:00
Gadi Haber 684944b822 [X86][SKX] Adding the scheduling information for the SKX target.
Adding the scheduling information for the SkylakeServer (SKX) target.

This patch adds the instruction scheduling information for the SkylakeServer (SKX) architecture target by adding the file X86SchedSkylakeServer.td located under the X86 Target.
We used the scheduling information retrieved from the Skylake architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction.

The patch continues the scheduling replacement and insertion effort started with the SNB target in r310792, the HSW target in r311879 and the SkylakeClient (SKL) target in rL313613.

Please expect some performance fluctuations due to code alignment effects.

Reviewers: zvi, RKSimon, craig.topper, chandlerc, aymanmu
Differential Revision: https://reviews.llvm.org/D38443

Change-Id: I5c228fcc09e9e5a99b6116e62b356c4f9b971185
llvm-svn: 315175
2017-10-08 12:52:54 +00:00
Ayman Musa 1170deb9c8 [X86] Add missing entries in 'MemoryFoldTable2Addr' to get complete form of the table.
Get the folding table 'MemoryFoldTable2Addr' to a complete state as part of the process explained in https://reviews.llvm.org/D38028

Differential Revision: https://reviews.llvm.org/D38500

llvm-svn: 315174
2017-10-08 09:46:50 +00:00
Ayman Musa 993339b941 [X86][TableGen] Recommitting the X86 memory folding tables TableGen backend while disabling it by default.
After the original commit ([[ https://reviews.llvm.org/rL304088 | rL304088 ]]) was reverted, a discussion in llvm-dev was opened on 'how to accomplish this task'.
In the discussion we concluded that the best way to achieve our goal (which is to automate the folding tables and remove the manually maintained tables) is:

 # Commit the tablegen backend disabled by default.

 # Proceed with an incremental updating of the manual tables - while checking the validity of each added entry.

 # Repeat previous step until we reach a state where the generated and the manual tables are identical. Then we can safely remove the manual tables and include the generated tables instead.

 # Schedule periodical (1 week/2 weeks/1 month) runs of the pass:

   - if changes appear (new entries):
      - make sure the entries are legal
      - If they are not, mark them as illegal to folding
   - Commit the changes (if there are any).

CMake flag added for this purpose is "X86_GEN_FOLD_TABLES". Building with this flags will run the pass and emit the X86GenFoldTables.inc file under build/lib/Target/X86/ directory which is a good reference for any developer who wants to take part in the effort of completing the current folding tables.

Differential Revision: https://reviews.llvm.org/D38028

llvm-svn: 315173
2017-10-08 09:20:32 +00:00
Craig Topper bbca2f2978 [X86] Stop LowerSIGN_EXTEND_AVX512 from creating v8i16/v16i16/v16i8 vselects with a v8i1/v16i1 condition when BWI is not available.
Some of the tests in vector-shuffle-v1.ll would get into an infinite loop without this.

llvm-svn: 315172
2017-10-08 08:50:59 +00:00
Ayman Musa 5fc6dc58d7 [X86] Add new attribute to X86 instructions to enable marking them as "not memory foldable"
This attribute will be used in a tablegen backend that generated the X86 memory folding tables which will be added in a future pass.
Instructions with this attribute unset will be excluded from the full set of X86 instructions available for the pass.

Differential Revision: https://reviews.llvm.org/D38027

llvm-svn: 315171
2017-10-08 08:32:56 +00:00
Craig Topper 9563cab961 [X86] Simplify some code in getInsertVINSERTImmediate and getExtractVEXTRACTImmediate. NFC
Replace one of the divides with a multiply.

llvm-svn: 315162
2017-10-08 01:33:42 +00:00
Craig Topper 27170fee8d [X86] If we see an insert of a bitcast into zero vector, canonicalize it to move the bitcast to the other side of the insert.
This improves detection of zeroing of upper bits during isel.

llvm-svn: 315161
2017-10-08 01:33:41 +00:00
Craig Topper f7a19db649 [X86] Remove ISD::INSERT_SUBVECTOR handling from combineBitcastForMaskedOp. Add isel patterns to make up for it.
This will allow for some flexibility in canonicalizing bitcasts around insert_subvector.

llvm-svn: 315160
2017-10-08 01:33:40 +00:00
Craig Topper 16f2044fa8 [X86] Use getConstantOperandVal to simplify some code. NFC
llvm-svn: 315159
2017-10-08 01:33:38 +00:00
Simon Pilgrim 9508fe7924 [X86][SSE] Match bitcasted BUILD_VECTOR of constants for v2i64 shifts on 64-bit targets (PR34855)
Extension to rL315155, generate constant shifts on 64-bits as well as 32-bits.

llvm-svn: 315156
2017-10-07 17:57:22 +00:00
Simon Pilgrim 70e1db78db [X86][SSE] Match bitcasted v4i32 BUILD_VECTORS for v2i64 shifts on 64-bit targets (PR34855)
We were already doing this for 32-bit targets, but we can generate these on 64-bits as well.

llvm-svn: 315155
2017-10-07 17:42:17 +00:00
Craig Topper 2f60295364 [X86] Add X86ISD::CMOV to computeKnownBitsForTargetNode and ComputeNumSignBitsForTargetNode.
Summary: Implementations based on ISD::SELECT.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38663

llvm-svn: 315153
2017-10-07 16:51:19 +00:00
Simon Pilgrim 73f143e774 [X86][SSE] Improve shuffling combining with horizontal operations
Recognise cases when we can merge the shuffles with their horizontal (HADD/HSUB/PACK) instruction inputs.

Replaces an older implementation which performed some of this during lowering, expanding an existing target shuffle combine stage instead.

Differential Revision: https://reviews.llvm.org/D38506

llvm-svn: 315150
2017-10-07 12:42:23 +00:00
Martin Storsjo 5e9d482b0a [X86] Update an outdated comment about SjLj
The SjLj intrinsics in the X86 backend are intended for use with
SjLj exception handling as well, since SVN r271244.

Differential Revision: https://reviews.llvm.org/D38532

llvm-svn: 315146
2017-10-07 06:00:32 +00:00
Craig Topper e79eff3bb5 [X86] Correct result type for the flag result of RDSEED and RDRAND nodes. Correct the CC type for the CMOV used with RDSEED/RDRAND.
The flag result was MVT::Glue, but should be MVT::i32. The CC type was MVT::i8, but should be MVT::i32.

llvm-svn: 315145
2017-10-07 05:11:59 +00:00
Jessica Paquette 13593843f6 [MachineOutliner] Disable outlining from LinkOnceODRs by default
Say you have two identical linkonceodr functions, one in M1 and one in M2.
Say that the outliner outlines A,B,C from one function, and D,E,F from another
function (where letters are instructions). Now those functions are not
identical, and cannot be deduped. Locally to M1 and M2, these outlining
choices would be good-- to the whole program, however, this might not be true!

To mitigate this, this commit makes it so that the outliner sees linkonceodr
functions as unsafe to outline from. It also adds a flag,
-enable-linkonceodr-outlining, which allows the user to specify that they
want to outline from such functions when they know what they're doing.

Changing this handles most code size regressions in the test suite caused by
competing with linker dedupe. It also doesn't have a huge impact on the code
size improvements from the outliner. There are 6 tests that regress > 5% from
outlining WITH linkonceodrs to outlining WITHOUT linkonceodrs. Overall, most
tests either improve or are not impacted.

Not outlined vs outlined without linkonceodrs:
https://hastebin.com/raw/qeguxavuda

Not outlined vs outlined with linkonceodrs:
https://hastebin.com/raw/edepoqoqic

Outlined with linkonceodrs vs outlined without linkonceodrs:
https://hastebin.com/raw/awiqifiheb

Numbers generated using compare.py with -m size.__text. Tests run for AArch64
with -Oz -mllvm -enable-machine-outliner -mno-red-zone.

llvm-svn: 315136
2017-10-07 00:16:34 +00:00
Cameron McInally 9d64101fe8 [AVX512] Fix TERNLOG when folding broadcast
Patch to fix ternlog instructions with a folded
broadcast. The broadcast decorator, e.g. {1toX}, was missing.

Differential Revision: https://reviews.llvm.org/D38649

llvm-svn: 315122
2017-10-06 22:31:29 +00:00
Reid Kleckner 676941909d [X86] Extract CATCHRET handling from emitEpilogue, NFC
llvm-svn: 315023
2017-10-05 21:37:39 +00:00
Reid Kleckner 7344282c36 [X86] Simplify X86 epilogue frame size calculation, NFC
Sink the insertion of "pop ebp" out of the frame size calculation
branches. They all check for HasFP.

Our handling of CLEANUPRET and CATCHRET was equivalent, both are
funclets and use the same frame size. We can eliminate the CLEANUPRET
case.

Hoist the hasFP(MF) query into a local bool.

Rename TargetMBB to CatchRetTarget to be more descriptive.

Eliminate the Optional<unsigned> RetOpcode local, now that it has one
use.

It's only a net savings of 10 lines, but hopefully it's *slightly* more
readable.

llvm-svn: 315000
2017-10-05 18:27:08 +00:00
Artur Pilipenko 7b15254c8f [X86] Fix chains update when lowering BUILD_VECTOR to a vector load
The code which lowers BUILD_VECTOR of consecutive loads into a single vector
load doesn't update chains properly. As a result the vector load can be
reordered with the store to the same location.

The current code in EltsFromConsecutiveLoads only updates the chain following
the first load. The fix is to update the chains following all the loads
comprising the vector.

This is a fix for PR10114.

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D38547

llvm-svn: 314988
2017-10-05 16:28:21 +00:00
Eugene Zelenko 60433b682f [X86] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 314953
2017-10-05 00:33:50 +00:00
Simon Pilgrim 9edbe110e8 [X86][AVX] Improve (i8 bitcast (v8i1 x)) handling for v8i64/v8f64 512-bit vector compare results.
AVX1/AVX2 targets were missing a chance to use vmovmskps for v8f32/v8i32 results for bool vector bitcasts

llvm-svn: 314921
2017-10-04 18:00:42 +00:00
Hans Wennborg 2a6c9adb2f Revert r314886 "[X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)"
It broke the Chromium / SQLite build; see PR34830.

> Summary:
>    1/  Operand folding during complex pattern matching for LEAs has been
>        extended, such that it promotes Scale to accommodate similar operand
>        appearing in the DAG.
>        e.g.
>          T1 = A + B
>          T2 = T1 + 10
>          T3 = T2 + A
>        For above DAG rooted at T3, X86AddressMode will no look like
>          Base = B , Index = A , Scale = 2 , Disp = 10
>
>    2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
>        so that if there is an opportunity then complex LEAs (having 3 operands)
>        could be factored out.
>        e.g.
>          leal 1(%rax,%rcx,1), %rdx
>          leal 1(%rax,%rcx,2), %rcx
>        will be factored as following
>          leal 1(%rax,%rcx,1), %rdx
>          leal (%rdx,%rcx)   , %edx
>
>    3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
>       thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
>
> Reviewed By: lsaba
>
> Subscribers: jmolloy, spatel, igorb, llvm-commits
>
>     Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 314919
2017-10-04 17:54:06 +00:00
Simon Pilgrim b47b3f2564 [X86][SSE] Add support for lowering v8i16 binary shuffles to PACKSS/PACKUS
Missed in D38472

llvm-svn: 314916
2017-10-04 17:31:28 +00:00
Craig Topper 6fb55716e9 [X86] Redefine MOVSS/MOVSD instructions to take VR128 regclass as input instead of FR32/FR64
This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted.

This should fix PR33079

Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results.

Differential Revision: https://reviews.llvm.org/D38449

llvm-svn: 314914
2017-10-04 17:20:12 +00:00
Simon Pilgrim 46a366ccb7 [X86][SSE] Early out from ComputeNumSignBitsForTargetNode. NFCI.
Early out from vector shift by immediates that will exceed eltsize - don't bother making an unnecessary ComputeNumSignBits recursive call.

llvm-svn: 314903
2017-10-04 13:41:26 +00:00
Simon Pilgrim bd5d2f0284 [X86][SSE] Add support for lowering unary shuffles to PACKSS/PACKUS
Extension to D38472

llvm-svn: 314901
2017-10-04 13:12:08 +00:00
Jatin Bhateja 3c29bacd43 [X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)
Summary:
   1/  Operand folding during complex pattern matching for LEAs has been
       extended, such that it promotes Scale to accommodate similar operand
       appearing in the DAG.
       e.g.
         T1 = A + B
         T2 = T1 + 10
         T3 = T2 + A
       For above DAG rooted at T3, X86AddressMode will no look like
         Base = B , Index = A , Scale = 2 , Disp = 10

   2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
       so that if there is an opportunity then complex LEAs (having 3 operands)
       could be factored out.
       e.g.
         leal 1(%rax,%rcx,1), %rdx
         leal 1(%rax,%rcx,2), %rcx
       will be factored as following
         leal 1(%rax,%rcx,1), %rdx
         leal (%rdx,%rcx)   , %edx

   3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
      thus avoiding creation of any complex LEAs within a loop.

Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy

Reviewed By: lsaba

Subscribers: jmolloy, spatel, igorb, llvm-commits

    Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 314886
2017-10-04 09:02:10 +00:00
Martin Storsjo e14145dcb0 [X86] Fix using the SJLJ jump table on x86_64
The previous version didn't work if the jump table base address didn't
fit in 32 bit, since it was encoded as an immediate offset. And in case
the jump table is encoded as 32 bit label differences, we need to
load and add them to the table base first.

This solves the first half of the issues mentioned in PR34720.

Also fix some of the errors pointed out by -verify-machineinstrs, by
using GR32_NOSPRegClass.

Differential Revision: https://reviews.llvm.org/D38333

llvm-svn: 314876
2017-10-04 05:12:10 +00:00
Reid Kleckner 33cbbbc62f [X86] Remove dead declaration convertArgMovsToPushes, NFC
This was dead when it landed in r252578. We have this functionality, if
not for stack probe calls, but for regular calls in
X86CallFrameOptimization.cpp.

llvm-svn: 314845
2017-10-03 21:12:18 +00:00
Hans Wennborg 660531085a CodeView: Provide a .def file with the register ids
The list of register ids was previously written out in a couple of dirrent
places. This puts it in a .def file and also adds a few more registers (e.g.
the x87 regs) which should lead to more readable dumps, but I didn't include
the whole list since that seems unnecessary.

X86_MC::initLLVMToSEHAndCVRegMapping is pretty ugly, but at least it's not
relying on magic constants anymore. The TODO of using tablegen still stands.

Differential revision: https://reviews.llvm.org/D38480

llvm-svn: 314821
2017-10-03 18:27:22 +00:00
Simon Pilgrim cf99d069c3 [X86][SSE] Add support for decoding PACKSS/PACKUS shuffles masks with UNDEF
llvm-svn: 314792
2017-10-03 12:41:39 +00:00
Simon Pilgrim f5f291d129 [X86][SSE] Add support for lowering shuffles to PACKSS/PACKUS
If the upper bits of a truncation shuffle patterns have at least the minimum number of sign/zero bits on their inputs then we can safely use PACKSS/PACKUS as shuffles.

Partial fix for https://bugs.llvm.org/show_bug.cgi?id=34773

Differential Revision: https://reviews.llvm.org/D38472

llvm-svn: 314788
2017-10-03 12:01:31 +00:00
Simon Pilgrim d87af9a1c0 Remove unused variable. NFCI.
llvm-svn: 314778
2017-10-03 10:01:02 +00:00
Simon Pilgrim 640fbf5132 [X86][SSE] Add support for shuffle combining from PACKSS/PACKUS
Mentioned in D38472

llvm-svn: 314777
2017-10-03 09:54:03 +00:00
Simon Pilgrim 19d535e75b [X86][SSE] Add support for PACKSS/PACKUS constant folding
Pulled out of D38472

llvm-svn: 314776
2017-10-03 09:41:00 +00:00
Martin Storsjo 1e54738676 [X86] Provide the LSDA pointer with RIP relative addressing if necessary
This makes sure the LSDA pointer isn't truncated to 32 bit.

Make LowerINTRINSIC_WO_CHAIN a member function instead of a static
function, so that it can use the getGlobalWrapperKind method.

This solves the second half of the issues mentioned in PR34720.

Differential Revision: https://reviews.llvm.org/D38343

llvm-svn: 314767
2017-10-03 06:29:58 +00:00
Amjad Aboud 8ef85a088e [X86][NFC] Add X86CmovConverterPass to the pass registry.
Differential Revision: https://reviews.llvm.org/D38355

llvm-svn: 314726
2017-10-02 21:46:37 +00:00
Bjorn Pettersson 8e978c0151 [X86][SSE] Fix -Wsign-compare problems introduced in r314658
The refactoring in
"[X86][SSE] Add createPackShuffleMask helper function. NFCI."
resulted in warning when compiling the code (seen in build bots).

This patch restores some types from int to unsigned to avoid
those warnings.

llvm-svn: 314667
2017-10-02 12:46:38 +00:00
Simon Pilgrim e2e27aff9b [X86][SSE] Add createPackShuffleMask helper function. NFCI.
llvm-svn: 314658
2017-10-02 10:12:51 +00:00
Simon Pilgrim c04c7443ea [X86][SSE] matchBinaryVectorShuffle - add support for different src/dst value shuffle types
Preparation for support for combining to PACKSS/PACKUS

llvm-svn: 314656
2017-10-02 09:45:08 +00:00
Simon Pilgrim 3bbbf31590 Fix typo in comment. NFCI.
llvm-svn: 314653
2017-10-02 09:10:50 +00:00
Simon Pilgrim e575651370 [X86] Cleanup uses of computeKnownBits by using MaskedValueIsZero helper instead. NFCI.
llvm-svn: 314652
2017-10-02 09:08:45 +00:00
Michael Zuckerman e4084f6bdb [X86][LLVM]Expanding Supports lowerInterleaved{store|load}() in X86InterleavedAccess (VF64 stride 3-4)
I continue to support different VF interleaved and in this pass for this patch,
I added the vf64 stride3 support for both load and store.
I also added support fot the stride4 store.

Reviewers:
1. zvi
2. dorit
3. igorb
4. guyblank

Differential Revision: https://reviews.llvm.org/D37687

Change-Id: I3d238efedf217d1768b348d710de1efa2f19d27b
llvm-svn: 314651
2017-10-02 07:35:25 +00:00
Craig Topper d37625859a [X86] Fix copy pasto in X86FastISel::fastEmitInst_rrrr.
The 4th operand was not being constrained and the third operand was being constrained twice.

llvm-svn: 314648
2017-10-02 05:46:53 +00:00
Craig Topper bb7866162c [X86] Use a bool flag instead of assigning an unsigned to two different values that we only use in an equality comparison.
llvm-svn: 314647
2017-10-02 05:46:52 +00:00
Craig Topper c05c390a7c [X86] Use _NOREX MOVZX instructions for some patterns even in 32-bit mode.
This unifies the patterns between both modes. This should be effectively NFC since all the available registers in 32-bit mode statisfy this constraint.

llvm-svn: 314643
2017-10-02 00:44:50 +00:00
Craig Topper c20b46da2f [X86] Change register&memory TEST instructions from MRMSrcMem to MRMDstMem
Summary:
Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable.

For the register&register form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem.

I believe this supercedes D38025 which was trying to switch the register&register form back to pre-PR22995.

Reviewers: aymanmus, RKSimon, zvi

Reviewed By: aymanmus

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38120

llvm-svn: 314639
2017-10-01 23:53:53 +00:00
Craig Topper 00230604d3 [X86] Remove a couple unnecessary COPY_TO_REGCLASS from some output patterns where the instruction already produces the correct register class.
llvm-svn: 314638
2017-10-01 23:53:50 +00:00
Simon Pilgrim df23a2700d [X86][SSE] Add faux shuffle combining support for PACKUS
llvm-svn: 314631
2017-10-01 18:43:48 +00:00
Simon Pilgrim 836fa6dcfd [X86][SSE] Improve shuffle combining of PACKSS instructions.
Support unary packing and fix the faux shuffle mask for vectors larger than 128 bits.

llvm-svn: 314629
2017-10-01 17:54:55 +00:00
Sanjay Patel c7076a3ba9 [x86] formatting; NFC
llvm-svn: 314627
2017-10-01 14:39:10 +00:00
Simon Pilgrim a8dd6f4f30 [X86][SSE] Fold (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1
Remove sign extend in register style pattern if the sign is already extended enough

llvm-svn: 314599
2017-09-30 17:57:34 +00:00
Craig Topper 619569841a [AVX-512] Add patterns to make fp compare instructions commutable during isel.
llvm-svn: 314598
2017-09-30 17:02:39 +00:00
Michael Zuckerman b92b6d424f Code refactoring for the interleaved code <NFC>
Change-Id: I7831c9febad8e14278a5bc87584a0053dc837be1
llvm-svn: 314596
2017-09-30 14:55:03 +00:00
Craig Topper d92ade96f4 [X86] Support v64i8 mulhu/mulhs
Implemented by splitting into two v32i8 mulhu/mulhs and concatenating the results.

Differential Revision: https://reviews.llvm.org/D38307

llvm-svn: 314584
2017-09-30 04:21:46 +00:00
Amara Emerson 7d6c55f8aa [X86] Improve codegen for inverted overflow checking intrinsics.
Adds a new combine for: xor(setcc cc, val), 1 --> setcc (invert(cc), val)

Differential Revision: https://reviews.llvm.org/D38161

llvm-svn: 314514
2017-09-29 13:53:44 +00:00
Michael Zuckerman 0b5db55b96 Small modification <NFC>
Change-Id: I360abccee12cae29bd2ac4f8399c9ecc92eb7f13
llvm-svn: 314510
2017-09-29 12:45:54 +00:00
Coby Tayree c3d24118e8 [X86][MS-InlineAsm] Extended support for variables / identifiers on memory / immediate expressions
Allow the proper recognition of Enum values and global variables inside ms inline-asm memory / immediate expressions, as they require some additional overhead and treated incorrect if doesn't early recognized.
supersedes D33278, D35774

Differential Revision: https://reviews.llvm.org/D37412

llvm-svn: 314493
2017-09-29 07:02:46 +00:00
Craig Topper 6255c7b675 [X86] Don't select (cmp (and, imm), 0) to testw
Summary:
X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters.

I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs.

Reviewers: RKSimon, zvi, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38273

llvm-svn: 314474
2017-09-28 23:35:36 +00:00
Craig Topper ed19350293 [X86] Make use of vpmovwb when possible in LowerMULH
If we have BWI, we can truncate in a much simpler way by using vpmovwb. This even works without VLX by using the wider zmm->ymm truncate with a subvector extract.

Differential Revision: https://reviews.llvm.org/D38375

llvm-svn: 314457
2017-09-28 20:10:34 +00:00
Craig Topper 3819be6cf6 [X86] Use target independent ZERO_EXTEND/SIGN_EXTEND nodes were possible in LowerMULH
We aren't do any in register extends here so we should be able to just the target independent nodes directly and allow them to be lowered as necessary.

llvm-svn: 314447
2017-09-28 18:45:28 +00:00
Craig Topper fc104bfbc0 [X86] Move a setOperation action for ISD::TRUNCATE near another one in the same if. Remove one that is redundant with another subtarget features.
llvm-svn: 314446
2017-09-28 18:45:27 +00:00
Craig Topper ceff6da6e9 [X86] Use BWI instructions to improve lowering of v32i8 MULHU/S
Summary: If we have BWI instructions we can widen to v32i16 to do the multiply instead of splitting.

Reviewers: RKSimon, spatel, zvi

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38305

llvm-svn: 314432
2017-09-28 17:00:21 +00:00
Craig Topper fd6b8a67fb [X86] Remove dead code from X86ISelDAGToDAG.cpp multiply handling
Summary:
Lowering never creates X86ISD::UMUL for 8-bit types. X86ISD::UMUL8 is used instead. If X86ISD::UMUL 8-bit were ever used it would crash.

DAGCombiner replaces UMUL_LOHI/SMUL_LOHI with a wider MUL and a shift if the type twice as wide is legal. So we should never see i8 UMUL_LOHI/SMUL_LOHI. In fact I think there was a bug in part of the i8 code. Similar is true for i16 though without the bug.

Reviewers: RKSimon, spatel, zvi

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38276

llvm-svn: 314430
2017-09-28 16:56:36 +00:00
Craig Topper 71a8cf9f99 [X86] Use correct subvector index when combining two insert subvectors featuring zero vectors.
Previously we were using one of the subvector indices twice. The included test case causes an assert without this change.

Thanks to Simon Pilgrim for catching this.

llvm-svn: 314429
2017-09-28 16:53:16 +00:00
Simon Pilgrim 2ff339303e Use SDValue::getConstantOperandVal helper. NFCI.
llvm-svn: 314425
2017-09-28 15:53:27 +00:00
Coby Tayree 566348f2a0 [x86][AsmParser] Allow some more MS size directives
MS allows the following size directives: float/double and long as synonymous to dword/qword and dword, respectively.
Differential Revision: https://reviews.llvm.org/D37190

llvm-svn: 314410
2017-09-28 11:04:08 +00:00
Jessica Paquette 4cf187b5b4 [MachineOutliner] AArch64: Avoid saving + restoring LR if possible
This commit allows the outliner to avoid saving and restoring the link register
on AArch64 when it is dead within an entire class of candidates.

This introduces changes to the way the outliner interfaces with the target.
For example, the target now interfaces with the outliner using a
MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and
getOutliningFrameOverhead.

This also improves several comments on the outliner's cost model.

https://reviews.llvm.org/D36721

llvm-svn: 314341
2017-09-27 20:47:39 +00:00
Craig Topper c16a472966 Revert r314249 "Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST."""
This caused PR34751

llvm-svn: 314339
2017-09-27 20:34:17 +00:00
Craig Topper e0d8290094 Revert r314248 "[X86] Don't emit X86::MOV8rr_NOREX from X86InstrInfo::copyPhysReg."
This contributed to PR34751

llvm-svn: 314338
2017-09-27 20:34:13 +00:00
Simon Pilgrim 870007b4f8 [X86][SSE] Pull out variable shuffle mask combine logic. NFCI.
Hopefully this will make it easier to vary the combine depth threshold per-target.

llvm-svn: 314337
2017-09-27 20:19:53 +00:00
Craig Topper 7b1d503d7f [X86] Rewrite the zero vector checks in lowerV2X128VectorShuffle to use the Zeroable APInt
We already have zeroable bits in an APInt. We might as well use that instead of checking for an all zero BUILD_VECTOR.

Differential Revision: https://reviews.llvm.org/D37950

llvm-svn: 314332
2017-09-27 18:56:20 +00:00
Craig Topper 05f71dd036 [X86] In combineLoopSADPattern, pad result with zeros and use full size add instead of using a smaller add and inserting.
In some cases the result psadbw is smaller than the type of the add that started the match. Currently in these cases we are using a smaller add and inserting the result.

If we instead combine the psadbw with zeros and use the full size add we can take advantage of implicit zeroing we get if we emit a narrower move before the add.

In a future patch, I want to make isel aware that the psadbw itself already zeroed the upper bits and remove the move entirely.

Differential Revision: https://reviews.llvm.org/D37453

llvm-svn: 314331
2017-09-27 18:36:45 +00:00
Coby Tayree 836c50cc2f [X86][AsmParser] fix PR32035
Differential Revision: https://reviews.llvm.org/D37473

llvm-svn: 314295
2017-09-27 10:29:29 +00:00
Simon Pilgrim 3b0d9e789e [X86][AVX] Improve (i4 bitcast (v4i1 x)) handling for 256-bit vector compare results.
As commented on D37849 and rL313547, AVX1 targets were missing a chance to use vmovmskpd for v4f64/v4i64 results for bool vector bitcasts

llvm-svn: 314293
2017-09-27 10:10:17 +00:00