Commit Graph

11001 Commits

Author SHA1 Message Date
Craig Topper 2aac3ee5bc [X86] Legalize 128/256 gathers/scatters on KNL by using widening rather than sign extending the index.
We can just widen the vectors with undef and zero extend the mask.

llvm-svn: 322308
2018-01-11 19:38:30 +00:00
Zvi Rackover 999e6c2967 DAGCombine: Let truncates negate extension through extract-subvector
Summary:
Fold cases such as:
(v8i8 truncate (v8i32 extract_subvector (v16i32 sext (v16i8 V), Idx)))
->
(v8i8 extract_subvector (v16i8 V), Idx)

This can be generalized to cases where the truncate and extend do not
fully cancel each other out, but it may require querying the target
about profitability.

Reviewers: RKSimon, craig.topper, spatel, efriedma

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41927

llvm-svn: 322300
2018-01-11 18:02:33 +00:00
Zvi Rackover cf0999887a X86 Tests: Add zext cases in (trunc (subvector)) test. NFC
Cases were missing as observed in D41927

llvm-svn: 322297
2018-01-11 17:50:34 +00:00
Simon Pilgrim 8de035670e [X86][SSE] Drop old insertps stack folding test
Broken test from old attempt for folding tables - we don't peek through extract_subvector spills at all (which is why it doesn't fold), and we already have foldMemoryOperandCustom to handle insertps immediate correction anyway.

llvm-svn: 322292
2018-01-11 16:57:58 +00:00
Simon Pilgrim 6e6da3f449 [X86][SSE] Add ISD::VECTOR_SHUFFLE to faux shuffle decoding
Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering.

llvm-svn: 322279
2018-01-11 14:25:18 +00:00
Zvi Rackover 3ee66d9cd1 X86: Fix LowerBUILD_VECTORAsVariablePermute for case Src is smaller than Indices
Summary:
As RKSimon suggested in pr35820, in the case that Src is smaller in
bit-size than Indices, need to widen Src to avoid type mismatch.

Fixes pr35820

Reviewers: RKSimon, craig.topper

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41865

llvm-svn: 322272
2018-01-11 12:26:52 +00:00
Craig Topper 0b59034b15 [X86] Optimize v2i32/v2f32 scatters.
If the index is v2i64 we can use the scatter instruction that has v4i32/v4f32 data register, v2i64 index, and v2i1 mask. Similar was already done for gather.

Implement custom widening for v2i32 data to remove the code that reverses type legalization during lowering.

llvm-svn: 322254
2018-01-11 06:31:28 +00:00
Matthias Braun 725ad0eee0 TargetLoweringBase: The ios simulator has no bzero function.
Make sure I really get back to the beahvior before my rewrite in r321035
which turned out not to be completely NFC as I changed the behavior for
the ios simulator environment.

llvm-svn: 322223
2018-01-10 20:49:57 +00:00
Craig Topper af4eb17223 [SelectionDAG][X86] Explicitly store the scale in the gather/scatter ISD nodes
Currently we infer the scale at isel time by analyzing whether the base is a constant 0 or not. If it is we assume scale is 1, else we take it from the element size of the pass thru or stored value. This seems a little weird and I think it makes more sense to make it explicit in the DAG rather than doing tricky things in the backend.

Most of this patch is just making sure we copy the scale around everywhere.

Differential Revision: https://reviews.llvm.org/D40055

llvm-svn: 322210
2018-01-10 19:16:05 +00:00
Simon Pilgrim f74e3f45dc [X86][MMX] Add test for PR35869
llvm-svn: 322197
2018-01-10 17:05:03 +00:00
Zvi Rackover a27442f4f4 X86 Tests: Add isel tests for truncate-extract_vector-extend. NFC.
To be improved in a future patch

llvm-svn: 322192
2018-01-10 14:56:15 +00:00
Simon Pilgrim a0c59cce0e [X86][SSE] Add some basic FABS combine tests
llvm-svn: 322182
2018-01-10 13:28:34 +00:00
Simon Pilgrim a330a407c4 [X86][SSE] Add v2f64 u2 shuffle test
Adds missing coverage for SHUFPD undef argument lowering, and also shows a missed opportunity to remove a unnecessary move compared to 02 shuffle mask.

llvm-svn: 322175
2018-01-10 12:23:39 +00:00
Puyan Lotfi fe6c9cbb24 [MIR] Repurposing '$' sigil used by external symbols. Replacing with '&'.
Planning to add support for named vregs. This puts is in a conundrum since
physregs are named as well. To rectify this we need to use a sigil other than
'%' for physregs in MIR. We've settled on using '$' for physregs but first we
must repurpose it from external symbols using it, which is what this commit is
all about. We think '&' will have familiar semantics for C/C++ users.

llvm-svn: 322146
2018-01-10 00:56:48 +00:00
Craig Topper c4d2dd80b6 [X86] Add a DAG combine to combine (sext (setcc)) with VLX
Normally target independent DAG combine would do this combine based on getSetCCResultType, but with VLX getSetCCResultType returns a vXi1 type preventing the DAG combining from kicking in.

But doing this combine can allow us to remove the explicit sign extend that would otherwise be emitted.

This patch adds a target specific DAG combine to combine the sext+setcc when the result type is the same size as the input to the setcc. I've restricted this to FP compares and things that can be represented with PCMPEQ and PCMPGT since we don't have full integer compare support on the older ISAs.

Differential Revision: https://reviews.llvm.org/D41850

llvm-svn: 322101
2018-01-09 18:14:22 +00:00
Zvi Rackover 72b0bb1405 X86 Tests: Update more isel tests with FastVariableShuffle feature
Summary:
Added the FastVariableShuffle feature to cases that resembled processors
for which this fearure is on.
For AVX2 there are processors with and w/o this fearue enable.
For AVX512 only KNL does enable this feature so cases which only have
+avx512f were left without the FastVariableShuffle enabled.

Reviewers: RKSimon, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41851

llvm-svn: 322090
2018-01-09 16:26:06 +00:00
Zvi Rackover b11e84c1d8 X86 Tests: Add common check prefix to test-case. NFC.
As suggested in D41851

llvm-svn: 322089
2018-01-09 16:14:15 +00:00
Sanjay Patel 37e28e40cb [SelectionDAG] lower math intrinsics to finite version of libcalls when possible (PR35672)
Ingredients in this patch:
1. Add HANDLE_LIBCALL defs for finite mathlib functions that correspond to LLVM intrinsics.
2. Plumbing to send TargetLibraryInfo down to SelectionDAGLegalize.
3. Relaxed math and library checking in SelectionDAGLegalize::ConvertNodeToLibcall() to choose finite libcalls.

There was a bug about determining the availability of the finite calls that should be fixed with:
rL322010

Not in this patch:
This doesn't resolve the question/bug of clang creating the intrinsic IR in the first place.
There's likely follow-up work needed to support the long double variants better.
There's room for improvement to reduce the code duplication.
Create finite calls that don't originate from a corresponding intrinsic or DAG node?

Differential Revision: https://reviews.llvm.org/D41338

llvm-svn: 322087
2018-01-09 15:41:00 +00:00
Francis Visoiu Mistrih 2b3bd30637 [CodeGen] Don't print register classes in -debug output
Since register classes and banks are already printed with the register
definition, don't print it at the end of every instruction anymore.

This follows MIR in this regard and is another step to the unification
of the two formats.

llvm-svn: 322086
2018-01-09 15:39:44 +00:00
Simon Pilgrim 9cf3e765d8 [X86][AVX] Add v2i64/v2f64 load tests
Ensure these use insertions, not masked load ops

llvm-svn: 322076
2018-01-09 13:35:18 +00:00
Oren Ben Simhon 1c6308ecd5 Instrument Control Flow For Indirect Branch Tracking
CET (Control-Flow Enforcement Technology) introduces a new mechanism called IBT (Indirect Branch Tracking).
According to IBT, each Indirect branch should land on dedicated ENDBR instruction (End Branch).
The new pass adds ENDBR instructions for every indirect jmp/call (including jumps using jump tables / switches).
For more information, please see the following:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Differential Revision: https://reviews.llvm.org/D40482

Change-Id: Icb754489faf483a95248f96982a4e8b1009eb709
llvm-svn: 322062
2018-01-09 08:51:18 +00:00
Craig Topper def1c30c66 [X86] Allow more cmpps/pd immediate encodings to be commuted during isel.
The code that checks the immediate wasn't masking to the lower 3-bits like the code in X86InstrInfo.cpp that's used by the peephole pass does.

llvm-svn: 322060
2018-01-09 07:09:34 +00:00
Craig Topper cc342d465e [X86] Remove llvm.x86.avx512.cvt*2mask.* intrinsics and autoupgrade to (icmp slt X, 0)
I had to drop fast-isel-abort from a test because we can't fast isel some of the mask stuff. When we used intrinsics we implicitly fell back to SelectionDAG for the intrinsic call without triggering the abort error. But with native IR that doesn't happen the same way.

llvm-svn: 322050
2018-01-09 00:50:47 +00:00
Sam Parker 3800f0f11d [DAGCombine] Fix for PR35761
I had falsely assumed that constant operands would be operand(1) of
the bin ops that may need their constant operand to be masked.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=35761

Differential Revision: https://reviews.llvm.org/D41667

llvm-svn: 321991
2018-01-08 13:21:24 +00:00
Sam Parker 51164c409d [X86] Renamed CodeGen test
llvm-svn: 321988
2018-01-08 10:56:44 +00:00
Craig Topper f090e8a89a [X86] Replace CVT2MASK ISD opcode with PCMPGTM compared to zero.
CVT2MASK is just checking the sign bit which can be represented with a comparison with zero.

llvm-svn: 321985
2018-01-08 06:53:54 +00:00
Craig Topper a2018e799a [X86] Add patterns to allow 512-bit BWI compare instructions to be used for 128/256-bit compares when VLX is not available.
llvm-svn: 321984
2018-01-08 06:53:52 +00:00
Craig Topper 03d8e516cf [X86] Add VSHUFF32X4 and similar instructions to load folding tables.
llvm-svn: 321978
2018-01-07 23:30:20 +00:00
Zvi Rackover 93b8bd4955 X86 Tests: Add Tests for PMADDWD selection. NFC.
Support for ISel to be added.

llvm-svn: 321970
2018-01-07 20:21:10 +00:00
Simon Pilgrim 998180dad3 [DAG] Fix for Bug PR34620 - Allow SimplifyDemandedBits to look through bitcasts
Allow SimplifyDemandedBits to use TargetLoweringOpt::computeKnownBits to look through bitcasts. This can help simplifying in some cases where bitcasts of constants generated during or after legalization can't be folded away, and thus didn't get picked up by SimplifyDemandedBits. This fixes PR34620, where a redundant pand created during legalization from lowering and lshr <16xi8> wasn't being simplified due to the presence of a bitcasted build_vector as an operand.

Committed on the behalf of @sameconrad (Sam Conrad)

Differential Revision: https://reviews.llvm.org/D41643

llvm-svn: 321969
2018-01-07 19:09:40 +00:00
Craig Topper d58c165545 [X86] Make v2i1 and v4i1 legal types without VLX
Summary:
There are few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since need to fill the upper bits of the mask we have to fill with 0s at the promoted type.

It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway.

This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly.

We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added.

I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all.

There's definitely room for improvement with some follow up patches.

Reviewers: RKSimon, zvi, guyblank

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41560

llvm-svn: 321967
2018-01-07 18:20:37 +00:00
Craig Topper a21f551109 [X86] Add the 16 and 8-bit CRC32 instructions to the load folding tables.
llvm-svn: 321958
2018-01-07 06:48:20 +00:00
Craig Topper 89293a2a94 [X86] Add 128 and 256-bit VPOPCNTD/Q instructions to load folding tables.
llvm-svn: 321953
2018-01-07 06:24:27 +00:00
Craig Topper 11aede13db [X86] Add EVEX vcvtph2ps to the load folding tables.
llvm-svn: 321951
2018-01-07 06:24:24 +00:00
Craig Topper 40cc8338f7 [X86] Remove cvtps2ph xmm->xmm from store folding tables. Add the evex versions of cvtps2ph to the store folding tables.
The memory form of the xmm->xmm version only writes 64-bits. If we use it in the folding tables and its get used for a stack spill, only half the slot will be written. Then a reload may read all 128-bits which will pull in garbage. But without the spill the upper bits of the register would have been zero. By not folding we would preserve the zeros.

llvm-svn: 321950
2018-01-07 06:24:23 +00:00
Craig Topper 0f4ccb7806 [X86] Add load folding pattern to EVEX vcvttss2si/vcvtsd2si.
llvm-svn: 321945
2018-01-06 21:02:26 +00:00
Sanjay Patel 5a48aef3f0 [x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325)
This is the last step needed to fix PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325

We're trading branch and compares for loads and logic ops. 
This makes the code smaller and hopefully faster in most cases.

The 24-byte test shows an interesting construct: we load the trailing scalar 
elements into vector registers and generate the same pcmpeq+movmsk code that 
we expected for a pair of full vector elements (see the 32- and 64-byte tests).

Differential Revision: https://reviews.llvm.org/D41714

llvm-svn: 321934
2018-01-06 16:16:04 +00:00
Craig Topper 8c2ea74e74 [X86] Call lowerShuffleAsRepeatedMaskAndLanePermute from lowerV4I64VectorShuffle.
llvm-svn: 321929
2018-01-06 06:08:04 +00:00
Craig Topper af1d257571 [X86] Run dos2unix on a test file. NFC
llvm-svn: 321928
2018-01-06 06:08:02 +00:00
Craig Topper 004867312e [X86] Stop printing moves between VR64 and GR64 with 'movd' mnemonic. Use 'movq' instead.
This behavior existed to work with an old version of the gnu assembler on MacOS that only accepted this form. Newer versions of GNU assembler and the current LLVM derived version of the assembler on MacOS support movq as well.

llvm-svn: 321898
2018-01-05 20:55:12 +00:00
Simon Pilgrim 15fcbe2d4a [X86] Regenerate illegal move test
Recommitting after fixing case-sensitive issue in the RUN command

llvm-svn: 321868
2018-01-05 14:24:03 +00:00
Sam Parker 1ad085b808 [DAGCombine] Fix for PR37563
While searching for loads to be narrowed, equal sized loads were not
added to the list, resulting in anyext loads not being converted to
zext loads.

https://bugs.llvm.org/show_bug.cgi?id=35763

Differential Revision: https://reviews.llvm.org/D41628

llvm-svn: 321862
2018-01-05 08:47:23 +00:00
Simon Pilgrim ab84a9a65d [X86] Add srem/udiv/urem by one combine tests
llvm-svn: 321826
2018-01-04 22:08:36 +00:00
Simon Pilgrim e7c06423c1 [X86] Add scalar undef sdiv/srem/udiv/urem combine tests
llvm-svn: 321823
2018-01-04 21:33:19 +00:00
Craig Topper dffb98e03d [X86] Correct the execution domain for AVX1 VBROADCASTF128 to be FP instead of integer.
llvm-svn: 321821
2018-01-04 20:56:21 +00:00
Amara Emerson 83e852f30d Revert "[X86] Regenerate test"
This reverts r321814 as it was failing make check.

llvm-svn: 321819
2018-01-04 20:20:44 +00:00
Simon Pilgrim a47289a2ee [X86] Regenerate test
llvm-svn: 321814
2018-01-04 18:48:42 +00:00
Simon Pilgrim 208fab193b [X86] Add common CHECK prefix for tests without SSE/AVX codegen
llvm-svn: 321810
2018-01-04 18:23:46 +00:00
Simon Pilgrim d6923083ba Regenerate broadcast constant comment
llvm-svn: 321808
2018-01-04 18:21:33 +00:00
Simon Pilgrim 5643b40d91 [X86] Show missed combine for X/X for SDIV/UDIV and X%X for SREM/UREM
llvm-svn: 321807
2018-01-04 18:20:46 +00:00