9930 Commits

Author SHA1 Message Date
Simon Pilgrim 66a2eb8c77 [X86][AVX512] Regenerated and cleaned up extension tests.
llvm-svn: 309139
2017-07-26 16:47:00 +00:00
Simon Pilgrim b77cb95744 [X86] Regenerate setcc tests
llvm-svn: 309138
2017-07-26 16:45:57 +00:00
Simon Pilgrim 164160b4f6 [X86][AVX512] Regenerate shuffle tests with broadcast comments.
llvm-svn: 309137
2017-07-26 16:41:18 +00:00
Simon Pilgrim 0a7d9ac766 [X86] Regenerate memset tests
llvm-svn: 309136
2017-07-26 16:39:07 +00:00
Simon Pilgrim 01ab86e62b [X86] Add combineBT test failure because bits have multiple uses.
llvm-svn: 309124
2017-07-26 15:41:57 +00:00
Zvi Rackover 092f199188 DAGCombiner: Extend reduceBuildVecToTrunc to handle non-zero offset
Summary:
Add support for combining power-of-2-strided build_vectors where the
first build_vector operand is extracted from a non-zero index.

Example:

 v4i32 build_vector((extract_elt V, 1),
                    (extract_elt V, 3),
                    (extract_elt V, 5),
                    (extract_elt V, 7))
 -->
 v4i32 truncate (bitcast (shuffle<1,u,3,u,5,u,7,u> V, u) to v4i64)
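
The equivalence in the example above can be checked with a small model. This is a minimal sketch, not LLVM code: it assumes a little-endian lane layout for the bitcast, models undef lanes as 0, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def build_vector_from_odd_lanes(v):
    """Left-hand side: v4i32 build_vector of elements 1, 3, 5, 7 of V."""
    return [v[1], v[3], v[5], v[7]]

def shuffle_bitcast_truncate(v):
    """Right-hand side: truncate(bitcast(shuffle<1,u,3,u,5,u,7,u> V) to v4i64)."""
    # Undef lanes are modelled as 0 here; any value would do.
    shuffled = [v[1], 0, v[3], 0, v[5], 0, v[7], 0]
    # Little-endian bitcast v8i32 -> v4i64: lane 2*i is the low half of i64 lane i.
    wide = [
        (shuffled[2 * i] & MASK32) | ((shuffled[2 * i + 1] & MASK32) << 32)
        for i in range(4)
    ]
    # Truncating each i64 lane to i32 keeps only the low half.
    return [w & MASK32 for w in wide]

V = [10, 11, 12, 13, 14, 15, 16, 17]
print(build_vector_from_odd_lanes(V) == shuffle_bitcast_truncate(V))
```

The high half of every i64 lane is discarded by the truncate, which is why those shuffle lanes can be left undefined.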

Reviewers: delena, RKSimon, guyblank

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35700

llvm-svn: 309108
2017-07-26 12:57:03 +00:00
Simon Pilgrim a9551fb10f [X86] Regenerated BT tests
Test on 32/64 bit targets where appropriate 

llvm-svn: 309107
2017-07-26 12:49:20 +00:00
Simon Pilgrim dd06da0804 [X86] Add urem vector test for non-uniform pow2 constants
llvm-svn: 309104
2017-07-26 11:07:45 +00:00
Simon Pilgrim c5c72306f3 [X86] Regenerated urem pow2 tests on 32/64 bit targets
llvm-svn: 309103
2017-07-26 11:05:16 +00:00
Simon Pilgrim 976a5d2662 [X86] Regenerated umul overflow tests on 32/64 bit targets
llvm-svn: 309102
2017-07-26 11:04:18 +00:00
Simon Pilgrim 106307aa13 [X86][AVX] Regenerated and cleaned up AVX1 intrinsic tests.
Cleaned up triple settings, added 32-bit/64-bit targets where useful, added broadcast comments

llvm-svn: 309100
2017-07-26 10:54:51 +00:00
Simon Pilgrim c402839c72 [X86][AVX2] Regenerated and cleaned up broadcast tests.
llvm-svn: 309099
2017-07-26 10:47:51 +00:00
Simon Pilgrim b695f74bba [X86][AVX512] Regenerated and added 32-bit targets to select tests
llvm-svn: 309098
2017-07-26 10:39:55 +00:00
Simon Pilgrim 82097a8d8c [X86][AVX] Regenerated and cleaned up masked gather/scatter tests.
Remove unused KNL checks and triple settings, added broadcast comments

llvm-svn: 309097
2017-07-26 10:37:12 +00:00
Simon Pilgrim dbf1fa8958 [X86][AVX] Regenerate lzcnt test.
Tidied up triples and checks.

llvm-svn: 309095
2017-07-26 10:22:56 +00:00
Simon Pilgrim ddf407dec9 [X86][FMA] Regenerate test with broadcast comments.
llvm-svn: 309093
2017-07-26 10:20:49 +00:00
Michael Zuckerman c1918ad571 [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.
This patch expands the support of lowerInterleavedStore to 32x8i stride 4.

LLVM generates suboptimal shuffle code for AVX2. This patch is a specific fix for one pattern (Stride=4, VF=32), and we plan to include more patterns in the future. To that end, we include two mask creators. The first creates a shuffle mask equivalent to the unpacklo/unpackhi instructions. The second creates a mask equivalent to a concatenation of two half vectors (high/low).

The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 32 chars:

c0, c1, ..., c31
m0, m1, ..., m31
y0, y1, ..., y31
k0, k1, ..., k31

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
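
The target memory layout can be illustrated with a scalar model. This is a sketch of the interleaving result only, not of the unpack/concat shuffle sequence the patch emits. The function name is hypothetical.

```python
def interleave_stride4(c, m, y, k):
    """Interleave four equal-length vectors as c0 m0 y0 k0 c1 m1 y1 k1 ..."""
    if not (len(c) == len(m) == len(y) == len(k)):
        raise ValueError("all four vectors must have the same length")
    out = []
    for lane in zip(c, m, y, k):
        out.extend(lane)
    return out

# Four vectors of 32 elements each, labelled as in the commit message.
c = [f"c{i}" for i in range(32)]
m = [f"m{i}" for i in range(32)]
y = [f"y{i}" for i in range(32)]
k = [f"k{i}" for i in range(32)]
print(" ".join(interleave_stride4(c, m, y, k)[:8]))
```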

Reviewers:
dorit
Farhana
RKSimon
guyblank
DavidKreitzer

Differential Revision: https://reviews.llvm.org/D34601

llvm-svn: 309086
2017-07-26 08:10:14 +00:00
Craig Topper 050c9c8f83 [X86] Prevent selecting masked aligned load instructions if the load should be non-temporal
Summary: The aligned load predicates don't suppress themselves if the load is non-temporal the way the unaligned predicates do. For the most part this isn't a problem because the aligned predicates are mostly used for instructions that only load, and the non-temporal loads have priority over those. The exception is masked loads.

Reviewers: RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35712

llvm-svn: 309079
2017-07-26 04:31:04 +00:00
Simon Pilgrim 18b97f78fe [X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914)
D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914).

Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os).

This patch should be considered for the 5.0.0 release branch as well

Differential Revision: https://reviews.llvm.org/D35830

llvm-svn: 308986
2017-07-25 17:04:37 +00:00
Simon Pilgrim 0d3054fb44 [X86] Regenerate test.
llvm-svn: 308981
2017-07-25 16:10:32 +00:00
Simon Pilgrim 3edf2901d2 [X86] Regenerate test with broadcast comments.
llvm-svn: 308980
2017-07-25 16:09:56 +00:00
Simon Pilgrim 3459f108f8 [X86] Add 24-byte memcmp tests (PR33914)
llvm-svn: 308963
2017-07-25 10:33:36 +00:00
Michael Zuckerman 196b3cadf6 Adding base test for interleave store VF16 and expand the test for AVX512
This patch doesn't modify any non-test files.

llvm-svn: 308909
2017-07-24 18:29:56 +00:00
Ayman Musa b16ce777e3 [X86][AVX512] Add patterns for masked AVX512 floating point compare instructions that were missing.
These patterns were missed by D33188; adding them for completeness.
Also updates the test.

Differential Revision: https://reviews.llvm.org/D35179

llvm-svn: 308868
2017-07-24 08:10:32 +00:00
Petr Hosek 710479cede [CodeGen][X86] Fuchsia supports sincos* libcalls and sin+cos->sincos optimization
Patch by Roland McGrath

Differential Revision: https://reviews.llvm.org/D35748

llvm-svn: 308854
2017-07-23 22:30:00 +00:00
Craig Topper 6912d7faa3 [X86] Add patterns for memory forms of SARX/SHLX/SHRX with careful complexity adjustment to keep shift by immediate using the legacy instructions.
These patterns were only missing to favor using the legacy instructions when the shift was a constant. With careful adjustment of the pattern complexity we can make sure the immediate instructions still have priority over these patterns.

llvm-svn: 308834
2017-07-23 03:59:37 +00:00
Simon Pilgrim 84cbd8e750 [X86][SSE] Add extra (sra (sra x, c1), c2) -> (sra x, (add c1, c2)) test case
We should be able to handle the case where some c1+c2 elements exceed max shift and some don't by performing a clamp after the sum
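
A small scalar model suggests why clamping the summed shift amount is sound for arithmetic right shifts. This is a sketch assuming 32-bit lanes and in-range shift amounts (0 to 31). The function names are hypothetical, and it does not reproduce the DAG combine itself.

```python
BITS = 32

def sra(x, amt):
    """Arithmetic shift right of a signed 32-bit value by an in-range amount."""
    assert -(1 << (BITS - 1)) <= x < (1 << (BITS - 1))
    assert 0 <= amt < BITS
    return x >> amt  # Python's >> on ints is an arithmetic (sign-filling) shift

def sra_of_sra(x, c1, c2):
    """Unfolded form: (sra (sra x, c1), c2)."""
    return sra(sra(x, c1), c2)

def folded_sra(x, c1, c2):
    """Folded form: (sra x, min(c1 + c2, BITS - 1)), clamping after the sum."""
    return sra(x, min(c1 + c2, BITS - 1))

# Shifting by BITS - 1 already leaves only copies of the sign bit,
# so any larger combined amount gives the same result.
print(sra_of_sra(-5, 20, 20), folded_sra(-5, 20, 20))
```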

llvm-svn: 308724
2017-07-21 10:22:49 +00:00
Simon Pilgrim 32c377a1cf [X86][SSE] Add pre-AVX2 support for (i32 bitcast(v32i1)) -> 2xMOVMSK
Currently we only support (i32 bitcast(v32i1)) using the AVX2 VPMOVMSKB ymm instruction.

This patch adds support for splitting pre-AVX2 targets into 2 x (V)PMOVMSKB xmm instructions and merging the integer results.
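
The split-and-merge idea can be modelled on plain Python values. This is a sketch that assumes lane i of the mask vector maps to bit i of the result, with each half producing a 16-bit mask. The function names are hypothetical.

```python
def movmsk16(lanes):
    """Model of a 128-bit byte-mask extract: 16 boolean lanes -> 16-bit integer."""
    assert len(lanes) == 16
    mask = 0
    for i, bit in enumerate(lanes):
        if bit:
            mask |= 1 << i
    return mask

def bitcast_v32i1_to_i32(lanes):
    """Split 32 lanes into two halves, extract each mask, then merge."""
    assert len(lanes) == 32
    lo = movmsk16(lanes[:16])
    hi = movmsk16(lanes[16:])
    return lo | (hi << 16)

lanes = [False] * 32
lanes[0] = lanes[16] = lanes[31] = True
print(hex(bitcast_v32i1_to_i32(lanes)))
```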

In future we could probably generalize this to handle more cases.

Differential Revision: https://reviews.llvm.org/D35303

llvm-svn: 308723
2017-07-21 09:58:50 +00:00
Craig Topper 31140ade70 [AVX-512] Fix a bug that prevented some non-temporal loads from using the movntdqa instruction.
The bitconverts here had an input type of 128-bits and an output type of 256 bits. The input type should also have been 256 bits.

llvm-svn: 308702
2017-07-21 00:40:42 +00:00
Matt Arsenault db78273b6e Add an ID field to StackObjects
On AMDGPU, SGPR spills are really spilled to another register.
The spiller creates the spills to new frame index objects,
which are used as placeholders.

This will eventually be replaced with a reference to a position
in a VGPR to write to and the frame index deleted. It is
most likely not a real stack location that can be shared
with another stack object.

This is a problem when StackSlotColoring decides it should
combine a frame index used for a normal VGPR spill with
a real stack location and a frame index used for an SGPR.

Add an ID field so that StackSlotColoring has a way
of knowing the different frame index types are
incompatible.

llvm-svn: 308673
2017-07-20 21:03:45 +00:00
Zvi Rackover eac8e7c08a [X86] Adding ISel tests for strided-shuffles with non-zero offset. NFC.
llvm-svn: 308672
2017-07-20 21:03:36 +00:00
Craig Topper 27c12e088e [X86] Allow masks with more than 6 bits set on the x << (y & mask) optimization for the 64-bit memory shifts.
llvm-svn: 308657
2017-07-20 19:29:58 +00:00
Craig Topper 02959b3d05 [X86] Add test case to demonstrate that we don't allow masks wider than 6 bits in the (shift x, (and y, mask)) patterns for the 64-bit memory form.
We allow wider than 5 bits in the 16 and 32 bit store forms. And we allow wider than 6 bits on the 64-bit register form.

I'm assuming this was a mistake made back in r148024.

llvm-svn: 308656
2017-07-20 19:29:56 +00:00
Nirav Dave df86d2d008 [DAG] Handle missing transform in fold of value extension case.
Summary:
When pushing an extension of a constant bitwise operator on a load
into the load, change other uses of the load value if they exist to
prevent the old load from persisting.

Reviewers: spatel, RKSimon, efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35030

llvm-svn: 308618
2017-07-20 13:57:32 +00:00
Nirav Dave 77cc6f23b9 [DAG] Optimize away degenerate INSERT_VECTOR_ELT nodes.
Summary:
Add missing vector write of vector read reduction, i.e.:

(insert_vector_elt x (extract_vector_elt x idx) idx) to x
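
The fold is easy to see on a list-based model of a vector. This is a minimal sketch with hypothetical helper names, not SelectionDAG code.

```python
def extract_vector_elt(vec, idx):
    """Read lane idx of the vector."""
    return vec[idx]

def insert_vector_elt(vec, val, idx):
    """Return a copy of vec with lane idx replaced by val."""
    out = list(vec)
    out[idx] = val
    return out

x = [3, 1, 4, 1, 5, 9, 2, 6]
# Writing back the value just read from the same lane leaves x unchanged.
print(all(insert_vector_elt(x, extract_vector_elt(x, i), i) == x for i in range(len(x))))
```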

Reviewers: spatel, RKSimon, efriedma

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35563

llvm-svn: 308617
2017-07-20 13:48:17 +00:00
Simon Pilgrim b6485252aa [X86][AVX512] Improve vector rotation constant folding tests
Test constant folding both on node creation (which already works) and once the input nodes have been folded themselves (not working yet).

llvm-svn: 308611
2017-07-20 13:07:37 +00:00
Simon Pilgrim 2911296f10 [DAGCombiner] Match ISD::SRL non-uniform constant vectors patterns using predicates.
Use predicate matchers introduced in D35492 to match more ISD::SRL constant folds

llvm-svn: 308602
2017-07-20 11:03:30 +00:00
Simon Pilgrim 7ff0e49d8c [DAGCombiner] Match ISD::SRA non-uniform constant vectors patterns using predicates.
Use predicate matchers introduced in D35492 to match more ISD::SRA constant folds

llvm-svn: 308600
2017-07-20 10:43:05 +00:00
Simon Pilgrim 9d7863b935 [DAGCombiner] Match non-uniform constant vectors using predicates.
Most combines currently recognise scalar and splat-vector constants, but not non-uniform vector constants.

This patch introduces a matching mechanism that uses predicates to check against BUILD_VECTOR of ConstantSDNode, as well as scalar ConstantSDNode cases.

I've changed a couple of predicates to demonstrate - the combine-shl changes add currently unsupported cases, while the MatchRotate replaces an existing mechanism.
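
The matching idea can be sketched on plain Python values. This is an illustration only: the name `match_unary_predicate`, the use of `int` for a scalar constant, and a list for a BUILD_VECTOR are assumptions, not the interface added by the patch.

```python
def match_unary_predicate(node, pred):
    """Return True if pred holds for a scalar constant or for every
    element of a constant vector (uniform or non-uniform)."""
    if isinstance(node, int):
        # Scalar constant case.
        return pred(node)
    if isinstance(node, (list, tuple)) and node:
        # Vector case: every element must be a constant satisfying pred.
        # A non-constant element (modelled here as None) fails the match.
        return all(isinstance(e, int) and pred(e) for e in node)
    return False

def in_range_shift(c):
    """Example predicate: shift amount valid for a 32-bit lane."""
    return 0 <= c < 32

print(match_unary_predicate(3, in_range_shift))            # scalar
print(match_unary_predicate([1, 5, 31], in_range_shift))   # non-uniform vector
print(match_unary_predicate([1, 40], in_range_shift))      # one lane out of range
```

A combine written against such a predicate handles scalar, splat, and non-uniform constants through the same code path.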

Differential Revision: https://reviews.llvm.org/D35492

llvm-svn: 308598
2017-07-20 10:13:40 +00:00
Craig Topper 33225ef314 [X86] Use SARX/SHLX/SHRX instructions for (shift x (and y, (BitWidth-1)))
Fixes PR33841.
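
The pattern is foldable because the x86 shift instructions already mask the count to the operand width, which makes an explicit `and` with BitWidth-1 redundant. A minimal sketch of that reasoning for a left shift, with hypothetical function names and a scalar model of the instruction:

```python
def shlx(x, count, bits=32):
    """Model of a variable left shift that masks its count to bits - 1
    (5 bits for 32-bit operands, 6 bits for 64-bit operands)."""
    count &= bits - 1
    return (x << count) & ((1 << bits) - 1)

def shift_with_explicit_mask(x, y, bits=32):
    """Source pattern: shift x by (y & (BitWidth - 1))."""
    return shlx(x, y & (bits - 1), bits)

# The explicit mask never changes the result, so it can be dropped.
print(all(shift_with_explicit_mask(0x1234, y) == shlx(0x1234, y) for y in range(256)))
```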

llvm-svn: 308591
2017-07-20 06:19:55 +00:00
Craig Topper bdd114ef9d [X86] Add test cases for (shift x (and y, (BitWidth-1))) to the BMI2 shift test.
We should use SHLX and similar instructions for these patterns, but we currently don't.

llvm-svn: 308590
2017-07-20 06:19:54 +00:00
Craig Topper a774ecc7f5 [X86] Regenerate shift-and.ll and shift-bmi2.ll using update_llc_test_checks.py.
I've stripped the checks for 64-bit types in 32-bit mode to match the existing tests.

llvm-svn: 308589
2017-07-20 06:19:53 +00:00
Craig Topper 01d4ca3916 [X86] Remove outdated bug comment from a test.
The test issue was fixed and the test was updated in r244577, but the comment wasn't removed.

llvm-svn: 308588
2017-07-20 06:19:52 +00:00
Francis Visoiu Mistrih 52042aa21e [PEI] Add basic opt-remarks support
Add optimization remarks support to the PrologueEpilogueInserter. For
now, emit the stack size as an analysis remark, but more additions wrt
shrink-wrapping may be added.

https://reviews.llvm.org/D35645

llvm-svn: 308556
2017-07-19 23:47:32 +00:00
Wolfgang Pieb 3610942c12 Forgot to add triple to test in r308513.
llvm-svn: 308527
2017-07-19 21:45:21 +00:00
Wolfgang Pieb e018bbd835 Fixing an issue with the initialization of LexicalScopes objects when mixing debug
and non-debug units.

Patch by Andrea DiBiagio.

Differential Revision: https://reviews.llvm.org/D35637

llvm-svn: 308513
2017-07-19 19:36:40 +00:00
Davide Italiano 5fc5d0a406 [X86] Don't try to scale down if that exceeds the bitwidth.
Fixes the crash reported in PR33844.

llvm-svn: 308503
2017-07-19 18:09:46 +00:00
Simon Pilgrim e5c7925c5e [X86][XOP] Use default AVX2 lowering for v4i64 ashr by splat constants
XOP shifts only support 128-bit vectors, so we were ending up with less optimal codegen requiring constants

llvm-svn: 308430
2017-07-19 10:29:31 +00:00
Craig Topper 106b5b6856 AMD znver1 Initial Scheduler model
Summary:
This patch adds the following
1. Adds a skeleton scheduler model for AMD Znver1.
2. Introduces the znver1 execution units and pipes.
3. Caters the instructions based on the generic scheduler classes.
4. Further additions to the scheduler model with instruction itineraries will be carried out incrementally based on
        a. Instructions types
        b. Registers used
5. Since itineraries are not added based on instructions, throughput information is bound to change when incremental changes are added.
6. Scheduler testcases are modified accordingly to suit the new model.

Patch by Ganesh Gopalasubramanian. With minor formatting tweaks from me.

Reviewers: craig.topper, RKSimon

Subscribers: javed.absar, shivaram, ddibyend, vprasad

Differential Revision: https://reviews.llvm.org/D35293

llvm-svn: 308411
2017-07-19 02:45:14 +00:00
Nirav Dave d839749ae8 [DAG] Improve Aliasing of operations to static alloca
Re-recommitting after landing the DAG extension-crash fix.

Recommitting after adding a check to avoid miscomputing alias information
on addresses of the same base but different subindices.

Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.

Static allocas are realized as stack objects in SelectionDAG, but their
offsets are not set until post-DAG, causing DAGCombiner's alias check to
frequently consider accesses to static allocas as aliasing. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.

Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.

The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation even though the pre-legalized DAG is
improved.

Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand

Reviewed By: rnk

Subscribers: sdardis, nemanjai, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33345

llvm-svn: 308350
2017-07-18 20:06:24 +00:00