llvm-project/llvm/test/Transforms/LoopVectorize/X86
Chandler Carruth b89464a9b6 [x86,sdag] Two interrelated changes to the x86 and sdag code.
First, don't combine bit masking into vector shuffles (even ones the
target can handle) once operation legalization has taken place. Custom
legalization of vector shuffles may exist for these patterns (making the
predicate return true) but that custom legalization may in some cases
produce the exact bit math this matches. We only really want to handle
this prior to operation legalization.

However, the x86 backend, in a fit of awesome, relied on this. What it
would do is mark VSELECTs as expand, which would turn them into
arithmetic, which this would then match back into vector shuffles, which
we would then lower properly. Amazing.
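
To make the round trip above concrete, here is a minimal sketch of what
"expanding" a select into arithmetic looks like, written as plain C++ over
four 32-bit lanes rather than as SelectionDAG nodes. The names V4,
selectViaBitMath, and selectViaBlend are hypothetical and exist only for
this illustration; each mask lane is assumed to be all-ones or all-zeros.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4 = std::array<std::uint32_t, 4>;

// Bit-math form of a select: (mask & a) | (~mask & b).
// Assumption: every mask lane is either 0xFFFFFFFF (take a) or 0 (take b).
V4 selectViaBitMath(const V4 &mask, const V4 &a, const V4 &b) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (mask[i] & a[i]) | (~mask[i] & b[i]);
  return r;
}

// The same select written as a per-lane blend, which is what a shuffle
// with a constant mask computes directly.
V4 selectViaBlend(const std::array<bool, 4> &takeA, const V4 &a, const V4 &b) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = takeA[i] ? a[i] : b[i];
  return r;
}
```

With a constant mask the two functions agree lane for lane, which is why
the and/or form can be matched back into a shuffle.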

Instead, the second change is to teach the x86 backend to directly form
vector shuffles from VSELECT nodes with constant conditions, and to mark
VSELECT as custom-lowered for all of the vector types for which we
support lowering blends as shuffles. We still mark the forms which
actually support variable blends as *legal* so that the custom lowering
is bypassed, and the legal lowering can even be used by the vector
shuffle legalization (yes, I know, this is confusing, but that's how the
patterns are written).
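
As a rough illustration of forming a shuffle from a VSELECT with a
constant condition, the sketch below builds a two-input shuffle mask using
the usual convention that indices 0..N-1 select from the first operand and
N..2N-1 from the second. This is not the backend code itself; the function
names are hypothetical and undef condition lanes are ignored.

```cpp
#include <cstddef>
#include <vector>

// Build a two-input shuffle mask from a constant select condition:
// a true lane keeps lane i of the first operand (index i), a false lane
// takes lane i of the second operand (index i + N).
std::vector<int> shuffleMaskFromConstantCondition(const std::vector<bool> &cond) {
  const int n = static_cast<int>(cond.size());
  std::vector<int> mask;
  mask.reserve(cond.size());
  for (int i = 0; i < n; ++i)
    mask.push_back(cond[i] ? i : i + n);
  return mask;
}

// Apply a two-input shuffle mask; a and b are assumed to have equal length
// and every mask index is assumed to lie in [0, 2 * a.size()).
std::vector<int> applyShuffle(const std::vector<int> &a,
                              const std::vector<int> &b,
                              const std::vector<int> &mask) {
  const int n = static_cast<int>(a.size());
  std::vector<int> r;
  r.reserve(mask.size());
  for (int idx : mask)
    r.push_back(idx < n ? a[idx] : b[idx - n]);
  return r;
}
```

Every index in such a mask stays in its own lane, so the result is a
blend, the case the shuffle lowering is expected to handle well.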

This makes the VSELECT lowering much more sensible, and in fact should
fix a bunch of bugs with it. However, as you'll see in the test cases,
right now what it does is point out the *hilarious* deficiency of the
new vector shuffle lowering when it comes to blends. Fortunately, my
very next patch fixes that. I can't submit it yet, because that patch,
somewhat obviously, forms the exact and/or pattern that the DAG combine
is matching here! Without this patch, teaching the vector shuffle
lowering to produce the right code infloops in the DAG combiner. With
this patch alone, we produce terrible code but at least lower through
the right paths. With both patches, all the regressions here should be
fixed, and a bunch of the improvements (like using 2 shufps with no
memory loads instead of 2 andps with memory loads and an orps) will
stay. Win!

There is one other change worth noting here. We had hilariously wrong
vectorization cost estimates for vselect because we fell through to the
code path that assumed all "expand" vector operations are scalarized.
However, the "expand" lowering of VSELECT is vector bit math, most
definitely not scalarized. So now we go back to the correct if horribly
naive cost of "1" for "not scalarized". If anyone wants to add actual
modeling of shuffle costs, that would be cool, but this seems an
improvement on its own. Note the removal of 16 and 32 "costs" for doing
a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
course, we don't right now because of OMG bad code, but I'm going to fix
that. Next patch. I promise.

llvm-svn: 229835
2015-02-19 10:36:19 +00:00
already-vectorized.ll IR: Add 'distinct' MDNodes to bitcode and assembly 2015-01-08 22:38:29 +00:00
assume.ll [LoopVectorize] Ignore @llvm.assume for cost estimates and legality 2014-10-14 22:59:49 +00:00
avx1.ll
avx512.ll [X86] AVX512: Enable it in the Loop Vectorizer 2014-07-09 18:22:33 +00:00
constant-vector-operand.ll
conversion-cost.ll
cost-model.ll
fp32_to_uint32-cost-model.ll [X86] Adjust cost of FP_TO_UINT v8f32->v8i32 2014-03-30 18:07:13 +00:00
fp64_to_uint32-cost-model.ll [X86] Adjust cost of FP_TO_UINT v4f64->v4i32 as well 2014-03-31 21:54:48 +00:00
fp_to_sint8-cost-model.ll add 'requires asserts' to test that needs it 2014-03-27 00:20:42 +00:00
gather-cost.ll
gcc-examples.ll Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. 2014-09-10 17:58:16 +00:00
illegal-parallel-loop-uniform-write.ll IR: Make metadata typeless in assembly 2014-12-15 19:07:53 +00:00
lit.local.cfg Reduce verbiage of lit.local.cfg files 2014-06-09 22:42:55 +00:00
masked_load_store.ll Fixed a bug in masked load/store in reversed loop. 2015-01-22 08:20:06 +00:00
metadata-enable.ll IR: Make metadata typeless in assembly 2014-12-15 19:07:53 +00:00
min-trip-count-switch.ll Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. 2014-09-10 17:58:16 +00:00
no-vector.ll
parallel-loops-after-reg2mem.ll IR: Make metadata typeless in assembly 2014-12-15 19:07:53 +00:00
parallel-loops.ll IR: Make metadata typeless in assembly 2014-12-15 19:07:53 +00:00
powof2div.ll Allow vectorization of division by uniform power of 2. 2014-08-25 04:56:54 +00:00
rauw-bug.ll SLPVectorizer: Fix stale for Value pointer array 2013-11-19 22:20:20 +00:00
reduction-crash.ll
small-size.ll IR: Make metadata typeless in assembly 2014-12-15 19:07:53 +00:00
struct-store.ll
tripcount.ll Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. 2014-09-10 17:58:16 +00:00
uint64_to_fp64-cost-model.ll [X86][Vectorizer Cost Model] Correct vectorization cost model for v2i64->v2f64 2014-03-27 00:52:16 +00:00
unroll-pm.ll
unroll-small-loops.ll Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. 2014-09-10 17:58:16 +00:00
unroll_selection.ll Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. 2014-09-10 17:58:16 +00:00
vect.omp.force.ll IR: Make metadata typeless in assembly 2014-12-15 19:07:53 +00:00
vect.omp.force.small-tc.ll IR: Make metadata typeless in assembly 2014-12-15 19:07:53 +00:00
vector-scalar-select-cost.ll [x86,sdag] Two interrelated changes to the x86 and sdag code. 2015-02-19 10:36:19 +00:00
vector_ptr_load_store.ll [LoopVectorize] Use AA to partition potential dependency checks 2014-07-20 23:07:52 +00:00
vectorization-remarks-missed.ll IR: Move MDLocation into place 2015-01-14 22:27:36 +00:00
vectorization-remarks.ll IR: Move MDLocation into place 2015-01-14 22:27:36 +00:00
x86_fp80-vector-store.ll Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. 2014-09-10 17:58:16 +00:00