This change is a logical suspect in 22587 and 22590. Given it's of minimal importanance and I can't get clang to build on my home machine, I'm reverting so that I can deal with this next week.
llvm-svn: 229322
This patch refactors the existing lowerVectorShuffleAsByteShift function to add support for 256-bit vectors on AVX2 targets.
It also fixes a tablegen issue that prevented the lowering of vpslldq/vpsrldq vec256 instructions.
Differential Revision: http://reviews.llvm.org/D7596
llvm-svn: 229311
when that will allow it to lower with a single permute instead of
multiple permutes.
It tries to detect when it will only have to do a single permute in
either case to maximize folding of loads and such.
This cuts a *lot* of the avx2 shuffle permute counts in half. =]
llvm-svn: 229309
directly into blends of the splats.
These patterns show up even very late in the vector shuffle lowering
where we don't have any chance for DAG combining to kick in, and
blending is a tremendously simpler operation to model. By coercing the
shuffle into a blend we can much more easily match and lower shuffles of
splats.
Immediately with this change there are significantly more blends being
matched in the x86 vector shuffle lowering.
llvm-svn: 229308
vectors and detect equivalent inputs.
This lets the code match unpck-style instructions when only one of the
inputs are lined up but the other input is a splat and so which lanes we
pull from doesn't matter. Today, this doesn't really happen, but just by
accident. I have a patch that normalizes how we shuffle splats, and with
that patch this will be necessary for a lot of the mask equivalence
tests to work.
I don't really know how to write a test case for this specific change
until the other change lands though.
llvm-svn: 229307
don't try to do element insertion for non-zero-index floating point
vectors.
We don't have any useful patterns or lowering for element insertion into
high elements of a floating point vector, and the generic shuffle
lowering will end up being better -- namely it will fall back to unpck.
But we should try to handle other forms of element insertion before
matching unpck patterns.
While this doesn't matter much right now, I'm working on a patch that
makes unpck matching much more powerful, and that patch will break
without this re-ordering.
llvm-svn: 229306
I was somewhat surprised this pattern really came up, but it does. It
seems better to just directly handle it than try to special case every
place where we end up forming a shuffle that devolves to a shuffle of
a zero vector.
llvm-svn: 229301
subvectors from buildvectors. That doesn't really make any sense and it
breaks all of the down-stream matching of buildvectors to cleverly lower
shuffles.
With this, we now get the shift-based lowering of 256-bit vector
shuffles with AVX1 when we split them into 128-bit vectors. We also do
much better on the zero-extension patterns, although there remains quite
a bit of room for improvement here.
llvm-svn: 229299
least in theory.
I don't actually have a test case that benefits from this, but
theoretically, it could come up, and I don't want to try to think about
whether this is the culprit or something else is, so I'd rather just
make this code powerful. =/ Makes me sad that I can't really test it
though.
llvm-svn: 229298
GNU ld sets default, not hidden, visibility on local symbols.
Having default or hidden visibility on local symbols makes no difference in run-time behavior.
Patch by: H.J. Lu <hjl.tools@gmail.com>
llvm-svn: 229297
The first part of that line doesn't parse correctly and ParseClassSpecifier() for
some reason skips to tok::comma to recover, and then
ParseDeclarationSpecifiers() sees the next struct and calls
ParseClassSpecifier() again with the same DeclSpec object.
However, the first call already called ActOnCXXGlobalScopeSpecifier() on the
DeclSpec's CXXScopeSpec, and sema gets confused when this gets called again.
As a fix, let ParseClassSpecifier() (and ParseEnumSpecifier()) call
ParseOptionalCXXScopeSpec() with a temporary CXXScopeSpec object, and only
copy it into the DeclSpec if things work out. (This is also how all the other
functions that set the DeclSpec's TypeSpecScope set it.)
Found by SLi's bot.
llvm-svn: 229293
lowerings -- one which decomposes into an initial blend followed by
a permute.
Particularly on newer chips, blends are handled independently of
shuffles and so this is much less bottlenecked on the single port that
floating point shuffles are executed with on Intel.
I'll be adding this lowering to a bunch of other code paths in
subsequent commits to handle still more places where we can effectively
leverage blends when they're available in the ISA.
llvm-svn: 229292
The first part of that line doesn't parse correctly and ParseClassSpecifier() for
some reason skips to tok::comma to recover, and then
ParseDeclarationSpecifiers() sees the next struct and calls
ParseClassSpecifier() again with the same DeclSpec object.
However, the first call already called ActOnCXXGlobalScopeSpecifier() on the
DeclSpec's CXXScopeSpec, and sema gets confused when this gets called again.
As a fix, let ParseClassSpecifier() (and ParseEnumSpecifier()) call
ParseOptionalCXXScopeSpec() with a temporary CXXScopeSpec object, and only
copy it into the DeclSpec if things work out. (This is also how all the other
functions that set the DeclSpec's TypeSpecScope set it.)
Found by SLi's bot.
llvm-svn: 229288
test.
This was just a matter of the DAG combine for vector shuffles being too
aggressive. This is a bit of a grey area, but I think generally if we
can re-use intermediate shuffles, we should. Certainly, given the test
cases I have available, this seems like the right call.
llvm-svn: 229285
legality test (essentially, everything is legal).
I'm planning to make this the default shortly, but I'd like to fix
a collection of the bugs it exposes first, and this will let me easily
test them. It also showcases both the improvements and a few of the
regressions triggered by the change. The biggest improvements by far are
the significantly reduced shuffling and domain crossing in the combining
test case. The biggest regressions are missing some clever blending
patterns.
llvm-svn: 229284