When we had a vector_shuffle where we had an input from each vector, we
could miscompile it because we were assuming the input from V2 wouldn't
be moved from where it was on the vector.
Added a test case.
llvm-svn: 213826
Summary:
We were being too strict and not accounting for undefs.
Added a test case and fixed another one where we improved codegen.
Reviewers: grosbach, nadav, delena
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4039
llvm-svn: 210361
Summary:
When inserting an element that's coming from a vector load or a broadcast
of a vector (or scalar) load, combine the load into the insertps
instruction.
Added PerformINSERTPSCombine for the case where we need to fix the load
(load of a vector + insertps with a non-zero CountS).
Added patterns for the broadcasts.
Also added tests for SSE4.1, AVX, and AVX2.
Reviewers: delena, nadav, craig.topper
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3581
llvm-svn: 209156
LowerVSELECT will, if possible, generate a X86ISD::BLENDI DAG node if the
condition is constant and we can emit that instruction, given the
subtarget.
This is not enough for all cases. An additional SELECTCombine optimization
will be committed.
Fixed tests that were expecting variable blends but where a blend+imm can
be generated.
Added test where we can't emit blend+immediate.
Added avx2 blend+imm tests.
llvm-svn: 209043
Summary:
Vectors built with zeros and elements in the same order as another
(source) vector are optimized to be built using a single insertps
instruction.
Also optimize when we move one element in a vector to a different place
in that vector while zeroing out some of the other elements.
Further optimizations are possible, described in TODO comments.
I will be implementing at least some of them in the near future.
Added some tests for different cases where this optimization triggers.
Reviewers: nadav, delena, craig.topper
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3521
llvm-svn: 208271
Summary:
If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower
certain shufflevectors to an insertps instruction:
When most of the shufflevector result's elements come from one vector (and
keep their index), and one element comes from another vector or a memory
operand.
Added tests for insertps optimizations on shufflevector.
Added support and tests for v4i32 vector optimization.
Reviewers: nadav
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3475
llvm-svn: 207291
This update was done with the following bash script:
find test/CodeGen -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
done
sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
mv $TEMP $NAME
fi
done
llvm-svn: 186280
dumping ground of various SSE4.1 tests, since filecheck can reasonably
handle them all in one file. Generalize it to check x86-64 stuff as
well since it has a different ABI (a convenient way to test both the
reg and mem forms of these instructions).
llvm-svn: 76848