I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently.
This happens because we were transforming any 'setb' - even when we only wanted a single-bit result.
This patch moves those transforms under visitAdd/visitSub, so we we're only creating sbb/adc when it
is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that
existing behavior in this patch.
Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files
where this transform still fires.
The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register
stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate
issue.
Differential Revision: https://reviews.llvm.org/D30611
llvm-svn: 297586
This is part of the ongoing attempt to improve select codegen for all targets and select
canonicalization in IR (see D24480 for more background). The transform is a subset of what
is done in InstCombine's FoldOpIntoSelect().
I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that
hopes to convert more selects to basic math ops. This appears to be a general missing DAG
transform though, so I added tests for all standard binops in rL296621
(PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86).
The poor output for "sel_constants_shl_constant" is tracked with:
https://bugs.llvm.org/show_bug.cgi?id=32105
Differential Revision: https://reviews.llvm.org/D30502
llvm-svn: 296699
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.
This fixes PR28146.
The original commit tried inserting an 8bit-subreg into a GR32 (not GR32_ABCD)
which was not appreciated by fast regalloc on 32-bit.
llvm-svn: 274802
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.
This fixes PR28146.
Differential Revision: http://reviews.llvm.org/D21774
llvm-svn: 274692
The result type of setcc is dependent on whether or not AVX512 is
present.
We had an X86-specific DAG-combine which assumed that the result type
should be i8 when it could be i1.
This meant that we would generate illegal setccs which LowerSETCC did
not like.
Instead, use an appropriate type and zero extend to i8.
Also, there were some scenarios where the fold should have fired but
didn't because we were overly cautious about the types. This meant that
we generated:
shrl $31, %edi
andl $1, %edi
kmovw %edi, %k0
kxnorw %k0, %k0, %k1
kshiftrw $15, %k1, %k1
kxorw %k1, %k0, %k0
kmovw %k0, %eax
instead of:
testl %edi, %edi
setns %al
This fixes PR27638.
llvm-svn: 268609
Previously the patterns didn't have high enough priority and we would only use the GR32 form if the only the upper 32 or 56 bits were zero.
Fixes PR23100.
llvm-svn: 234075
This update was done with the following bash script:
find test/CodeGen -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
done
sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
mv $TEMP $NAME
fi
done
llvm-svn: 186280