llvm-project/llvm/test/CodeGen/X86/blend-msb.ll

; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=corei7 -mattr=+sse4.1 | FileCheck %s


; Verify that we produce movss instead of blendvps when possible.

;CHECK-LABEL: vsel_float:
;CHECK-NOT: blend
;CHECK: movss
;CHECK: ret
define <4 x float> @vsel_float(<4 x float> %v1, <4 x float> %v2) {
  %vsel = select <4 x i1> <i1 true, i1 false, i1 false, i1 false>, <4 x float> %v1, <4 x float> %v2
  ret <4 x float> %vsel
}

;CHECK-LABEL: vsel_4xi8:
;CHECK-NOT: blend
;CHECK: movss
;CHECK: ret
define <4 x i8> @vsel_4xi8(<4 x i8> %v1, <4 x i8> %v2) {
  %vsel = select <4 x i1> <i1 true, i1 false, i1 false, i1 false>, <4 x i8> %v1, <4 x i8> %v2
  ret <4 x i8> %vsel
}

;CHECK-LABEL: vsel_8xi16:
; The select mask is
; <i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 false, i1 false>
; which translates into the bit mask (written most significant bit first,
; with element 0 in bit 0):
; 00010001 = 17.
; In the select mask, '1' takes the first argument and '0' takes the second.
; The blend immediate uses the opposite convention, thus we expect
; the inverted mask: 11101110 = 238.
; According to the ABI:
; v1 is in xmm0 => first argument is xmm0.
; v2 is in xmm1 => second argument is xmm1.
;CHECK: pblendw $238, %xmm1, %xmm0
;CHECK: ret
define <8 x i16> @vsel_8xi16(<8 x i16> %v1, <8 x i16> %v2) {
  %vsel = select <8 x i1> <i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 false, i1 false>, <8 x i16> %v1, <8 x i16> %v2
  ret <8 x i16> %vsel
}
Replace more uses of sse41 with sse4.1. llc using the host cpu features and warning on unknown features is probably not a good thing :-( llvm-svn: 189144 2013-08-24 04:39:19 +08:00			`; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=corei7 -mattr=+sse4.1 \| FileCheck %s`
[AVX] Optimize x86 VSELECT instructions using SimplifyDemandedBits. We know that the blend instructions only use the MSB, so if the mask is sign-extended then we can convert it into a SHL instruction. This is a common pattern because the type-legalizer sign-extends the i1 type which is used by the LLVM-IR for the condition. Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL. llvm-svn: 148225 2012-01-16 03:27:55 +08:00

[X86] Teach how to combine a vselect into a movss/movsd Add target specific rules for combining vselect dag nodes into movss/movsd when possible. If the vector type of the vselect dag node in input is either MVT::v4i32 or MVT::v4f32, then try to fold according to rules: 1) fold (vselect (build_vector (0, -1, -1, -1)), A, B) -> (movss A, B) 2) fold (vselect (build_vector (-1, 0, 0, 0)), A, B) -> (movss B, A) If the vector type of the vselect dag node in input is either MVT::v2i64 or MVT::v2f64 (and we have SSE2), then try to fold according to rules: 3) fold (vselect (build_vector (0, -1)), A, B) -> (movsd A, B) 4) fold (vselect (build_vector (-1, 0)), A, B) -> (movsd B, A) llvm-svn: 199683 2014-01-21 03:35:22 +08:00			`; Verify that we produce movss instead of blendvps when possible.`
Update to more CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change. All changes were made by the following bash script: find test/CodeGen -name ".ll" \| \ while read NAME; do echo "$NAME" grep -q "^; RUN: llc.debug" $NAME && continue grep -q "^; RUN:.llvm-objdump" $NAME && continue grep -q "^; RUN: opt." $NAME && continue TEMP=`mktemp -t temp` cp $NAME $TEMP sed -n "s/^define [^@]@\([A-Za-z0-9_]\)(.$/\1/p" < $NAME \| \ while read FUNC; do sed -i '' "s/;\([A-Za-z0-9_-]\)\([A-Za-z0-9_-]\):\( \)$FUNC[:] \$/;\1\2-LABEL:\3$FUNC:/g" $TEMP done sed -i '' "s/;\(.\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP sed -i '' "s/;\(.\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP sed -i '' "s/;\(.\)-NOT-LABEL:/;\1-NOT:/" $TEMP sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP mv $TEMP $NAME done This script catches a superset of the cases caught by the script associated with commit r186280. It initially found some false positives due to unusual constructs in a minority of tests; all such cases were disambiguated first in commit r186621. llvm-svn: 186624 2013-07-19 06:47:09 +08:00			`;CHECK-LABEL: vsel_float:`
Lower vselects into X86ISD::BLENDI when appropriate. LowerVSELECT will, if possible, generate a X86ISD::BLENDI DAG node if the condition is constant and we can emit that instruction, given the subtarget. This is not enough for all cases. An additional SELECTCombine optimization will be committed. Fixed tests that were expecting variable blends but where a blend+imm can be generated. Added test where we can't emit blend+immediate. Added avx2 blend+imm tests. llvm-svn: 209043 2014-05-17 06:47:49 +08:00			`;CHECK-NOT: blend`
[X86] Fix a bug in the lowering of BLENDI introduced in r209043. ISD::VSELECT mask uses 1 to identify the first argument and 0 to identify the second argument. On the other hand, BLENDI uses 0 to identify the first argument and 1 to identify the second argument. Fix the generation of the blend mask to account for this difference. The bug did not show up with r209043, because we were not checking for the actual arguments of the blend instruction! This commit also fixes the test cases. Note: The same mask works for the BLENDr variant because the arguments are swapped during instruction selection (see the BLENDXXrr patterns). <rdar://problem/16975435> llvm-svn: 209324 2014-05-22 06:00:39 +08:00			`; The select mask is`
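
The mask arithmetic described in the `vsel_8xi16` comments and in r209324 can be checked with a short sketch. This is only an illustration of the arithmetic, not the code used in the X86 backend; the helper names `select_mask_to_bits` and `blend_immediate` are hypothetical, and the conventions assumed are the ones stated in the test comments (element 0 maps to bit 0; a 1 in the select mask picks the first argument, a 1 in the blend immediate picks the second).

```python
def select_mask_to_bits(mask):
    """Pack a constant select mask into an integer; element i becomes bit i."""
    bits = 0
    for i, lane in enumerate(mask):
        if lane:
            bits |= 1 << i
    return bits


def blend_immediate(mask):
    """Invert the packed select mask, keeping one bit per vector lane.

    Assumption (from the test comments): the blend immediate uses the
    opposite convention to the select mask, so every bit is flipped.
    """
    lane_bits = (1 << len(mask)) - 1
    return ~select_mask_to_bits(mask) & lane_bits


# The select mask used by vsel_8xi16.
mask = [True, False, False, False, True, False, False, False]
print(select_mask_to_bits(mask))  # 17  (0b00010001)
print(blend_immediate(mask))      # 238 (0b11101110)
```

The second value matches the immediate in the `pblendw $238, %xmm1, %xmm0` check line.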