llvm-project/llvm/lib/Transforms/InstCombine
Sanjay Patel e9ca7ea3e5 [InstCombine] reverse 'trunc X to <N x i1>' canonicalization
icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be 
regressions unless we add more trunc folds as we're doing here 
for vectors.

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r
}

Here's a sampling of the vector codegen for that case using 
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vandps	LCPI0_0(%rip), %xmm0, %xmm0
vxorps	%xmm1, %xmm1, %xmm1
vpcmpeqd	%xmm1, %xmm0, %xmm0
vblendvps	%xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vblendvps	%xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpbroadcastd	LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd	%zmm1, %zmm0, %k1
vblendmps	%zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpslld	$31, %xmm0, %xmm0
vptestmd	%zmm0, %zmm0, %k1
vblendmps	%zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
movi	v1.4s, #1
and	v0.16b, v0.16b, v1.16b
cmeq	v0.4s, v0.4s, #0
bsl	v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
bsl	v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747

llvm-svn: 344082
2018-10-09 21:26:01 +00:00
..
CMakeLists.txt InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemanded 2018-06-21 13:37:31 +00:00
InstCombineAddSub.cpp [InstCombine] name change: foldShuffledBinop -> foldVectorBinop; NFC 2018-10-03 15:20:58 +00:00
InstCombineAndOrXor.cpp [InstCombine] name change: foldShuffledBinop -> foldVectorBinop; NFC 2018-10-03 15:20:58 +00:00
InstCombineCalls.cpp [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle. 2018-10-08 10:32:33 +00:00
InstCombineCasts.cpp [InstCombine] reverse 'trunc X to <N x i1>' canonicalization 2018-10-09 21:26:01 +00:00
InstCombineCompares.cpp [InstCombine] reverse 'trunc X to <N x i1>' canonicalization 2018-10-09 21:26:01 +00:00
InstCombineInternal.h [InstCombine] name change: foldShuffledBinop -> foldVectorBinop; NFC 2018-10-03 15:20:58 +00:00
InstCombineLoadStoreAlloca.cpp [Local] Make DoesKMove required for combineMetadata. 2018-08-24 11:40:04 +00:00
InstCombineMulDivRem.cpp [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle. 2018-10-08 10:32:33 +00:00
InstCombinePHI.cpp [IR] Replace `isa<TerminatorInst>` with `isTerminator()`. 2018-08-26 09:51:22 +00:00
InstCombineSelect.cpp [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle. 2018-10-08 10:32:33 +00:00
InstCombineShifts.cpp [InstCombine] name change: foldShuffledBinop -> foldVectorBinop; NFC 2018-10-03 15:20:58 +00:00
InstCombineSimplifyDemanded.cpp [InstCombine] drop poison flags in SimplifyVectorDemandedElts 2018-10-04 21:36:50 +00:00
InstCombineTables.td InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemanded 2018-06-21 13:37:31 +00:00
InstCombineVectorOps.cpp [InstCombine] reverse 'trunc X to <N x i1>' canonicalization 2018-10-09 21:26:01 +00:00
InstructionCombining.cpp [InstCombine] Fix incongruous GEP type addrspace 2018-10-08 08:40:45 +00:00
LLVMBuild.txt