llvm-project/llvm/test/CodeGen/X86/setcc-lowering.ll

111 lines
4.4 KiB
LLVM
Raw Normal View History

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+avx < %s | FileCheck %s --check-prefix=AVX
; RUN: llc -mtriple=i386-unknown-linux-gnu -mcpu=knl < %s | FileCheck %s --check-prefix=KNL-32
[x86] Fix wrong lowering of vsetcc nodes (PR25080). Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong assumption that for non-AVX512 targets, the source type and destination type of a type-legalized setcc node were always the same type. This assumption was unfortunately incorrect; the type legalizer is not always able to promote the return type of a setcc to the same type as the first operand of a setcc. In the case of a vsetcc node, the legalizer firstly checks if the first input operand has a legal type. If so, then it promotes the return type of the vsetcc to that same type. Otherwise, the return type is promoted to the 'next legal type', which, for vectors of MVT::i1 is always a 128-bit integer vector type. Example (-mattr=+avx): %0 = trunc <8 x i32> %a to <8 x i23> %1 = icmp eq <8 x i23> %0, zeroinitializer The initial selection dag for the code above is: v8i1 = setcc t5, t7, seteq:ch t5: v8i23 = truncate t2 t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1 t7: v8i32 = build_vector of all zeroes. The type legalizer would firstly check if 't5' has a legal type. If so, then it would reuse that same type to promote the return type of the setcc node. Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to promote the return type of the setcc node. Consequently, the setcc return type is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the following dag node: v8i16 = setcc t32, t25, seteq:ch where t32 and t25 are now values of type v8i32. Before this patch, function LowerVSETCC would have wrongly expanded the setcc to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an instruction. In our case, ISel would have matched a VPCMPEQWrr: t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25 However, t36 and t25 are both VR256, while the result type is instead of class VR128. This inconsistency ended up causing the insertion of COPY instructions like this: %vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3 Which is an invalid full copy (not a sub register copy). Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy instruction" in the attempt to expand the malformed pseudo COPY instructions. This patch fixes the problem adding the missing logic in LowerVSETCC to handle the corner case of a setcc with 128-bit return type and 256-bit operand type. This problem was originally reported by Dimitry as PR25080. It has been latent for a very long time. I have added the minimal reproducible from that bugzilla as test setcc-lowering.ll. Differential Revision: http://reviews.llvm.org/D13660 llvm-svn: 250085
2015-10-13 03:22:30 +08:00
; Verify that we don't crash during codegen due to a wrong lowering
; of a setcc node with illegal operand types and return type.
define <8 x i16> @pr25080(<8 x i32> %a) {
; AVX-LABEL: pr25080:
; AVX: # %bb.0: # %entry
; AVX-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
; AVX-NEXT: vextractf128 $1, %ymm0, %xmm1
; AVX-NEXT: vpxor %xmm2, %xmm2, %xmm2
; AVX-NEXT: vpcmpeqd %xmm2, %xmm1, %xmm1
; AVX-NEXT: vpcmpeqd %xmm2, %xmm0, %xmm0
; AVX-NEXT: vpackssdw %xmm1, %xmm0, %xmm0
; AVX-NEXT: vpor {{.*}}(%rip), %xmm0, %xmm0
; AVX-NEXT: vpsllw $15, %xmm0, %xmm0
; AVX-NEXT: vpsraw $15, %xmm0, %xmm0
; AVX-NEXT: vzeroupper
; AVX-NEXT: retq
;
; KNL-32-LABEL: pr25080:
; KNL-32: # %bb.0: # %entry
; KNL-32-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
; KNL-32-NEXT: vpbroadcastd {{.*#+}} ymm1 = [8388607,8388607,8388607,8388607,8388607,8388607,8388607,8388607]
; KNL-32-NEXT: vptestnmd %zmm1, %zmm0, %k0
; KNL-32-NEXT: movb $15, %al
; KNL-32-NEXT: kmovw %eax, %k1
; KNL-32-NEXT: korw %k1, %k0, %k1
; KNL-32-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
; KNL-32-NEXT: vpmovdw %zmm0, %ymm0
; KNL-32-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
; KNL-32-NEXT: retl
[x86] Fix wrong lowering of vsetcc nodes (PR25080). Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong assumption that for non-AVX512 targets, the source type and destination type of a type-legalized setcc node were always the same type. This assumption was unfortunately incorrect; the type legalizer is not always able to promote the return type of a setcc to the same type as the first operand of a setcc. In the case of a vsetcc node, the legalizer firstly checks if the first input operand has a legal type. If so, then it promotes the return type of the vsetcc to that same type. Otherwise, the return type is promoted to the 'next legal type', which, for vectors of MVT::i1 is always a 128-bit integer vector type. Example (-mattr=+avx): %0 = trunc <8 x i32> %a to <8 x i23> %1 = icmp eq <8 x i23> %0, zeroinitializer The initial selection dag for the code above is: v8i1 = setcc t5, t7, seteq:ch t5: v8i23 = truncate t2 t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1 t7: v8i32 = build_vector of all zeroes. The type legalizer would firstly check if 't5' has a legal type. If so, then it would reuse that same type to promote the return type of the setcc node. Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to promote the return type of the setcc node. Consequently, the setcc return type is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the following dag node: v8i16 = setcc t32, t25, seteq:ch where t32 and t25 are now values of type v8i32. Before this patch, function LowerVSETCC would have wrongly expanded the setcc to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an instruction. In our case, ISel would have matched a VPCMPEQWrr: t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25 However, t36 and t25 are both VR256, while the result type is instead of class VR128. This inconsistency ended up causing the insertion of COPY instructions like this: %vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3 Which is an invalid full copy (not a sub register copy). Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy instruction" in the attempt to expand the malformed pseudo COPY instructions. This patch fixes the problem adding the missing logic in LowerVSETCC to handle the corner case of a setcc with 128-bit return type and 256-bit operand type. This problem was originally reported by Dimitry as PR25080. It has been latent for a very long time. I have added the minimal reproducible from that bugzilla as test setcc-lowering.ll. Differential Revision: http://reviews.llvm.org/D13660 llvm-svn: 250085
2015-10-13 03:22:30 +08:00
entry:
%0 = trunc <8 x i32> %a to <8 x i23>
%1 = icmp eq <8 x i23> %0, zeroinitializer
%2 = or <8 x i1> %1, <i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false>
%3 = sext <8 x i1> %2 to <8 x i16>
ret <8 x i16> %3
}
define void @pr26232(i64 %a, <16 x i1> %b) {
; AVX-LABEL: pr26232:
; AVX: # %bb.0: # %for_loop599.preheader
; AVX-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX-NEXT: vmovdqa {{.*#+}} xmm2 = [128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128]
; AVX-NEXT: .p2align 4, 0x90
; AVX-NEXT: .LBB1_1: # %for_loop599
; AVX-NEXT: # =>This Inner Loop Header: Depth=1
; AVX-NEXT: xorl %eax, %eax
; AVX-NEXT: cmpq $65536, %rdi # imm = 0x10000
; AVX-NEXT: setl %al
; AVX-NEXT: vmovd %eax, %xmm3
; AVX-NEXT: vpshufb %xmm1, %xmm3, %xmm3
; AVX-NEXT: vpand %xmm0, %xmm3, %xmm3
; AVX-NEXT: vpsllw $7, %xmm3, %xmm3
; AVX-NEXT: vpand %xmm2, %xmm3, %xmm3
; AVX-NEXT: vpmovmskb %xmm3, %eax
; AVX-NEXT: testw %ax, %ax
; AVX-NEXT: jne .LBB1_1
; AVX-NEXT: # %bb.2: # %for_exit600
; AVX-NEXT: retq
;
; KNL-32-LABEL: pr26232:
; KNL-32: # %bb.0: # %for_loop599.preheader
; KNL-32-NEXT: pushl %esi
; KNL-32-NEXT: .cfi_def_cfa_offset 8
; KNL-32-NEXT: .cfi_offset %esi, -8
; KNL-32-NEXT: vpmovsxbd %xmm0, %zmm0
; KNL-32-NEXT: vpslld $31, %zmm0, %zmm0
; KNL-32-NEXT: vptestmd %zmm0, %zmm0, %k0
; KNL-32-NEXT: movl {{[0-9]+}}(%esp), %eax
; KNL-32-NEXT: movl {{[0-9]+}}(%esp), %ecx
; KNL-32-NEXT: movl $65535, %edx # imm = 0xFFFF
; KNL-32-NEXT: .p2align 4, 0x90
; KNL-32-NEXT: .LBB1_1: # %for_loop599
; KNL-32-NEXT: # =>This Inner Loop Header: Depth=1
; KNL-32-NEXT: cmpl $65536, %ecx # imm = 0x10000
; KNL-32-NEXT: movl %eax, %esi
; KNL-32-NEXT: sbbl $0, %esi
; KNL-32-NEXT: movl $0, %esi
; KNL-32-NEXT: cmovll %edx, %esi
; KNL-32-NEXT: kmovw %esi, %k1
; KNL-32-NEXT: kandw %k0, %k1, %k1
; KNL-32-NEXT: kortestw %k1, %k1
; KNL-32-NEXT: jne .LBB1_1
; KNL-32-NEXT: # %bb.2: # %for_exit600
; KNL-32-NEXT: popl %esi
Correct dwarf unwind information in function epilogue This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts. The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific. The second part is platform independent and ensures that: * CFI instructions do not affect code generation (they are not counted as instructions when tail duplicating or tail merging) * Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary. Added CFIInstrInserter pass: * analyzes each basic block to determine cfa offset and register are valid at its entry and exit * verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors * inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue. Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D42848 llvm-svn: 330706
2018-04-24 18:32:08 +08:00
; KNL-32-NEXT: .cfi_def_cfa_offset 4
; KNL-32-NEXT: retl
allocas:
br label %for_test11.preheader
for_test11.preheader: ; preds = %for_test11.preheader, %allocas
br i1 undef, label %for_loop599, label %for_test11.preheader
for_loop599: ; preds = %for_loop599, %for_test11.preheader
%less_i_load605_ = icmp slt i64 %a, 65536
%less_i_load605__broadcast_init = insertelement <16 x i1> undef, i1 %less_i_load605_, i32 0
%less_i_load605__broadcast = shufflevector <16 x i1> %less_i_load605__broadcast_init, <16 x i1> undef, <16 x i32> zeroinitializer
%"oldMask&test607" = and <16 x i1> %less_i_load605__broadcast, %b
%intmask.i894 = bitcast <16 x i1> %"oldMask&test607" to i16
%res.i895 = icmp eq i16 %intmask.i894, 0
br i1 %res.i895, label %for_exit600, label %for_loop599
for_exit600: ; preds = %for_loop599
ret void
}