llvm-project/llvm/test/CodeGen/X86/avx-cvttp2si.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-- -mattr=avx              | FileCheck %s --check-prefixes=AVX,AVX1
; RUN: llc < %s -mtriple=x86_64-- -mattr=avx512f,avx512vl | FileCheck %s --check-prefixes=AVX,AVX512

; PR37751 - https://bugs.llvm.org/show_bug.cgi?id=37751
; We can't combine into 'round' instructions because the behavior is different for out-of-range values.

declare <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float>)
declare <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double>)

define <8 x float> @float_to_int_to_float_mem_v8f32(<8 x float>* %p) #0 {
; AVX-LABEL: float_to_int_to_float_mem_v8f32:
; AVX:       # %bb.0:
; AVX-NEXT:    vcvttps2dq (%rdi), %ymm0
; AVX-NEXT:    vcvtdq2ps %ymm0, %ymm0
; AVX-NEXT:    retq
  %x = load <8 x float>, <8 x float>* %p, align 16
  %fptosi = tail call <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float> %x)
  %sitofp = sitofp <8 x i32> %fptosi to <8 x float>
  ret <8 x float> %sitofp
}

define <8 x float> @float_to_int_to_float_reg_v8f32(<8 x float> %x) #0 {
; AVX-LABEL: float_to_int_to_float_reg_v8f32:
; AVX:       # %bb.0:
; AVX-NEXT:    vcvttps2dq %ymm0, %ymm0
; AVX-NEXT:    vcvtdq2ps %ymm0, %ymm0
; AVX-NEXT:    retq
  %fptosi = tail call <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float> %x)
  %sitofp = sitofp <8 x i32> %fptosi to <8 x float>
  ret <8 x float> %sitofp
}

define <4 x double> @float_to_int_to_float_mem_v4f64(<4 x double>* %p) #0 {
; AVX-LABEL: float_to_int_to_float_mem_v4f64:
; AVX:       # %bb.0:
; AVX-NEXT:    vcvttpd2dqy (%rdi), %xmm0
; AVX-NEXT:    vcvtdq2pd %xmm0, %ymm0
; AVX-NEXT:    retq
  %x = load <4 x double>, <4 x double>* %p, align 16
  %fptosi = tail call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %x)
  %sitofp = sitofp <4 x i32> %fptosi to <4 x double>
  ret <4 x double> %sitofp
}

define <4 x double> @float_to_int_to_float_reg_v4f64(<4 x double> %x) #0 {
; AVX-LABEL: float_to_int_to_float_reg_v4f64:
; AVX:       # %bb.0:
; AVX-NEXT:    vcvttpd2dq %ymm0, %xmm0
; AVX-NEXT:    vcvtdq2pd %xmm0, %ymm0
; AVX-NEXT:    retq
  %fptosi = tail call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %x)
  %sitofp = sitofp <4 x i32> %fptosi to <4 x double>
  ret <4 x double> %sitofp
}

attributes #0 = { "no-signed-zeros-fp-math"="true" }
Blame history (llvm-svn revisions, timestamps +08:00):

r334367  2018-06-11 01:42:12 +08:00
[x86] add tests for potentially miscompiling cvttp2si (PR37751); NFC
Last touched:
- the NOTE and RUN lines
- the "We can't combine into 'round' instructions..." comment
- both declares
- every function body (load, intrinsic call, sitofp, ret)
- the CHECK-LABEL, %bb.0 and retq lines of the two register-operand tests

r334404  2018-06-11 21:51:34 +08:00
[x86] add scalar cvtt intrinsic tests; NFC
More coverage for the problem noted in D47993 (although these shouldn't be affected by that patch).
Last touched: the "; PR37751 - https://bugs.llvm.org/show_bug.cgi?id=37751" comment.

r334460  2018-06-12 08:48:57 +08:00
[X86] Add isel patterns for folding loads when creating ROUND instructions from ffloor/fnearbyint/fceil/frint/ftrunc.
We were missing packed isel folding patterns for all of sse41, avx, and avx512. For some reason avx512 had scalar load folding patterns under optsize (due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns. Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch. Some of this was spotted in the review for D47993. This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns.
Last touched: the CHECK-LABEL, %bb.0 and retq lines of the two memory-operand tests.

r334685  2018-06-14 11:16:58 +08:00
[x86] fix mappings of cvttp2si/cvttp2ui x86 intrinsics to x86-specific nodes and isel patterns (PR37551)
Summary: The tests in: https://bugs.llvm.org/show_bug.cgi?id=37751 ...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes. This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll
Reviewers: RKSimon, gbedwell, spatel
Reviewed By: spatel
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D47993
Last touched: the vcvtt* and vcvtdq2* CHECK lines in all four tests.

r335761  2018-06-28 02:16:40 +08:00
[DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc can produce -0.0 where the original code does not:

    #include <stdio.h>
    int main(int argc) {
      float x;
      x = -0.8 * argc;
      printf("%f\n", (float)((int)x));
      return 0;
    }

    $ clang -O0 -mavx fp.c ; ./a.out
    0.000000
    $ clang -O1 -mavx fp.c ; ./a.out
    -0.000000

Ideally, we'd use IR/node flags to predicate the transform, but the IR parser doesn't currently allow fast-math-flags on the cast instructions. So for now, just use the function attribute that corresponds to clang's "-fno-signed-zeros" option.
Differential Revision: https://reviews.llvm.org/D48085
Last touched: the four define lines (which reference #0) and the attributes #0 line.