[X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math.
The custom code produces incorrect results if later reassociated.
Since r221657, on x86, vNi32 uitofp is lowered using an optimized
sequence:
movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...]
pand %xmm0, %xmm1
por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...]
psrld $16, %xmm0
por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...]
addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...]
addps %xmm1, %xmm0
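In scalar form, the trick looks roughly like the C below (an
illustrative sketch only: bits_to_float and u32_to_f32_magic are
made-up names, and it assumes IEEE-754 binary32; it mirrors the
subps-with-positive-constant form the test below checks for, rather
than the addps-with-negative-constant form quoted above):

  #include <stdint.h>
  #include <string.h>

  static float bits_to_float(uint32_t b) {
    float f;
    memcpy(&f, &b, sizeof f);  /* bit-cast without aliasing UB */
    return f;
  }

  float u32_to_f32_magic(uint32_t x) {
    /* OR the low 16 bits into the mantissa of 2**23:
       exactly float(2**23 + (x & 0xffff)). */
    float lo = bits_to_float(0x4b000000u | (x & 0xffffu));
    /* OR the high 16 bits into the mantissa of 2**39:
       exactly float(2**39 + (x >> 16) * 2**16). */
    float hi = bits_to_float(0x53000000u | (x >> 16));
    /* 0x53000080 is float(2**39 + 2**23). Subtracting it first
       cancels both biases exactly; the final add then does the
       single rounding, matching uitofp semantics. */
    float magic = bits_to_float(0x53000080u);
    return (hi - magic) + lo;
  }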
Since r240361, the machine combiner opportunistically reassociates
2-instruction sequences (with -ffast-math). In the new code sequence,
the ADDPS instructions are eligible. In isolation, for simple examples
(without reassociable users), this makes no performance difference
(the goal being to enable reassociation of longer chains).
In the trivial example (just one uitofp), the reassociation doesn't
happen, because (I think) it would require the emission of a separate
movaps for a constantpool load (instead of folding it into addps).
However, when we have multiple uitofp sequences, and the constantpool
loads are CSE'd earlier, the machine combiner can do the reassociation.
When the ADDPS instructions are reassociated, the resulting sequence
isn't correct anymore, as we'd be adding large (2**39) constants to
comparatively smaller values (~2**23). Given that two of the three
inputs are powers of 2 larger than 2**16, and that
ulp(2**39) == 2**(39-24) == 2**15, the reassociated chain will produce
0 for any input in [0, 2**14). In my testing, it also produces wrong
results for 99.5% of [0, 2**32).
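Concretely, continuing the sketch above with x == 1 and one possible
reassociated order (again illustrative, not the exact machine code):

  float lo    = bits_to_float(0x4b000001u);  /* 2**23 + 1     */
  float hi    = bits_to_float(0x53000000u);  /* 2**39 + 0     */
  float magic = bits_to_float(0x53000080u);  /* 2**39 + 2**23 */
  float good = (hi - magic) + lo;  /* == 1.0f: the biases cancel first */
  /* Reassociated: magic - lo == 2**39 - 1 exactly, but the nearest
     float (ulp just below 2**39 is 2**15) is 2**39 itself, so the
     input is rounded away and the result is 0. */
  float bad  = hi - (magic - lo);  /* == 0.0f */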
Avoid this by disabling the new lowering when -ffast-math is enabled.
It does mean that we'll get slower code than without it, but at least
we won't get egregiously incorrect code.
One might argue that, considering -ffast-math is all but meaningless,
uitofp producing wrong results isn't a compiler bug. But it really is.
Fixes PR24512.
...though this is really more of a workaround.
Ideally, we'd have some sort of Machine FMF, but that's a problem
that's not worth tackling until we do more with machine IR.
llvm-svn: 248965

; RUN: llc < %s -mtriple=x86_64 -enable-unsafe-fp-math \
; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=CST --check-prefix=SSE --check-prefix=SSE2
; RUN: llc < %s -mtriple=x86_64 -enable-unsafe-fp-math -mattr=+sse4.1 \
; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=CST --check-prefix=SSE --check-prefix=SSE41
; RUN: llc < %s -mtriple=x86_64 -enable-unsafe-fp-math -mattr=+avx \
; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=CST --check-prefix=AVX
; RUN: llc < %s -mtriple=x86_64 -enable-unsafe-fp-math -mattr=+avx2 \
; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=AVX2
; RUN: llc < %s -mtriple=x86_64 -enable-unsafe-fp-math -mattr=+avx512f \
; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512F
; RUN: llc < %s -mtriple=x86_64 -enable-unsafe-fp-math -mattr=+avx512vl \
; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512VL

; Check that the constants used in the vectors are the right ones.
; SSE2: [[MASKCSTADDR:.LCPI[0-9_]+]]:
; SSE2-NEXT: .long 65535 # 0xffff
; SSE2-NEXT: .long 65535 # 0xffff
; SSE2-NEXT: .long 65535 # 0xffff
; SSE2-NEXT: .long 65535 # 0xffff

; CST: [[LOWCSTADDR:.LCPI[0-9_]+]]:
; CST-NEXT: .long 1258291200 # 0x4b000000
; CST-NEXT: .long 1258291200 # 0x4b000000
; CST-NEXT: .long 1258291200 # 0x4b000000
; CST-NEXT: .long 1258291200 # 0x4b000000

; CST: [[HIGHCSTADDR:.LCPI[0-9_]+]]:
; CST-NEXT: .long 1392508928 # 0x53000000
; CST-NEXT: .long 1392508928 # 0x53000000
; CST-NEXT: .long 1392508928 # 0x53000000
; CST-NEXT: .long 1392508928 # 0x53000000

; CST: [[MAGICCSTADDR:.LCPI[0-9_]+]]:
; CST-NEXT: .long 0x53000080 # float 5.49764202E+11
; CST-NEXT: .long 0x53000080 # float 5.49764202E+11
; CST-NEXT: .long 0x53000080 # float 5.49764202E+11
; CST-NEXT: .long 0x53000080 # float 5.49764202E+11

; AVX2: [[LOWCSTADDR:.LCPI[0-9_]+]]:
; AVX2-NEXT: .long 1258291200 # 0x4b000000

; AVX2: [[HIGHCSTADDR:.LCPI[0-9_]+]]:
; AVX2-NEXT: .long 1392508928 # 0x53000000

; AVX2: [[MAGICCSTADDR:.LCPI[0-9_]+]]:
; AVX2-NEXT: .long 0x53000080 # float 5.49764202E+11

define <4 x float> @test_uitofp_v4i32_to_v4f32(<4 x i32> %arg) {
; SSE2-LABEL: test_uitofp_v4i32_to_v4f32:
; SSE2: movdqa [[MASKCSTADDR]](%rip), [[MASK:%xmm[0-9]+]]
; SSE2-NEXT: pand %xmm0, [[MASK]]
; After this instruction, MASK will have the value of the low parts
; of the vector.
; SSE2-NEXT: por [[LOWCSTADDR]](%rip), [[MASK]]
; SSE2-NEXT: psrld $16, %xmm0
; SSE2-NEXT: por [[HIGHCSTADDR]](%rip), %xmm0
; SSE2-NEXT: subps [[MAGICCSTADDR]](%rip), %xmm0
; SSE2-NEXT: addps [[MASK]], %xmm0
; SSE2-NEXT: retq
;
; Currently we commute the arguments of the first blend, but this could be
; improved to match the lowering of the second blend.
; SSE41-LABEL: test_uitofp_v4i32_to_v4f32:
; SSE41: movdqa [[LOWCSTADDR]](%rip), [[LOWVEC:%xmm[0-9]+]]
; SSE41-NEXT: pblendw $85, %xmm0, [[LOWVEC]]
; SSE41-NEXT: psrld $16, %xmm0
; SSE41-NEXT: pblendw $170, [[HIGHCSTADDR]](%rip), %xmm0
; SSE41-NEXT: subps [[MAGICCSTADDR]](%rip), %xmm0
; SSE41-NEXT: addps [[LOWVEC]], %xmm0
; SSE41-NEXT: retq
;
; AVX-LABEL: test_uitofp_v4i32_to_v4f32:
; AVX: vpblendw $170, [[LOWCSTADDR]](%rip), %xmm0, [[LOWVEC:%xmm[0-9]+]]
; AVX-NEXT: vpsrld $16, %xmm0, [[SHIFTVEC:%xmm[0-9]+]]
; AVX-NEXT: vpblendw $170, [[HIGHCSTADDR]](%rip), [[SHIFTVEC]], [[HIGHVEC:%xmm[0-9]+]]
; AVX-NEXT: vsubps [[MAGICCSTADDR]](%rip), [[HIGHVEC]], [[TMP:%xmm[0-9]+]]
; AVX-NEXT: vaddps [[TMP]], [[LOWVEC]], %xmm0
; AVX-NEXT: retq
;
; The lowering for AVX2 is a bit messy, because we select broadcast
; instructions, instead of folding the constant loads.
; AVX2-LABEL: test_uitofp_v4i32_to_v4f32:
; AVX2: vpbroadcastd [[LOWCSTADDR]](%rip), [[LOWCST:%xmm[0-9]+]]
; AVX2-NEXT: vpblendw $170, [[LOWCST]], %xmm0, [[LOWVEC:%xmm[0-9]+]]
; AVX2-NEXT: vpsrld $16, %xmm0, [[SHIFTVEC:%xmm[0-9]+]]
; AVX2-NEXT: vpbroadcastd [[HIGHCSTADDR]](%rip), [[HIGHCST:%xmm[0-9]+]]
; AVX2-NEXT: vpblendw $170, [[HIGHCST]], [[SHIFTVEC]], [[HIGHVEC:%xmm[0-9]+]]
; AVX2-NEXT: vbroadcastss [[MAGICCSTADDR]](%rip), [[MAGICCST:%xmm[0-9]+]]
; AVX2-NEXT: vsubps [[MAGICCST]], [[HIGHVEC]], [[TMP:%xmm[0-9]+]]
; AVX2-NEXT: vaddps [[TMP]], [[LOWVEC]], %xmm0
; AVX2-NEXT: retq
;
; AVX512F-LABEL: test_uitofp_v4i32_to_v4f32:
; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
; AVX512F-NEXT: vcvtudq2ps %zmm0, %zmm0
; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: test_uitofp_v4i32_to_v4f32:
; AVX512VL: # %bb.0:
; AVX512VL-NEXT: vcvtudq2ps %xmm0, %xmm0
; AVX512VL-NEXT: retq
%tmp = uitofp <4 x i32> %arg to <4 x float>
ret <4 x float> %tmp
}

; Match the AVX2 constants used in the next function
; AVX2: [[LOWCSTADDR:.LCPI[0-9_]+]]:
; AVX2-NEXT: .long 1258291200 # 0x4b000000

; AVX2: [[HIGHCSTADDR:.LCPI[0-9_]+]]:
; AVX2-NEXT: .long 1392508928 # 0x53000000

; AVX2: [[MAGICCSTADDR:.LCPI[0-9_]+]]:
; AVX2-NEXT: .long 0x53000080 # float 5.49764202E+11

define <8 x float> @test_uitofp_v8i32_to_v8f32(<8 x i32> %arg) {
; Legalization will split this into 2 x <4 x i32> on anything prior to AVX.
; The constants used in the vector instructions are shared between the
; two sequences of instructions.
;
; SSE2-LABEL: test_uitofp_v8i32_to_v8f32:
; SSE2: movdqa {{.*#+}} [[MASK:xmm[0-9]+]] = [65535,65535,65535,65535]
; SSE2-NEXT: movdqa %xmm0, [[VECLOW:%xmm[0-9]+]]
; SSE2-NEXT: pand %[[MASK]], [[VECLOW]]
; SSE2-NEXT: movdqa {{.*#+}} [[LOWCST:xmm[0-9]+]] = [1258291200,1258291200,1258291200,1258291200]
; SSE2-NEXT: por %[[LOWCST]], [[VECLOW]]
; SSE2-NEXT: psrld $16, %xmm0
; SSE2-NEXT: movdqa {{.*#+}} [[HIGHCST:xmm[0-9]+]] = [1392508928,1392508928,1392508928,1392508928]
; SSE2-NEXT: por %[[HIGHCST]], %xmm0
; SSE2-NEXT: movaps {{.*#+}} [[MAGICCST:xmm[0-9]+]] = [5.49764202E+11,5.49764202E+11,5.49764202E+11,5.49764202E+11]
; SSE2-NEXT: subps %[[MAGICCST]], %xmm0
; SSE2-NEXT: addps [[VECLOW]], %xmm0
; MASK is the low vector of the second part after this point.
; SSE2-NEXT: pand %xmm1, %[[MASK]]
; SSE2-NEXT: por %[[LOWCST]], %[[MASK]]
; SSE2-NEXT: psrld $16, %xmm1
; SSE2-NEXT: por %[[HIGHCST]], %xmm1
; SSE2-NEXT: subps %[[MAGICCST]], %xmm1
; SSE2-NEXT: addps %[[MASK]], %xmm1
; SSE2-NEXT: retq
;
; SSE41-LABEL: test_uitofp_v8i32_to_v8f32:
; SSE41: movdqa {{.*#+}} [[LOWCST:xmm[0-9]+]] = [1258291200,1258291200,1258291200,1258291200]
; SSE41-NEXT: movdqa %xmm0, [[VECLOW:%xmm[0-9]+]]
; SSE41-NEXT: pblendw $170, %[[LOWCST]], [[VECLOW]]
; SSE41-NEXT: psrld $16, %xmm0
; SSE41-NEXT: movdqa {{.*#+}} [[HIGHCST:xmm[0-9]+]] = [1392508928,1392508928,1392508928,1392508928]
; SSE41-NEXT: pblendw $170, %[[HIGHCST]], %xmm0
; SSE41-NEXT: movaps {{.*#+}} [[MAGICCST:xmm[0-9]+]] = [5.49764202E+11,5.49764202E+11,5.49764202E+11,5.49764202E+11]
; SSE41-NEXT: subps %[[MAGICCST]], %xmm0
; SSE41-NEXT: addps [[VECLOW]], %xmm0
; LOWCST is the low vector of the second part after this point.
; The operands of the blend are inverted because we reuse xmm1
; in the next shift.
; SSE41-NEXT: pblendw $85, %xmm1, %[[LOWCST]]
; SSE41-NEXT: psrld $16, %xmm1
; SSE41-NEXT: pblendw $170, %[[HIGHCST]], %xmm1
; SSE41-NEXT: subps %[[MAGICCST]], %xmm1
; SSE41-NEXT: addps %[[LOWCST]], %xmm1
; SSE41-NEXT: retq
;
; Test that we are not lowering uitofp to scalars.
; AVX-NOT: cvtsd2ss
; AVX: retq
;
; AVX2-LABEL: test_uitofp_v8i32_to_v8f32:
; AVX2: vpbroadcastd [[LOWCSTADDR]](%rip), [[LOWCST:%ymm[0-9]+]]
; AVX2-NEXT: vpblendw $170, [[LOWCST]], %ymm0, [[LOWVEC:%ymm[0-9]+]]
; AVX2-NEXT: vpsrld $16, %ymm0, [[SHIFTVEC:%ymm[0-9]+]]
; AVX2-NEXT: vpbroadcastd [[HIGHCSTADDR]](%rip), [[HIGHCST:%ymm[0-9]+]]
; AVX2-NEXT: vpblendw $170, [[HIGHCST]], [[SHIFTVEC]], [[HIGHVEC:%ymm[0-9]+]]
; AVX2-NEXT: vbroadcastss [[MAGICCSTADDR]](%rip), [[MAGICCST:%ymm[0-9]+]]
; AVX2-NEXT: vsubps [[MAGICCST]], [[HIGHVEC]], [[TMP:%ymm[0-9]+]]
; AVX2-NEXT: vaddps [[TMP]], [[LOWVEC]], %ymm0
; AVX2-NEXT: retq
;
; AVX512F-LABEL: test_uitofp_v8i32_to_v8f32:
; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill
; AVX512F-NEXT: vcvtudq2ps %zmm0, %zmm0
; AVX512F-NEXT: # kill
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: test_uitofp_v8i32_to_v8f32:
; AVX512VL: # %bb.0:
; AVX512VL-NEXT: vcvtudq2ps %ymm0, %ymm0
; AVX512VL-NEXT: retq
%tmp = uitofp <8 x i32> %arg to <8 x float>
ret <8 x float> %tmp
}