Commit Graph

18 Commits

Author SHA1 Message Date
Simon Pilgrim 2279e59573 [X86][SSE] Avoid specifying unused arguments in SHUFPD lowering
As discussed on PR26491, we are missing the opportunity to make use of the smaller MOVHLPS instruction because we set both arguments of a SHUFPD when using it to lower a single input shuffle.

This patch sets the lowered argument to UNDEF if that shuffle element is undefined. This in turn makes it easier for target shuffle combining to decode UNDEF shuffle elements, allowing combines to MOVHLPS to occur.

A fix to match against MOVHPD stores was necessary as well.

This builds on the improved MOVLHPS/MOVHLPS lowering and memory folding support added in D16956

Adding similar support for SHUFPS will have to wait until have better support for target combining of binary shuffles.

Differential Revision: https://reviews.llvm.org/D23027

llvm-svn: 279430
2016-08-22 12:56:54 +00:00
Simon Pilgrim 019e102426 [X86][SSE] Fixed issue with memory folding of (v)cvtsd2ss intrinsics
Fixed typo in the intrinsic definitions of (v)cvtsd2ss with memory folding.

This was only unearthed when rL276102 started using the intrinsic again.....

llvm-svn: 276740
2016-07-26 10:41:28 +00:00
Simon Pilgrim 0ea8d275cc [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.

It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).

This patch changes both scalar and packed versions back to using x86-specific builtins.

It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.

A companion clang patch is at D22105

Differential Revision: https://reviews.llvm.org/D22106

llvm-svn: 275981
2016-07-19 15:07:43 +00:00
Michael Kuperstein 3e3652aef2 Recommit r274692 - [X86] Transform setcc + movzbl into xorl + setcc
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.
This fixes PR28146.

The original commit tried inserting an 8bit-subreg into a GR32 (not GR32_ABCD)
which was not appreciated by fast regalloc on 32-bit.

llvm-svn: 274802
2016-07-07 22:50:23 +00:00
Michael Kuperstein edb38a94f8 Revert r274692 to check whether this is what breaks windows selfhost.
llvm-svn: 274771
2016-07-07 16:55:35 +00:00
Michael Kuperstein 1ef6c59b1d [X86] Transform setcc + movzbl into xorl + setcc
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.

This fixes PR28146.

Differential Revision: http://reviews.llvm.org/D21774

llvm-svn: 274692
2016-07-06 21:56:18 +00:00
Sanjay Patel 74b40bdb53 [x86, SSE] update packed FP compare tests for direct translation from builtin to IR
The clang side of this was r272840:
http://reviews.llvm.org/rL272840

A follow-up step would be to auto-upgrade and remove these LLVM intrinsics completely.

Differential Revision: http://reviews.llvm.org/D21269

llvm-svn: 272841
2016-06-15 21:22:15 +00:00
Sanjay Patel 0b526676ab [x86] delete unnecessary function declarations
Missed this in r272806, r272807.

llvm-svn: 272834
2016-06-15 20:51:47 +00:00
Sanjay Patel a6c6f09967 [x86, SSE] remove the GCCBuiltins from the integer min/max intrinsics
This allows us to emit native IR in Clang (next commit).
Also, update the intrinsic tests to show that codegen already knows how to handle
the IR that Clang will soon produce.

llvm-svn: 272806
2016-06-15 17:17:27 +00:00
Simon Pilgrim 0afd5a4d80 [X86][SSE] Replace (V)CVTTPS2DQ and VCVTTPD2DQ truncating (round to zero) f32/f64 to i32 with generic IR (llvm)
This patch removes the llvm intrinsics (V)CVTTPS2DQ and VCVTTPD2DQ truncation (round to zero) conversions and auto-upgrades to FP_TO_SINT calls instead.

Note: I looked at updating CVTTPD2DQ as well but this still requires a lot more work to correctly lower.

Differential Revision: http://reviews.llvm.org/D20860

llvm-svn: 271510
2016-06-02 10:55:21 +00:00
Simon Pilgrim d64af65f6d [X86][SSE] Updated storeu fast-isel tests to match clang builtin tests
Since rL271214 the headers have no longer used the storeu intrinsic

llvm-svn: 271222
2016-05-30 18:42:51 +00:00
Simon Pilgrim 4ed0e07b23 [X86][SSE2] Updated _mm_store_pd1/_mm_store1_pd fast-isel tests to match D20617
llvm-svn: 271220
2016-05-30 18:18:44 +00:00
Simon Pilgrim 4d1e258097 [X86][SSE2] Use storeu intrinsics for _mm_storeu_pd/_mm_storeu_pd tests
Also fixed name of _mm_store1_pd test

llvm-svn: 270681
2016-05-25 09:42:29 +00:00
Simon Pilgrim 8a5ff3c59a [X86][SSE] Updated (V)CVTDQ2PD(Y) and (V)CVTPS2PD(Y) fast-isel codegen to match D20528
llvm-svn: 270501
2016-05-23 22:17:36 +00:00
Simon Pilgrim b1ff2dd145 [X86][SSE2] Fixed shuffle of results in _mm_cmpnge_sd/_mm_cmpngt_sd tests
llvm-svn: 270080
2016-05-19 16:49:53 +00:00
Simon Pilgrim 47825fad71 [X86][SSE2] Added _mm_move_* tests
llvm-svn: 270046
2016-05-19 11:59:57 +00:00
Simon Pilgrim 01809e0506 [X86][SSE2] Added _mm_cast* and _mm_set* tests
llvm-svn: 270041
2016-05-19 10:58:54 +00:00
Simon Pilgrim 5a0d728181 [X86][SSE2] Added fast-isel tests to sync with clang/test/CodeGen/sse2-builtins.c
llvm-svn: 269966
2016-05-18 18:00:43 +00:00