Commit Graph

88 Commits

Author SHA1 Message Date
Craig Topper 8e364c680f [X86] Restore the pavg intrinsics.
The pattern we replaced these with may be too hard to match as demonstrated by
PR41496 and PR41316.

This patch restores the intrinsics, and then we can start focusing
on optimizing the intrinsics.

I've mostly reverted the original patch that removed them, though I modified
the avx512 intrinsics to not have masking built in.

Differential Revision: https://reviews.llvm.org/D60674

llvm-svn: 358427
2019-04-15 17:17:35 +00:00
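
A minimal usage sketch for the restored averaging intrinsics described in the commit above (the helper name is illustrative; requires SSE2):

#include <emmintrin.h>

/* Rounding average of unsigned bytes: each lane computes (a + b + 1) >> 1. */
__m128i average_bytes(__m128i a, __m128i b) {
    return _mm_avg_epu8(a, b);
}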
Simon Pilgrim b12738d932 [X86] Add shift-by-immediate tests for non-immediate/out-of-range values
As noted on PR40203, for gcc compatibility we need to support non-immediate values in the 'slli/srli/srai' shift by immediate vector intrinsics.

llvm-svn: 350619
2019-01-08 12:59:15 +00:00
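
A sketch of the gcc-compatibility case being tested above, assuming a hypothetical helper name: the shift count reaches the "shift by immediate" intrinsic as a runtime value rather than a literal.

#include <emmintrin.h>

/* 'count' is not a compile-time constant, so the builtin cannot simply
   require an immediate operand. */
__m128i shift_words_left(__m128i v, int count) {
    return _mm_slli_epi16(v, count);
}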
Simon Pilgrim 313dc85ce0 [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (clang)
This emits SADD_SAT/SSUB_SAT generic intrinsics for the SSE signed saturated math intrinsics.

LLVM counterpart: https://reviews.llvm.org/D55894

Differential Revision: https://reviews.llvm.org/D55890

llvm-svn: 349743
2018-12-20 11:53:45 +00:00
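
For illustration, a saturating add written with the SSE intrinsic (helper name is made up); after this change the compiler is expected to emit the generic llvm.sadd.sat intrinsic for it rather than an X86-specific builtin:

#include <emmintrin.h>

/* Signed saturating add of eight 16-bit lanes; results clamp to
   [-32768, 32767] instead of wrapping. */
__m128i saturating_add_i16(__m128i a, __m128i b) {
    return _mm_adds_epi16(a, b);
}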
Simon Pilgrim a7b30b4a58 [X86][SSE] Auto upgrade PADDUS/PSUBUS intrinsics to UADD_SAT/USUB_SAT generic intrinsics (clang)
Sibling patch to D55855, this emits UADD_SAT/USUB_SAT generic intrinsics for the SSE saturated math intrinsics instead of expanding to an IR code sequence that could be difficult to reassemble.

Differential Revision: https://reviews.llvm.org/D55879

llvm-svn: 349631
2018-12-19 14:43:47 +00:00
Craig Topper eae26bf737 [X86] Add more intrinsics to match icc.
This adds
_mm_loadu_epi8, _mm256_loadu_epi8, _mm512_loadu_epi8
_mm_loadu_epi16, _mm256_loadu_epi16, _mm512_loadu_epi16
_mm_storeu_epi8, _mm256_storeu_epi8, _mm512_storeu_epi8
_mm_storeu_epi16, _mm256_storeu_epi16, _mm512_storeu_epi16

llvm-svn: 344862
2018-10-20 19:28:52 +00:00
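
A usage sketch for the new icc-compatible load/store intrinsics (illustrative helper; the 512-bit forms require AVX512BW):

#include <immintrin.h>

/* Unaligned byte-granularity load and store of a full 512-bit vector. */
void copy_64_bytes(const void *src, void *dst) {
    __m512i v = _mm512_loadu_epi8(src);
    _mm512_storeu_epi8(dst, v);
}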
Craig Topper d88f76a891 [X86] Add ktest intrinsics to match gcc and icc.
These aren't documented in the Intel Intrinsics Guide, but are supported by gcc and icc.

Includes these intrinsics:
_ktestc_mask8_u8, _ktestz_mask8_u8, _ktest_mask8_u8
_ktestc_mask16_u8, _ktestz_mask16_u8, _ktest_mask16_u8
_ktestc_mask32_u8, _ktestz_mask32_u8, _ktest_mask32_u8
_ktestc_mask64_u8, _ktestz_mask64_u8, _ktest_mask64_u8

llvm-svn: 341265
2018-08-31 22:29:56 +00:00
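
A sketch of one of the new ktest intrinsics (helper name is invented; the 8/16-bit mask forms require AVX512DQ): _ktestz_mask16_u8 reports whether the bitwise AND of two 16-bit masks is all zeros.

#include <immintrin.h>

int masks_are_disjoint(__mmask16 a, __mmask16 b) {
    /* Returns 1 when (a & b) == 0, i.e. no lane is selected by both masks. */
    return _ktestz_mask16_u8(a, b);
}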
Craig Topper 42a4d0822e [X86] Add k-mask conversion and load/store intrinsics to match gcc and icc.
This adds:
_cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64
_cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64
_load_mask8, _load_mask16, _load_mask32, _load_mask64
_store_mask8, _store_mask16, _store_mask32, _store_mask64

These are currently missing from the Intel Intrinsics Guide webpage.

llvm-svn: 341251
2018-08-31 20:41:06 +00:00
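
A sketch round-tripping a mask register through a plain integer with the new conversion intrinsics (helper names invented; the 32-bit mask forms need AVX512BW):

#include <immintrin.h>

unsigned int mask_to_int(__mmask32 m)   { return _cvtmask32_u32(m); }
__mmask32    int_to_mask(unsigned int v) { return _cvtu32_mask32(v); }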
Craig Topper 2aa8efc820 [X86] Add kshift intrinsics to match gcc and icc.
This adds the following intrinsics:
_kshiftli_mask8
_kshiftli_mask16
_kshiftli_mask32
_kshiftli_mask64
_kshiftri_mask8
_kshiftri_mask16
_kshiftri_mask32
_kshiftri_mask64

llvm-svn: 341234
2018-08-31 18:22:52 +00:00
Craig Topper a65bf65e0b [X86] Add kadd intrinsics to match gcc and icc.
This adds the following intrinsics:
_kadd_mask64
_kadd_mask32
_kadd_mask16
_kadd_mask8

These are missing from the Intel Intrinsics Guide, but are implemented by both gcc and icc.

llvm-svn: 340879
2018-08-28 22:32:14 +00:00
Craig Topper cb5fd56c7f [X86] Add kortest intrinsics for 8, 32, and 64 bit masks. Add new intrinsic names for 16 bit masks.
This matches gcc and icc despite not being documented in the Intel Intrinsics Guide.

llvm-svn: 340798
2018-08-28 06:28:25 +00:00
Craig Topper c330ca8611 [X86] Add intrinsics for kand/kandn/knot/kor/kxnor/kxor with 8, 32, and 64-bit mask registers.
This also adds a second intrinsic name for the 16-bit mask versions.

These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide, so I only recently found they existed.

llvm-svn: 340719
2018-08-27 06:20:22 +00:00
Craig Topper 0609d1e211 [X86] Remove masking from the 512-bit padds and psubs builtins. Use select builtin instead.
llvm-svn: 339843
2018-08-16 06:20:29 +00:00
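
Roughly the shape the header takes after this change: masking is applied with clang's generic __builtin_ia32_selectw_512 around the unmasked saturating add, instead of being baked into the padds builtin. This is a sketch under that assumption, not the exact header source; compile with -mavx512bw.

#include <immintrin.h>

static __inline__ __m512i
my_mask_adds_epi16(__m512i __W, __mmask32 __U, __m512i __A, __m512i __B) {
  /* Unmasked add, then merge with __W under the mask __U. */
  return (__m512i)__builtin_ia32_selectw_512(
      (__mmask32)__U, (__v32hi)_mm512_adds_epi16(__A, __B), (__v32hi)__W);
}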
Tomasz Krupa e8cf972d86 [X86] Lowering addus/subus intrinsics to native IR
Summary: This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations.

Reviewers: craig.topper, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D46892

llvm-svn: 339651
2018-08-14 08:01:38 +00:00
Craig Topper 91bbe98757 [X86] Remove masking from dbpsadbw builtins, use select builtin instead.
llvm-svn: 334385
2018-06-11 06:18:29 +00:00
Craig Topper 03de166ccd [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking.
llvm-svn: 334265
2018-06-08 06:13:16 +00:00
Craig Topper d3623155a2 [X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics.
We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.

This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.

llvm-svn: 334208
2018-06-07 17:28:03 +00:00
Craig Topper 95ed88a1a9 [X86] Avoid passing _mm_undefined* to builtin_shufflevector if we are able to pass the first input a second time.
This is more consistent with other usages of builtin_shufflevector. Later optimization passes or codegen will detect the duplicate vector and replace it with undef. Using _mm_undefined just puts a zeroinitializer that still needs to be optimized out later.

llvm-svn: 333944
2018-06-04 19:28:09 +00:00
Craig Topper dff5b311af [X86] Reduce the number of setzero intrinsics to just the set defined by the Intel Intrinsics Guide.
We had quite a few for different element sizes of integers, sometimes with strange target features attached to them.

We only need a single version for each of __m128i, __m256i, and __m512i, with the target feature that first introduced those types.

llvm-svn: 333568
2018-05-30 18:02:11 +00:00
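
The surviving zero-initializers, one per integer vector type, sketched for reference (SSE2, AVX, and AVX-512F respectively):

#include <immintrin.h>

void zero_examples(void) {
    __m128i a = _mm_setzero_si128();    /* SSE2    */
    __m256i b = _mm256_setzero_si256(); /* AVX     */
    __m512i c = _mm512_setzero_si512(); /* AVX512F */
    (void)a; (void)b; (void)c;
}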
Craig Topper 68a272d501 [X86] Merge the 3 different flavors of masked vpermi2var/vpermt2var builtins to a single version without masking. Use select builtins with appropriate operand instead.
llvm-svn: 333387
2018-05-29 03:26:38 +00:00
Craig Topper 55b4067350 [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead.
Someday maybe we'll use selects for all the builtins.

llvm-svn: 332825
2018-05-20 23:34:10 +00:00
Craig Topper 25de41cfbc [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins.
As long as the destination type is a 256- or 128-bit vector with the same number of elements, we can use __builtin_convertvector to directly generate a trunc IR instruction, which will be handled natively by the backend.

Differential Revision: https://reviews.llvm.org/D46742

llvm-svn: 332266
2018-05-14 17:50:40 +00:00
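
A sketch of the pattern this change adopts, assuming the illustrative name my_cvtepi64_epi32: a 512-to-256-bit integer truncate written with __builtin_convertvector, which emits a plain trunc in IR (compile with -mavx512f).

#include <immintrin.h>

static __inline__ __m256i
my_cvtepi64_epi32(__m512i __A) {
  /* Truncate eight 64-bit lanes to eight 32-bit lanes. */
  return (__m256i)__builtin_convertvector((__v8di)__A, __v8si);
}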
Chandler Carruth 16429acacb [x86] Revert r330322 (& r330323): Lowering x86 adds/addus/subs/subus intrinsics
The LLVM commit introduces a crash in LLVM's instruction selection.

I filed http://llvm.org/PR37260 with the test case.

llvm-svn: 330997
2018-04-26 21:46:01 +00:00
Alexander Ivchenko d96ddccdb4 Lowering x86 adds/addus/subs/subus intrinsics (clang)
This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations.

Patch by tkrupa

Differential Revision: https://reviews.llvm.org/D44786

llvm-svn: 330323
2018-04-19 12:15:11 +00:00
Craig Topper 2575454fe9 [X86] Replace 512-bit masked pmaddubsw and pmaddwd intrinsic with unmasked intrinsic and a select.
This makes it consistent with the 128/256-bit functions.

Someday maybe we'll have all the masking moved to selects.

llvm-svn: 329775
2018-04-11 04:55:10 +00:00
Craig Topper 0a70c3c7af [X86] Remove mask from 512 bit pmulhrsw/pmulhw/pmulhuw builtins.
We now use a vselect node in IR around an unmasked builtin. This makes it consistent with the 128 and 256 bit versions.

llvm-svn: 325560
2018-02-20 07:28:18 +00:00
Craig Topper ebb0838f74 [X86] Reverse the operand order of the implementation of the kunpack builtins.
The second operand needs to be in the lower bits of the concatenation. This matches llvm 5.0, gcc, and icc behavior.

Fixes PR36360.

llvm-svn: 324954
2018-02-12 22:38:52 +00:00
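
For reference, a sketch of the corrected operand order (illustrative helper): with the fix, the second operand supplies the low 8 bits of the result, matching llvm 5.0, gcc, and icc.

#include <immintrin.h>

__mmask16 concat_masks(__mmask16 hi, __mmask16 lo) {
    /* Result is (hi[7:0] << 8) | lo[7:0]; requires AVX512F. */
    return _mm512_kunpackb(hi, lo);
}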
Craig Topper f517f1a516 [X86] Implement old kunpck intrinsics using vector ops on vXi1 instead of integer shift/and/or
Summary:
kunpck intrinsics were removed in favor of native IR a few months ago. The implementation lowers them by operating on the integer types passed to the intrinsic and then just shifting, masking, and ORing them together. A special X86 DAG combine was added to recognize this pattern and turn it into a concat_vector operation.

I think it makes more sense to keep the IR implementation closer to vector operations on vXi1, given that we expect these builtins to be used around other builtins that operate on k-registers, which we try to represent in IR with vXi1. InstCombine should be able to get rid of the bitcasts between integers and vXi1, leaving only the vector operations.

Reviewers: RKSimon, spatel, zvi, jina.nahias

Reviewed By: RKSimon

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D42016

llvm-svn: 322461
2018-01-14 19:23:50 +00:00
Craig Topper de91dff5d4 [X86] Replace cvt*2mask intrinsics with native IR using 'icmp slt X, zeroinitializer'.
llvm-svn: 322038
2018-01-08 22:37:56 +00:00
Jina Nahias eb0829155f [x86][AVX512] Lowering kunpack intrinsics to LLVM IR
This patch, together with a matching llvm patch (https://reviews.llvm.org/D39720), implements the lowering of X86 kunpack intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D39719

Change-Id: Id5d3cb394ad33b98be79a6783d1d15569e2b798d
llvm-svn: 319777
2017-12-05 15:42:47 +00:00
Uriel Korach 5b2b71d909 [X86] test/testn intrinsics lowering to IR. clang side
Changed the intrinsics header files to lower the test and testn intrinsics to IR code.
Removed the test and testn builtins from clang.

Differential Revision: https://reviews.llvm.org/D38737

llvm-svn: 318035
2017-11-13 12:50:52 +00:00
Jina Nahias 3ad702a1ed Lowering Mask Set1 intrinsics to LLVM IR
This patch, together with a matching llvm patch (https://reviews.llvm.org/D37669), implements the lowering of X86 mask set1 intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D37668

llvm-svn: 313624
2017-09-19 11:00:27 +00:00
Uriel Korach 3fba3c3b0c [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (clang)
This patch, together with a matching llvm patch (https://reviews.llvm.org/D37693), implements the lowering of X86 ABS intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D37694

llvm-svn: 313133
2017-09-13 09:02:02 +00:00
Yael Tsafrir 23e7733230 [X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR
Differential Revision: https://reviews.llvm.org/D37562

llvm-svn: 313011
2017-09-12 07:46:32 +00:00
Michael Zuckerman 13bcf4944a Fix problem with test.
llvm-svn: 299442
2017-04-04 15:44:06 +00:00
Michael Zuckerman 755a13db3d [X86][Clang] Converting _mm{|256|512}_movm_epi{8|16|32|64} LLVM IR calls into generic intrinsics.
This patch is part two of two reviews, one for clang and the other for LLVM.
In this patch, I covered the clang side by introducing the intrinsic to the front end.
This is done by creating a generic replacement.

Differential Revision: https://reviews.llvm.org/D31394a

llvm-svn: 299431
2017-04-04 13:29:53 +00:00
Craig Topper 208c80556c [AVX-512] Fix test cases that were using the builtins directly without typecasts instead of the intrinsic header.
llvm-svn: 298041
2017-03-17 05:59:22 +00:00
Sanjay Patel e795daa55e [x86] these aren't the undefs you're looking for (PR32176)
x86 has undef SSE/AVX intrinsics that should represent a bogus register operand. 
This is not the same as LLVM's undef value which can take on multiple bit patterns.

There are better solutions / follow-ups to this discussed here:
https://bugs.llvm.org/show_bug.cgi?id=32176
...but this should prevent miscompiles with a one-line code change.

Differential Revision: https://reviews.llvm.org/D30834

llvm-svn: 297588
2017-03-12 19:15:10 +00:00
Craig Topper f0d1147fae [AVX-512] Replace 512-bit masked packss/packus builtins with new unmasked builtins.
These new unmasked builtins will enable us to easily support optimizing these builtins in InstCombine in the backend.

llvm-svn: 295291
2017-02-16 06:32:07 +00:00
Craig Topper cdd3603c04 [AVX-512] Remove masking from 512-bit pshufb builtin. The backend now has a version without masking so wrap it with select.
This will allow the backend to constant fold these to generic shuffle vectors like 128-bit and 256-bit without having to worry about handling masking.

llvm-svn: 289345
2016-12-10 23:09:52 +00:00
Craig Topper 37bf5c6a3f [AVX-512] Replace masked 16-bit element variable shift builtins with new unmasked versions and selects.
llvm-svn: 287313
2016-11-18 05:04:51 +00:00
Craig Topper 1a44193afd [AVX-512] Convert the rest of the masked shift by immediate and by single element builtins over to the newly added unmasked builtins and a select.
This should also fix PR30691 since the new builtins are handled like the legacy builtins in the backend.

llvm-svn: 286714
2016-11-12 07:16:59 +00:00
Craig Topper 531ce28311 [AVX-512] Replace 64-bit element and 512-bit vector pmin/pmax builtins with native IR like we do for 128/256-bit, but with the addition of masking.
llvm-svn: 284956
2016-10-24 04:04:24 +00:00
Craig Topper 0c5da26572 [AVX-512] Replace 512-bit pmovzx/sx builtins with native IR.
llvm-svn: 284936
2016-10-23 07:35:47 +00:00
Elad Cohen b107a22afb [X86] Remove the mm_malloc.h include guard hack from the X86 builtins tests
The X86 clang/test/CodeGen/*builtins.c tests define the mm_malloc.h include
guard as a hack for avoiding its inclusion (mm_malloc.h requires a hosted
environment since it expects stdlib.h to be available - which is not the case
in these internal clang codegen tests).
This patch removes this hack and instead passes -ffreestanding to clang cc1.

Differential Revision: https://reviews.llvm.org/D24825

llvm-svn: 282581
2016-09-28 11:59:09 +00:00
Craig Topper f43e4a1728 [AVX-512] Remove masked integer mullo builtins and replace with native IR.
llvm-svn: 280597
2016-09-03 19:19:49 +00:00
Craig Topper 0e18976b8d [AVX-512] Remove masked integer add/sub builtins and replace with native IR.
llvm-svn: 280596
2016-09-03 18:29:35 +00:00
Eric Christopher abb2b54ad3 After PR28761 use -Wall with -Werror in builtins tests to identify
possible problems in headers.

llvm-svn: 277696
2016-08-04 06:02:50 +00:00
Simon Pilgrim f5a8837e1b [X86][AVX512] Converted the VBROADCAST intrinsics to generic IR
llvm-svn: 274544
2016-07-05 12:59:33 +00:00
Michael Zuckerman 7dac6fbdf8 [Clang][BuiltIn][AVX512] adding _mm{|256|512}_mask_cvt{s|us|}epi16_storeu_epi8 intrinsics
Differential Revision: http://reviews.llvm.org/D21729

llvm-svn: 274532
2016-07-05 08:08:01 +00:00
Artur Pilipenko 70d4bb566c Update the expected masked load/store intrinsics names in tests
The mangling of their names was changed in order to support arbitrary addrspace pointers as arguments in rL274043.

llvm-svn: 274044
2016-06-28 18:28:45 +00:00