Commit Graph

133 Commits

Author SHA1 Message Date
Craig Topper eae26bf737 [X86] Add more intrinsics to match icc.
This adds
_mm_loadu_epi8, _mm256_loadu_epi8, _mm512_loadu_epi8
_mm_loadu_epi16, _mm256_loadu_epi16, _mm512_loadu_epi16
_mm_storeu_epi8, _mm256_storeu_epi8, _mm512_storeu_epi8
_mm_storeu_epi16, _mm256_storeu_epi16, _mm512_storeu_epi16

llvm-svn: 344862
2018-10-20 19:28:52 +00:00
Craig Topper 58508be3c0 [X86] Add missing intrinsics to match icc.
This adds
_mm_and_epi32, _mm_and_epi64
_mm_andnot_epi32, _mm_andnot_epi64
_mm_or_epi32, _mm_or_epi64
_mm_xor_epi32, _mm_xor_epi64
_mm256_and_epi32, _mm256_and_epi64
_mm256_andnot_epi32, _mm256_andnot_epi64
_mm256_or_epi32, _mm256_or_epi64
_mm256_xor_epi32, _mm256_xor_epi64
_mm_loadu_epi32, _mm_loadu_epi64
_mm_load_epi32, _mm_load_epi64
_mm256_loadu_epi32, _mm256_loadu_epi64
_mm256_load_epi32, _mm256_load_epi64
_mm512_loadu_epi32, _mm512_loadu_epi64
_mm512_load_epi32, _mm512_load_epi64
_mm_storeu_epi32, _mm_storeu_epi64
_mm_store_epi32, _mm_load_epi64
_mm256_storeu_epi32, _mm256_storeu_epi64
_mm256_store_epi32, _mm256_load_epi64
_mm512_storeu_epi32, _mm512_storeu_epi64
_mm512_store_epi32,V _mm512_load_epi64

llvm-svn: 344861
2018-10-20 19:28:50 +00:00
Mikhail Dvoretckii d1bf9ef0c7 [X86] Lowering integer truncation intrinsics to native IR
This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to
native IR in cases where the result's length is less than 128 bits.

The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for
128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps)
instructions are generated instead

Differential Revision: https://reviews.llvm.org/D48712

llvm-svn: 336643
2018-07-10 08:22:44 +00:00
Craig Topper 284c5f342c [X86] Use shufflevector instead of a select with a constant mask for fmaddsub/fmsubadd IR emission.
Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles.

While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.

llvm-svn: 336388
2018-07-05 20:38:31 +00:00
Gabor Buella 9679eb6527 [X86] Fix some vector cmp builtins - TRUE/FALSE predicates
This patch removes on optimization used with the TRUE/FALSE
predicates, as was suggested in https://reviews.llvm.org/D45616
for r335339.
The optimization was buggy, since r335339 used it also
for *_mask builtins, without actually applying the mask -- the
mask argument was just ignored.

Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D48715

llvm-svn: 336355
2018-07-05 14:26:56 +00:00
Gabor Buella 38a7ce76ae [X86] NFC - add more test cases for vector cmp intrinsics
Add test cases with each predicate using the following
intrinsics:
_mm_cmp_pd
_mm_cmp_ps
_mm256_cmp_pd
_mm256_cmp_ps
_mm_cmp_pd_mask
_mm_cmp_ps_mask
_mm256_cmp_pd_mask
_mm256_cmp_ps_mask
_mm512_cmp_pd_mask
_mm512_cmp_ps_mask
_mm_mask_cmp_pd_mask
_mm_mask_cmp_ps_mask
_mm256_mask_cmp_pd_mask
_mm256_mask_cmp_ps_mask
_mm512_mask_cmp_pd_mask
_mm512_mask_cmp_ps_mask

Some of these are marked with FIXME, as there is bug in lowering
e.g. _mm512_mask_cmp_ps_mask.

llvm-svn: 336346
2018-07-05 12:57:47 +00:00
Craig Topper 0029470dde [X86] Correct the width of mask arguments in intrinsic headers and tests.
All of these found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there.

Some of these were real bugs where we lost bits from the user input:
_mm512_mask_broadcast_f32x8
_mm512_maskz_broadcast_f32x8
_mm512_mask_broadcast_i32x8
_mm512_maskz_broadcast_i32x8
_mm256_mask_cvtusepi16_storeu_epi8

llvm-svn: 336042
2018-06-30 06:05:17 +00:00
Craig Topper 0e9de769a0 [X86] Remove masking from the avx512 rotate builtins. Use a select builtin instead.
llvm-svn: 336036
2018-06-30 01:32:14 +00:00
Gabor Buella 716863c820 [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR
Summary:
Lowering some vector comparision builtins to fcmp IR instructions.
This ignores the signaling behaviour specified in the predicate
argument of said builtins.

Affected AVX512 builtins:

__builtin_ia32_cmpps128_mask
__builtin_ia32_cmpps256_mask
__builtin_ia32_cmpps512_mask
__builtin_ia32_cmppd128_mask
__builtin_ia32_cmppd256_mask
__builtin_ia32_cmppd512_mask

Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma

Reviewed By: craig.topper, spatel, efriedma

Differential Revision: https://reviews.llvm.org/D45616

llvm-svn: 335339
2018-06-22 11:59:16 +00:00
Tomasz Krupa f1792bb3d6 [X86] Lowering sqrt intrinsics to native IR
Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k

Reviewed By: craig.topper

Subscribers: tkrupa, cfe-commits

Differential Revision: https://reviews.llvm.org/D41168

llvm-svn: 334850
2018-06-15 18:05:59 +00:00
Craig Topper 3cce6a7ed9 [X86] Use target independent masked expandload and compressstore intrinsics to implement expandload/compressstore builtins.
Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them.

Reviewers: RKSimon, delena, spatel, GBuella

Reviewed By: RKSimon

Subscribers: cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D47693

llvm-svn: 334366
2018-06-10 17:27:05 +00:00
Craig Topper 03f4f04b91 [X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking.
llvm-svn: 334311
2018-06-08 18:00:25 +00:00
Craig Topper 03de166ccd [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking.
llvm-svn: 334265
2018-06-08 06:13:16 +00:00
Craig Topper 3428beeb2f [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking.
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.

llvm-svn: 334261
2018-06-08 03:24:47 +00:00
Craig Topper acf5601961 [X86] Add builtins for vpermilps/pd instructions to enable target feature checking.
llvm-svn: 334256
2018-06-08 00:59:27 +00:00
Craig Topper 9392136414 [X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shuff64x2 to enable target feature checking and immediate range checking.
llvm-svn: 334244
2018-06-07 23:03:08 +00:00
Gabor Buella 70d8d51073 [X86] Lowering FMA intrinsics to native IR (Clang part)
This patch replaces all packed (and scalar without rounding
mode) fused intrinsics with fmadd/fmaddsub variations.
Then fmadd/fmaddsub are lowered to native IR.

Patch by tkrupa

Reviewers: craig.topper, sroland, spatel, RKSimon

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D47444

llvm-svn: 333555
2018-05-30 15:27:49 +00:00
Craig Topper 68a272d501 [X86] Merge the 3 different flavors of masked vpermi2var/vpermt2var builtins to a single version without masking. Use select builtins with appropriate operand instead.
llvm-svn: 333387
2018-05-29 03:26:38 +00:00
Craig Topper 288bd2e5a0 [X86] Remove masking from pternlog llvm intrinsics and use a select instruction instead.
Because the intrinsics in the headers are implemented as macros, we can't just use a select builtin and pternlog builtin. This would require one of the macro arguments to be used twice. Depending on what was passed to the macro we could expand an expression twice leading to weird behavior. We could maybe declare our local variable in the macro, but that would need to worry about name collisions.

To avoid that just generate IR directly in CGBuiltin.cpp.

Differential Revision: https://reviews.llvm.org/D47125

llvm-svn: 332891
2018-05-21 20:58:23 +00:00
Craig Topper 842171de36 [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics.
I believe this is safe assuming default default FP environment. The conversion might be inexact, but it can never overflow the FP type so this shouldn't be undefined behavior for the uitofp/sitofp instructions.

We already do something similar for scalar conversions.

Differential Revision: https://reviews.llvm.org/D46863

llvm-svn: 332882
2018-05-21 20:19:17 +00:00
Craig Topper 55b4067350 [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead.
Someday maybe we'll use selects for all the builtins.

llvm-svn: 332825
2018-05-20 23:34:10 +00:00
Craig Topper 9d146bbaf7 [X86] Revert part of r332266: Use __builtin_convertvector to replace some of the avx512 truncate builtins.
The masking doesn't work right in the backend for the ones that produce byte or word elements without avx512bw.

llvm-svn: 332322
2018-05-15 03:17:52 +00:00
Craig Topper 25de41cfbc [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins.
As long as the destination type is a 256 or 128 bit vector with the same number of elements we can use __builtin_convertvector to directly generate trunc IR instruction which will be handled natively by the backend.

Differential Revision: https://reviews.llvm.org/D46742

llvm-svn: 332266
2018-05-14 17:50:40 +00:00
Craig Topper ce281a41b5 [X86] Remove '#ifdef __x86_64__' around mask_set1_epi64 intrinsics.
The unmasked versions already didn't have this restrction. I don't think gcc or icc limit these to 64-bit mode so we shouldn't either.

llvm-svn: 330681
2018-04-24 03:36:08 +00:00
Craig Topper 304edc1e75 [X86] Emit native IR for pmuldq/pmuludq builtins.
I believe all the pieces are now in place in the backend to make this work correctly. We can either mask the input to 32 bits for pmuludg or shl/ashr for pmuldq and use a regular mul instruction. The backend should combine this to PMULUDQ/PMULDQ and then SimplifyDemandedBits will remove the and/shifts.

Differential Revision: https://reviews.llvm.org/D45421

llvm-svn: 329605
2018-04-09 19:17:54 +00:00
Craig Topper 21f66a3f6b [X86] Remove some masked cvt builtins that can be replaced with legacy sse/avx buiiltins and a select.
llvm-svn: 326039
2018-02-24 18:55:13 +00:00
Craig Topper 5dc6ca8e5b [X86] Remove __builtin_ia32_permvarsf256_mask and __builtin_ia32_permvarsi256_mask and use the avx2 unmasked versions and a select instead.
llvm-svn: 326022
2018-02-24 06:46:42 +00:00
Craig Topper a57d64e30f [X86] Change the signature of the AVX512 packed fp compare intrinsics to return vXi1 mask. Make bitcasts to scalar explicit in IR
Summary: This is the clang equivalent of r324827

Reviewers: zvi, delena, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43143

llvm-svn: 324828
2018-02-10 23:34:27 +00:00
Uriel Korach 5b2b71d909 [X86] test/testn intrinsics lowering to IR. clang side
Change Header files of the intrinsics for lowering test and testn intrinsics to IR code.
Removed test and testn builtins from clang

Differential Revision: https://reviews.llvm.org/D38737

llvm-svn: 318035
2017-11-13 12:50:52 +00:00
Jina Nahias dca979194d [x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IR
This patch, together with a matching llvm patch (https://reviews.llvm.org/D38671), implements the lowering of X86 shuffle i/f intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D38672

Change-Id: I9b3c2f2b34323bd9ccb21d0c1832f848b88ec047
llvm-svn: 318025
2017-11-13 09:15:31 +00:00
Jina Nahias 123c599a0f fixing a bug in mask[z]_set1 intrinsic
Differential Revision: https://reviews.llvm.org/D38231

Change-Id: I80bbff9cbe93e4be54d8a761ef9723edf3f57c57
llvm-svn: 314102
2017-09-25 13:38:08 +00:00
Jina Nahias 3ad702a1ed Lowering Mask Set1 intrinsics to LLVM IR
This patch, together with a matching llvm patch (https://reviews.llvm.org/D37669), implements the lowering of X86 mask set1 intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D37668

llvm-svn: 313624
2017-09-19 11:00:27 +00:00
Uriel Korach 3fba3c3b0c [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (clang)
This patch, together with a matching llvm patch (https://reviews.llvm.org/D37693), implements the lowering of X86 ABS intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D37694

llvm-svn: 313133
2017-09-13 09:02:02 +00:00
Sanjay Patel e795daa55e [x86] these aren't the undefs you're looking for (PR32176)
x86 has undef SSE/AVX intrinsics that should represent a bogus register operand. 
This is not the same as LLVM's undef value which can take on multiple bit patterns.

There are better solutions / follow-ups to this discussed here:
https://bugs.llvm.org/show_bug.cgi?id=32176
...but this should prevent miscompiles with a one-line code change.

Differential Revision: https://reviews.llvm.org/D30834

llvm-svn: 297588
2017-03-12 19:15:10 +00:00
Craig Topper 367c86ddbe [AVX-512] Replace subvector broadcast builtins with shufflevectors and selects.
Verified that the backend codegens this equally well.

llvm-svn: 292329
2017-01-18 02:17:10 +00:00
Craig Topper 5391c98341 [AVX-512] Remove 128/256-bit masked vpermilvar builtins and replace with select and the avx unmasked builtins.
llvm-svn: 289338
2016-12-10 20:27:39 +00:00
Simon Pilgrim b243bbc87d [X86][AVX512VL] Add missing _mm256_maskz_alignr_epi64 shufflevector check
Missed in rL287733

llvm-svn: 287755
2016-11-23 11:38:52 +00:00
Craig Topper 6aefe00ccf [X86] Replace valignd/q builtins with appropriate __builtin_shufflevector.
llvm-svn: 287733
2016-11-23 01:47:12 +00:00
Simon Pilgrim 698528d83b [X86][AVX512] Replace lossless i32/u32 to f64 conversion intrinsics with generic IR
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.

This patch removes the clang builtins and their use in the headers - a future patch will deal with removing the llvm intrinsics.

This is an extension patch to D20528 which dealt with the equivalent sse/avx cases.

Differential Revision: https://reviews.llvm.org/D26686

llvm-svn: 287088
2016-11-16 09:27:40 +00:00
Craig Topper 5e0709d60b [AVX-512] Replace masked dword and qword variable shift builtins with unmasked builtins and a select.
This is part of a set of changes to allow InstCombine in the backend to optimize variable shifts without having to know about masking.

llvm-svn: 286757
2016-11-13 07:26:34 +00:00
Craig Topper 1a44193afd [AVX-512] Convert the rest of the masked shift by immediate and by single element builtins over to the newly added unmasked builtins and a select.
This should also fix PR30691 since the new builtins are handled like the legacy builtins in the backend.

llvm-svn: 286714
2016-11-12 07:16:59 +00:00
Craig Topper 08bf53ffda [AVX-512] Remove masked vector insert builtins and replace with native shufflevectors and selects.
Unfortunately, the backend currently doesn't fold masks into the instructions correctly when they come from these shufflevectors. I'll work on that in a future commit.

llvm-svn: 285667
2016-11-01 05:47:56 +00:00
Craig Topper 350729627a [AVX-512] Use selectd instead of selectps for _mm256_mask_extracti32x4_epi32.
llvm-svn: 285545
2016-10-31 05:49:11 +00:00
Craig Topper 93ffabd28d [AVX-512] Remove masked vector extract builtins and replace with native shufflevectors and selects.
Unfortunately, the backend currently doesn't fold masks into the instructions correctly when they come from these shufflevectors. I'll work on that in a future commit.

llvm-svn: 285540
2016-10-31 04:30:56 +00:00
Craig Topper 66b2fd1209 [AVX-512] Remove many of the masked 128/256-bit shift builtins and replace them with unmasked builtins and selects.
llvm-svn: 285539
2016-10-31 04:30:51 +00:00
Craig Topper 2eadf1b67e [AVX-512] Remove masked 128/256-bit sqrt builtins and replace them with unmasked builtins and a select.
llvm-svn: 285504
2016-10-29 19:02:10 +00:00
Craig Topper 09e94007be [AVX-512] Remove masked 128/256-bit pmuludq/pmuldq builtins and replace them with unmasked builtins and a select.
llvm-svn: 285503
2016-10-29 19:02:07 +00:00
Craig Topper 160ca8420d [AVX-512] Remove masked 128/256-bit floating point max/min builtins. Use unmasked builtins with select instead.
llvm-svn: 285502
2016-10-29 19:02:03 +00:00
Craig Topper 531ce28311 [AVX-512] Replace 64-bit element and 512-bit vector pmin/pmax builtins with native IR like we do for 128/256-bit, but with the addition of masking.
llvm-svn: 284956
2016-10-24 04:04:24 +00:00
Craig Topper eee7c0520c [AVX-512] Replace masked 128/256-bit byte, word, and dword min/max builtins with selects and the older unmasked builtins.
llvm-svn: 284954
2016-10-23 23:57:30 +00:00