Commit Graph

191 Commits

Author SHA1 Message Date
Craig Topper e95bde33df [X86] Add support for _mm512_mullox_epi64 and _mm512_mask_mullox_epi64 intrinsics to match icc.
On AVX512F targets we'll produce an emulated sequence using 3 pmuludqs with shifts and adds. On AVX512DQ we'll use vpmulld.

Fixes PR37140.

llvm-svn: 330923
2018-04-26 05:38:39 +00:00
Craig Topper ce281a41b5 [X86] Remove '#ifdef __x86_64__' around mask_set1_epi64 intrinsics.
The unmasked versions already didn't have this restrction. I don't think gcc or icc limit these to 64-bit mode so we shouldn't either.

llvm-svn: 330681
2018-04-24 03:36:08 +00:00
Craig Topper f517f1a516 [X86] Implement old kunpck intrinsics using vector ops on vXi1 instead of integer shift/and/or
Summary:
kunpck intrinsics were removed in favor of native IR a few months ago. The implementation lowers them as by operation on the integer types passed to the intrinsic and then just shifting, masking, and oring them together. A special X86 DAG combine was added to recognize this patter and turn it into a concat_vector operation.

I think it makes more sense to keep the IR implementation closer to vector operations on vXi1. Given that we expect these builtins to be used around other builtins that operate on k-registers which we try to represent in IR with vXi1. InstCombine should be able to get rid of the bitcasts between integers and vXi1 leaving only the vector operations.

Reviewers: RKSimon, spatel, zvi, jina.nahias

Reviewed By: RKSimon

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D42016

llvm-svn: 322461
2018-01-14 19:23:50 +00:00
Jina Nahias eb0829155f [x86][AVX512] Lowering kunpack intrinsics to LLVM IR
This patch, together with a matching llvm patch (https://reviews.llvm.org/D39720), implements the lowering of X86 kunpack intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D39719

Change-Id: Id5d3cb394ad33b98be79a6783d1d15569e2b798d
llvm-svn: 319777
2017-12-05 15:42:47 +00:00
Uriel Korach 5b2b71d909 [X86] test/testn intrinsics lowering to IR. clang side
Change Header files of the intrinsics for lowering test and testn intrinsics to IR code.
Removed test and testn builtins from clang

Differential Revision: https://reviews.llvm.org/D38737

llvm-svn: 318035
2017-11-13 12:50:52 +00:00
Jina Nahias dca979194d [x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IR
This patch, together with a matching llvm patch (https://reviews.llvm.org/D38671), implements the lowering of X86 shuffle i/f intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D38672

Change-Id: I9b3c2f2b34323bd9ccb21d0c1832f848b88ec047
llvm-svn: 318025
2017-11-13 09:15:31 +00:00
Craig Topper 57f96ac6dc [X86] Replace the mask cmpeq/cmple/cmplt/cmpgt/cmpge/cmpneq intrinsics with macros that just pass the right comparison predicate value to the regular cmp intrinsic. Remove mask cmpeq/cmpgt builtins that are now unused.
This shortens the intrinsic headers a little and allows us to get rid of the cmpeq and cmpgt handling from CGBuiltin.cpp.

llvm-svn: 317506
2017-11-06 21:00:49 +00:00
Jina Nahias 3ad702a1ed Lowering Mask Set1 intrinsics to LLVM IR
This patch, together with a matching llvm patch (https://reviews.llvm.org/D37669), implements the lowering of X86 mask set1 intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D37668

llvm-svn: 313624
2017-09-19 11:00:27 +00:00
Craig Topper 04370d3a82 [X86] Disable _mm512_maskz_set1_epi64 intrinsic on 32-bit targets to prevent a backend isel failure.
The __builtin_ia32_pbroadcastq512_mem_mask we were previously trying to use in 32-bit mode is not implemented in the x86 backend and causes isel to fail in release builds. In debug builds it fails even earlier during legalization with an llvm_unreachable.

While there add the missing test case for this intrinsic for this for 64-bit mode.

This fixes PR34631. D37668 should be able to recover this for 32-bit mode soon. But I wanted to fix the crash ahead of that.

llvm-svn: 313392
2017-09-15 20:27:59 +00:00
Simon Pilgrim 1ba2bf2162 [X86][AVX512] _mm512_stream_load_si512 should take a void const* argument (PR33977)
Based off the Intel Intrinsics guide, we should expect a void const* argument.

Prevents 'passing 'const void *' to parameter of type 'void *' discards qualifiers' warnings.

Differential Revision: https://reviews.llvm.org/D37449

llvm-svn: 312523
2017-09-05 10:06:41 +00:00
Simon Pilgrim c14865c0c5 [X86][AVX] Ensure vector non-temporal load/store intrinsics force pointer alignment (PR33830)
Clang specifies a max type alignment of 16 bytes on darwin targets (annoyingly in the driver not via cc1), meaning that the builtin nontemporal stores don't correctly align the loads/stores to 32 or 64 bytes when required, resulting in lowering to temporal unaligned loads/stores.

This patch casts the vectors to explicitly aligned types prior to the load/store to ensure that the require alignment is respected.

Differential Revision: https://reviews.llvm.org/D35996

llvm-svn: 309488
2017-07-29 15:33:34 +00:00
Simon Pilgrim 0b37ffbbf9 Strip trailing whitespace. NFCI.
llvm-svn: 309383
2017-07-28 14:01:51 +00:00
Simon Pilgrim 96d02f5503 [X86][AVX] Added support for _mm256_zext* helper intrinsics (PR32839)
llvm-svn: 301749
2017-04-29 17:17:06 +00:00
Simon Pilgrim 9f6e79c5e4 [X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (clang)
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.

LLVM companion patch: D31767.

Differential Revision: https://reviews.llvm.org/D31766

llvm-svn: 300326
2017-04-14 15:05:57 +00:00
Craig Topper bf82498301 [AVX-512] Fix a couple more intrinsic macros I missed in r299346.
llvm-svn: 299347
2017-04-03 03:51:57 +00:00
Craig Topper ac9959eb53 [AVX-512] Fix some intrinsic macros that use the wrong macro parameter names and don't have parentheses around them.
Thanks to Matthew Barr for reporting this issue.

llvm-svn: 299346
2017-04-03 03:41:29 +00:00
Simon Pilgrim 60e924985c [X86][AVX512] Add _mm512_cvtsd_f64 and _mm512_cvtss_f32 intrinsics (PR32305)
Differential Revision: https://reviews.llvm.org/D31155

llvm-svn: 298364
2017-03-21 12:46:13 +00:00
Igor Breger f050b797ac [X86][AVX512][Clang][Intrinsics] Adding missing intrinsics to Clang .
Summary:
Adding missing intrinsics :
    _mm512_set_epi16,
    _mm512_set_epi8,
    _mm512_permutevar_epi32
    _mm512_mask_permutevar_epi32

Reviewers: zvi, guyblank, eladcohen, craig.topper

Reviewed By: craig.topper

Subscribers: craig.topper, cfe-commits

Differential Revision: https://reviews.llvm.org/D31034

llvm-svn: 298208
2017-03-19 08:27:16 +00:00
Craig Topper 6afc436a78 [AVX-512] Change the input type for some load intrinsics to take void type like the spec (and the test cases say).
llvm-svn: 298042
2017-03-17 05:59:25 +00:00
Craig Topper 2e5058c403 [AVX-512] Add missing typecasts and parentheses to _mm512_mask_i64gather_ps. My macro cleanup script I used on the others last year must have missed it.
llvm-svn: 298040
2017-03-17 05:14:37 +00:00
Craig Topper 367c86ddbe [AVX-512] Replace subvector broadcast builtins with shufflevectors and selects.
Verified that the backend codegens this equally well.

llvm-svn: 292329
2017-01-18 02:17:10 +00:00
Craig Topper 70536f4e47 [AVX-512] Replace masked 512-bit pmuldq and pmuludq builtins with the newly added unmasked versions and selects.
llvm-svn: 290580
2016-12-27 04:04:57 +00:00
Craig Topper 32866ab800 Revert r290574 "foo"
This was supposed to be merged with another commit with a real commit message. Sorry.

llvm-svn: 290579
2016-12-27 04:03:29 +00:00
Craig Topper c5ab78d4c3 Revert r290575 "[AVX-512] Replace masked 512-bit pmuldq and pmuludq builtins with the newly added unmasked versions and selects."
I failed to merge this with r290574.

llvm-svn: 290578
2016-12-27 04:03:25 +00:00
Craig Topper 6ad5bcc8ac [AVX-512] Replace masked 512-bit pmuldq and pmuludq builtins with the newly added unmasked versions and selects.
llvm-svn: 290575
2016-12-27 03:46:16 +00:00
Craig Topper 39b9e32493 foo
llvm-svn: 290574
2016-12-27 03:46:13 +00:00
Craig Topper 678b07fe3c [AVX-512] Remove masking from 512-bit vpermil builtins. The backend now has versions without masking so wrap it with select.
This will allow the backend to constant fold these to generic shuffle vectors like 128-bit and 256-bit without having to working about handling masking.

llvm-svn: 289351
2016-12-11 01:26:52 +00:00
Craig Topper 6aefe00ccf [X86] Replace valignd/q builtins with appropriate __builtin_shufflevector.
llvm-svn: 287733
2016-11-23 01:47:12 +00:00
Simon Pilgrim 698528d83b [X86][AVX512] Replace lossless i32/u32 to f64 conversion intrinsics with generic IR
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.

This patch removes the clang builtins and their use in the headers - a future patch will deal with removing the llvm intrinsics.

This is an extension patch to D20528 which dealt with the equivalent sse/avx cases.

Differential Revision: https://reviews.llvm.org/D26686

llvm-svn: 287088
2016-11-16 09:27:40 +00:00
Craig Topper 5e0709d60b [AVX-512] Replace masked dword and qword variable shift builtins with unmasked builtins and a select.
This is part of a set of changes to allow InstCombine in the backend to optimize variable shifts without having to know about masking.

llvm-svn: 286757
2016-11-13 07:26:34 +00:00
Craig Topper d7e5b21914 [X86] Remove extra escaped new lines in intrinsic headers left over from an earlier conversion away from a macro. NFC
llvm-svn: 286756
2016-11-13 07:26:31 +00:00
Craig Topper 2c8f49e67b [AVX-512] Use scalar vfmsub/vfnmsub mask3 intrinsics instead of inverting the mask argument of a vfmadd intrinsic.
Summary: Inverting the mask argument does not reflect the intended semantics of the intrinsic.

Reviewers: igorb, delena

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D26019

llvm-svn: 286733
2016-11-12 23:24:34 +00:00
Craig Topper 1a44193afd [AVX-512] Convert the rest of the masked shift by immediate and by single element builtins over to the newly added unmasked builtins and a select.
This should also fix PR30691 since the new builtins are handled like the legacy builtins in the backend.

llvm-svn: 286714
2016-11-12 07:16:59 +00:00
Ayman Musa e60a41ca28 [X86][AVX512][Clang] Add support for mask_{move|store|load}_s{s/d} and int2mask/mask2int intrinsics.
Differential Revision: https://reviews.llvm.org/D26021

llvm-svn: 286229
2016-11-08 12:00:30 +00:00
Craig Topper 08bf53ffda [AVX-512] Remove masked vector insert builtins and replace with native shufflevectors and selects.
Unfortunately, the backend currently doesn't fold masks into the instructions correctly when they come from these shufflevectors. I'll work on that in a future commit.

llvm-svn: 285667
2016-11-01 05:47:56 +00:00
Craig Topper 93ffabd28d [AVX-512] Remove masked vector extract builtins and replace with native shufflevectors and selects.
Unfortunately, the backend currently doesn't fold masks into the instructions correctly when they come from these shufflevectors. I'll work on that in a future commit.

llvm-svn: 285540
2016-10-31 04:30:56 +00:00
Michael Zuckerman d343697f1e Fixing "type" issue for (epi32)
and replaceing hardcoded inf with clang builtin inf "__builtin_inff()" for float ({max|min}_{pd|ps})

llvm-svn: 285519
2016-10-30 14:54:05 +00:00
Michael Zuckerman 25eb420233 [X86][AVX512][Clang][Intrinsics][reduce] Adding missing reduce (max|min) intrinsics to Clang .
After LGTM and Check-all 

Vector-reduction arithmetic accepts vectors as inputs and produces 
scalars as outputs.This class of vector operation forms the basis 
of many scientific computations. In vector-reduction arithmetic, 
the evaluation off is independent of the order of the input elements of V.

Reviewer: 1. craig.topper 
          2. igorb

Differential Revision: https://reviews.llvm.org/D25988

llvm-svn: 285493
2016-10-29 10:29:20 +00:00
Michael Zuckerman edd99eb07a 1. Fixing small types issue (PD|PS) (reduce) .
2. Cosmetic changes

llvm-svn: 285405
2016-10-28 15:16:03 +00:00
Craig Topper f202365910 [AVX-512] Fix the operand order for all calls to __builtin_ia32_vfmaddss3_mask.
Summary: The preserved input should be the first argument and the vector inputs should be in the same order as the intrinsics it is used to implement.

Reviewers: igorb, delena

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D25902

llvm-svn: 285175
2016-10-26 05:35:38 +00:00
Michael Zuckerman facb37cabf [X86][AVX512][Clang][Intrinsics][reduce] Adding missing reduce (Operators: +,*,&&,||) intrinsics to Clang
Committed after LGTM and check-all

 
Vector-reduction arithmetic accepts vectors as inputs and produces scalars as outputs.
This class of vector operation forms the basis of many scientific computations.
In vector-reduction arithmetic, the evaluation off is independent of the order of the input elements of V.
 
Used bisection method. At each step, we partition the vector with previous
step in half, and the operation is performed on its two halves.
This takes log2(n) steps where n is the number of elements in the vector.
 
Reviwer: 1. igorb
         2. craig.topper    
Differential Revision: https://reviews.llvm.org/D25527

llvm-svn: 285054
2016-10-25 07:56:04 +00:00
Michael Zuckerman 33bd5b235b revert r284963
because new test file is failing in some OS.
test/CodeGen/avx512-reduceIntrin.c

llvm-svn: 284967
2016-10-24 11:30:23 +00:00
Michael Zuckerman 98cb041891 [X86][AVX512][Clang][Intrinsics][reduce] Adding missing reduce (Operators: +,*,&&,||) intrinsics to Clang
Committed after LGTM and check-all

Vector-reduction arithmetic accepts vectors as inputs and produces scalars as outputs.
This class of vector operation forms the basis of many scientific computations.
In vector-reduction arithmetic, the evaluation off is independent of the order of the input elements of V.

Used bisection method. At each step, we partition the vector with previous
step in half, and the operation is performed on its two halves.
This takes log2(n) steps where n is the number of elements in the vector.

Differential Revision: https://reviews.llvm.org/D25527

llvm-svn: 284963
2016-10-24 10:53:20 +00:00
Craig Topper 0c5da26572 [AVX-512] Replace 512-bit pmovzx/sx builtins with native IR.
llvm-svn: 284936
2016-10-23 07:35:47 +00:00
Michael Zuckerman 9e43ccfe68 [Clang][AVX512][BuiltIn]Adding missing intrinsics move_{sd|ss} to clang
Differential Revision: http://reviews.llvm.org/D21021

llvm-svn: 283314
2016-10-05 12:56:06 +00:00
Craig Topper c4a8228bcc [AVX-512] Use native IR for masked 512-bit add/sub/mul/div ps/pd intrinsics when rounding mode isn't used.
llvm-svn: 283073
2016-10-02 17:43:00 +00:00
Ayman Musa 17a2819b05 Update to commit r282488, fix the buildboot failure.
llvm-svn: 282492
2016-09-27 15:37:31 +00:00
Ayman Musa 2e250e8845 [avx512] Add aliases to some missing avx512 intrinsics.
Differential Revision:https: //reviews.llvm.org/D24961

llvm-svn: 282488
2016-09-27 14:06:32 +00:00
Craig Topper f43e4a1728 [AVX-512] Remove masked integer mullo builtins and replace with native IR.
llvm-svn: 280597
2016-09-03 19:19:49 +00:00
Craig Topper 0e18976b8d [AVX-512] Remove masked integer add/sub builtins and replace with native IR.
llvm-svn: 280596
2016-09-03 18:29:35 +00:00