Commit Graph

45 Commits

Author SHA1 Message Date
Craig Topper ed78daf810 [X86] Don't use _MM_FROUND_CUR_DIRECTION in the intrinsics tests.
_MM_FROUND_CUR_DIRECTION is the behavior of the intrinsics that
don't take a rounding mode argument. So a better test
is using _MM_FROUND_NO_EXC with the SAE only intrinsics and
an explicit rounding mode with the intrinsics that support
embedded rounding mode.

llvm-svn: 364127
2019-06-22 07:21:48 +00:00
Craig Topper bd7884ed79 [X86] Custom codegen 512-bit cvt(u)qq2tops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics.
Summary:
The 512-bit cvt(u)qq2tops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics all have the possibility of taking an explicit rounding mode argument. If the rounding mode is CUR_DIRECTION we'd like to emit a sitofp/uitofp instruction and a select like we do for 256-bit intrinsics.

For cvt(u)qqtopd and cvt(u)dqtops we do this when the form of the software intrinsics that doesn't take a rounding mode argument is used. This is done by using convertvector in the header with the select builtin. But if the explicit rounding mode form of the intrinsic is used and CUR_DIRECTION is passed, we don't do this. We shouldn't have this inconsistency.

For cvt(u)qqtops nothing is done because we can't use the select builtin in the header without avx512vl. So we need to use custom codegen for this.

Even when the rounding mode isn't CUR_DIRECTION we should also use select in IR for consistency. And it will remove another scalar integer mask from our intrinsics.

To accomplish all of these goals I've taken a slightly unusual approach. I've added two new X86 specific intrinsics for sitofp/uitofp with rounding. These intrinsics are variadic on the input and output type so we only need 2 instead of 6. This avoids the need for a switch to map them in CGBuiltin.cpp. We just need to check signed vs unsigned. I believe other targets also use variadic intrinsics like this.

So if the rounding mode is CUR_DIRECTION we'll use an sitofp/uitofp instruction. Otherwise we'll use one of the new intrinsics. After that we'll emit a select instruction if needed.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D56998

llvm-svn: 352267
2019-01-26 02:42:01 +00:00
Craig Topper d88f76a891 [X86] Add ktest intrinsics to match gcc and icc.
These aren't documented in the Intel Intrinsics Guide, but are supported by gcc and icc.

Includes these intrinsics:
_ktestc_mask8_u8, _ktestz_mask8_u8, _ktest_mask8_u8
_ktestc_mask16_u8, _ktestz_mask16_u8, _ktest_mask16_u8
_ktestc_mask32_u8, _ktestz_mask32_u8, _ktest_mask32_u8
_ktestc_mask64_u8, _ktestz_mask64_u8, _ktest_mask64_u8

llvm-svn: 341265
2018-08-31 22:29:56 +00:00
Craig Topper 42a4d0822e [X86] Add k-mask conversion and load/store instrinsics to match gcc and icc.
This adds:
_cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64
_cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64
_load_mask8, _load_mask16, _load_mask32, _load_mask64
_store_mask8, _store_mask16, _store_mask32, _store_mask64

These are currently missing from the Intel Intrinsics Guide webpage.

llvm-svn: 341251
2018-08-31 20:41:06 +00:00
Craig Topper 2aa8efc820 [X86] Add kshift intrinsics to match gcc and icc.
This adds the following intrinsics:
_kshiftli_mask8
_kshiftli_mask16
_kshiftli_mask32
_kshiftli_mask64
_kshiftri_mask8
_kshiftri_mask16
_kshiftri_mask32
_kshiftri_mask64

llvm-svn: 341234
2018-08-31 18:22:52 +00:00
Craig Topper a65bf65e0b [X86] Add kadd intrinsics to match gcc and icc.
This adds the following intrinsics:
_kadd_mask64
_kadd_mask32
_kadd_mask16
_kadd_mask8

These are missing from the Intel Intrinsics Guide, but are implemented by both gcc and icc.

llvm-svn: 340879
2018-08-28 22:32:14 +00:00
Craig Topper cb5fd56c7f [X86] Add kortest intrinsics for 8, 32, and 64 bit masks. Add new intrinsic names for 16 bit masks.
This matches gcc and icc despite not being documented in the Intel Intrinsics Guide.

llvm-svn: 340798
2018-08-28 06:28:25 +00:00
Craig Topper c330ca8611 [X86] Add intrinsics for kand/kandn/knot/kor/kxnor/kxor with 8, 32, and 64-bit mask registers.
This also adds a second intrinsic name for the 16-bit mask versions.

These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide so I only recently found they existed.

llvm-svn: 340719
2018-08-27 06:20:22 +00:00
Craig Topper db5c9aa366 [X86] Also fix the test for _mm512_mullo_epi64 to test the intrinsic instead of a copy of the intrinsic implementation.
This had the same issue I just fixed in r336739. Apparently I copy pasted _mm512_mullo_epi64 when I added _mm512_mullox_epi64.

llvm-svn: 336740
2018-07-10 23:28:05 +00:00
Craig Topper 5cbeeedd27 [X86] Fix various type mismatches in intrinsic headers and intrinsic tests that cause extra bitcasts to be emitted in the IR.
Found via imprecise grepping of the -O0 IR. There could still be more bugs out there.

llvm-svn: 336487
2018-07-07 17:03:32 +00:00
Craig Topper 851f363691 [X86] Rename llvm.x86.avx512.mask.fpclass.p* to exclude 'mask.' from the name to match llvm.
llvm-svn: 335745
2018-06-27 15:57:57 +00:00
Craig Topper 3428beeb2f [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking.
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.

llvm-svn: 334261
2018-06-08 03:24:47 +00:00
Craig Topper 95ed88a1a9 [X86] Avoid passing _mm_undefined* to builtin_shufflevector if we are able to pass the first input a second time.
This is more consistent with other usages of builtin_shufflevector. Later optimization passes or codegen will detect the duplicate vector and replace it with undef. Using _mm_undefined just puts a zeroinitializer that still needs to be optimized out later.

llvm-svn: 333944
2018-06-04 19:28:09 +00:00
Craig Topper 842171de36 [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics.
I believe this is safe assuming default default FP environment. The conversion might be inexact, but it can never overflow the FP type so this shouldn't be undefined behavior for the uitofp/sitofp instructions.

We already do something similar for scalar conversions.

Differential Revision: https://reviews.llvm.org/D46863

llvm-svn: 332882
2018-05-21 20:19:17 +00:00
Craig Topper de91dff5d4 [X86] Replace cvt*2mask intrinsics with native IR using 'icmp slt X, zeroinitializer.
llvm-svn: 322038
2018-01-08 22:37:56 +00:00
Craig Topper 5ece4cfe1e [X86] Implement broadcastf32x2 and broadcasti32x2 intrinsics using __builtin_shufflevector instead builtins
This patch implements the broadcastf32x2/broadcasti32x2 intrinsics using __builtin_shufflevector.

Differential Revision: https://reviews.llvm.org/D37287

llvm-svn: 312135
2017-08-30 16:15:12 +00:00
Michael Zuckerman 13bcf4944a Fix problem with test.
llvm-svn: 299442
2017-04-04 15:44:06 +00:00
Michael Zuckerman 755a13db3d [X86][Clang] Converting __mm{|256|512}_movm_epi{8|16|32|64} LLVMIR call into generic intrinsics.
This patch is a part two of two reviews, one for the clang and the other for LLVM. 
In this patch, I covered the clang side, by introducing the intrinsic to the front end. 
This is done by creating a generic replacement.

Differential Revision: https://reviews.llvm.org/D31394a

llvm-svn: 299431
2017-04-04 13:29:53 +00:00
Sanjay Patel e795daa55e [x86] these aren't the undefs you're looking for (PR32176)
x86 has undef SSE/AVX intrinsics that should represent a bogus register operand. 
This is not the same as LLVM's undef value which can take on multiple bit patterns.

There are better solutions / follow-ups to this discussed here:
https://bugs.llvm.org/show_bug.cgi?id=32176
...but this should prevent miscompiles with a one-line code change.

Differential Revision: https://reviews.llvm.org/D30834

llvm-svn: 297588
2017-03-12 19:15:10 +00:00
Craig Topper 367c86ddbe [AVX-512] Replace subvector broadcast builtins with shufflevectors and selects.
Verified that the backend codegens this equally well.

llvm-svn: 292329
2017-01-18 02:17:10 +00:00
Craig Topper 08bf53ffda [AVX-512] Remove masked vector insert builtins and replace with native shufflevectors and selects.
Unfortunately, the backend currently doesn't fold masks into the instructions correctly when they come from these shufflevectors. I'll work on that in a future commit.

llvm-svn: 285667
2016-11-01 05:47:56 +00:00
Craig Topper cc012b3a37 [AVX-512] Add a regular expression to a test that was missed in r285540.
llvm-svn: 285547
2016-10-31 06:24:00 +00:00
Craig Topper 93ffabd28d [AVX-512] Remove masked vector extract builtins and replace with native shufflevectors and selects.
Unfortunately, the backend currently doesn't fold masks into the instructions correctly when they come from these shufflevectors. I'll work on that in a future commit.

llvm-svn: 285540
2016-10-31 04:30:56 +00:00
Craig Topper 4910755107 [AVX-512] Add _MM_FROUND_NO_EXC to test cases that pass a rounding mode intrinsics. This is preparation for a follow up commit that will check validity of rounding mode argument.
llvm-svn: 283053
2016-10-01 21:03:46 +00:00
Elad Cohen b107a22afb [X86] Remove the mm_malloc.h include guard hack from the X86 builtins tests
The X86 clang/test/CodeGen/*builtins.c tests define the mm_malloc.h include
guard as a hack for avoiding its inclusion (mm_malloc.h requires a hosted
environment since it expects stdlib.h to be available - which is not the case
in these internal clang codegen tests).
This patch removes this hack and instead passes -ffreestanding to clang cc1.

Differential Revision: https://reviews.llvm.org/D24825

llvm-svn: 282581
2016-09-28 11:59:09 +00:00
Craig Topper 5fbabd77c7 [X86] Fix some illegal rounding modes in some builtin test cases to ones that would properly compile to valid assembly.
llvm-svn: 282137
2016-09-22 06:13:33 +00:00
Craig Topper f43e4a1728 [AVX-512] Remove masked integer mullo builtins and replace with native IR.
llvm-svn: 280597
2016-09-03 19:19:49 +00:00
Craig Topper a815f488d5 [AVX-512] Implement masked floating point logical operations with native IR and remove the builtins.
llvm-svn: 280197
2016-08-31 05:38:58 +00:00
Lama Saba 5d01f224cf [X86][AVX512] lower __mm512_andnot_ps/__mm512_andnot_pd to IR
Differential revision: https://reviews.llvm.org/D23262
 

llvm-svn: 278209
2016-08-10 10:34:45 +00:00
Eric Christopher abb2b54ad3 After PR28761 use -Wall with -Werror in builtins tests to identify
possible problems in headers.

llvm-svn: 277696
2016-08-04 06:02:50 +00:00
Michael Zuckerman 3f316abdce [Clang][Intrinsics][AVX512][BuiltIn] adding intrinsics for vrangesd instruction set
Differential Revision: http://reviews.llvm.org/D21734

llvm-svn: 274218
2016-06-30 08:05:46 +00:00
Craig Topper 08181f795f [AVX512] Fix _mm_setzero_di to not require avx512vl since its used by the avx512dqintrin.h. Also update the avx512dq test to not enable avx512vl feature so we can ensure correct dependencies.
llvm-svn: 273388
2016-06-22 06:36:21 +00:00
Michael Zuckerman c4ae8537cf [Clang][AVX512][BUILTIN]Adding intrinsics for range_round_{sd|ss}
Differential Revision: http://reviews.llvm.org/D21002

llvm-svn: 272123
2016-06-08 08:19:27 +00:00
Michael Zuckerman 96d0399658 [clang][AVX512][Intrinsics] Adding intrinsics reduce_[round]_{ss|sd} to clang
Differential Revision: http://reviews.llvm.org/D21014

llvm-svn: 272012
2016-06-07 14:00:20 +00:00
Craig Topper dca1f230ae [AVX512] Add intrinsics for 512-bit insertf32x8/insertf32x4/inserti32x4.
llvm-svn: 269617
2016-05-15 21:26:20 +00:00
Michael Zuckerman edc82fe3ef [Clang][Builtin][AVX512]Adding intrinsics for vfpclass{sd|ss} vfpclass{pd|ps} instruction set
Differential Revision: http://reviews.llvm.org/D19476

llvm-svn: 267414
2016-04-25 14:48:23 +00:00
Michael Zuckerman ef2979af50 [Clang][AVX512][BUILTIN] Adding intrinsics support to VEXTRACT{I|F} and VINSERT{I|F} instruction set
Differential Revision: http://reviews.llvm.org/D19097

llvm-svn: 266745
2016-04-19 15:18:23 +00:00
Michael Zuckerman c2b6128a8f [Clang][AVX512][Builtin] Adding support for VBROADCAST and VPBROADCASTB/W/D/Q instruction set
Differential Revision: http://reviews.llvm.org/D19012

llvm-svn: 266195
2016-04-13 12:58:01 +00:00
Michael Zuckerman 074edd7c1e [Clang][AVX512][Builtin] Adding supporting to intrinsics of cvt{b|d|q}2mask{128|256|512} and cvtmask2{b|d|q}{128|256|512} instruction set.
Differential Revision: http://reviews.llvm.org/D19009

llvm-svn: 266188
2016-04-13 10:49:37 +00:00
Eric Christopher cd875efa78 Canonicalize some of the x86 builtin tests and either remove or comment
about optimization options.

llvm-svn: 250271
2015-10-14 05:40:21 +00:00
Asaf Badouh 2718051dd7 re-apply r.247881
fixed the tests.

llvm-svn: 247892
2015-09-17 14:53:37 +00:00
Asaf Badouh 8a61250709 revert r.247881 due to tests failures
llvm-svn: 247883
2015-09-17 13:09:33 +00:00
NAKAMURA Takumi 007f75dab7 Appease clang/test/CodeGen/avx512dq-builtins.c for -Asserts, for now.
llvm-svn: 247882
2015-09-17 12:33:34 +00:00
Asaf Badouh a0e5e71ef1 [X86][AVX512DQ] add new intrinsics
convert i64 to FP and vice versa
reduceps & reducepd
rangeps & rangepd
all in their 512bit versions


Differential Revision: http://reviews.llvm.org/D11716

llvm-svn: 247881
2015-09-17 11:56:04 +00:00
Elena Demikhovsky e7d4c2e229 AVX-512: Added AVX-512 intrinsics and tests
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236218
2015-04-30 09:24:29 +00:00