I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl feature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl.
By using special builtins we can emit the select without using the 128/256-bit select builtins.
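For context, a minimal sketch of the usual masking pattern in the headers (macro name is hypothetical, assuming <immintrin.h>): the full result is computed and then blended with the passthru through a select builtin, which is what ties masking to the avx512f/avx512bw/avx512vl select builtins.

#include <immintrin.h>

/* Hypothetical macro: blends a computed 128-bit result with the passthru
   under the mask using the avx512vl-gated select builtin. */
#define SKETCH_MASK_BLEND_EPI32_128(U, RES, W)                          \
  ((__m128i)__builtin_ia32_selectd_128((__mmask8)(U),                   \
                                       (__v4si)(__m128i)(RES),          \
                                       (__v4si)(__m128i)(W)))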
llvm-svn: 334331
I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features.
The extract builtins are funny because the 512-bit versions return a 128- or 256-bit vector with masking even when avx512vl is not supported.
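A hedged sketch of the shape involved (the exact builtin signature may differ; assuming <immintrin.h>): the 512-bit extract builtin carries the mask and passthru itself, so the 128-bit result can be masked without any 128-bit select builtin.

#include <immintrin.h>

/* Sketch only: the 512-bit masked extract produces a masked __m128i
   directly, without going through a 128-bit select builtin. */
#define SKETCH_MASK_EXTRACTI32X4(W, U, A, I)                             \
  ((__m128i)__builtin_ia32_extracti32x4_mask((__v16si)(__m512i)(A),      \
                                             (int)(I),                   \
                                             (__v4si)(__m128i)(W),       \
                                             (__mmask8)(U)))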
llvm-svn: 334330
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the previously existing builtin, which this patch resurrects. This also matches gcc.
llvm-svn: 334261
We still emit shufflevector instructions; we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature.
I also added range checking to the immediate, but only checked that it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that.
llvm-svn: 334237
Summary:
We recently switched to using selects in the intrinsic header files for FMA instructions. But the 512-bit versions support flavors with a rounding mode, which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now, the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over-expanding macros. Or if it's something we could CSE later, it would show up multiple times when it shouldn't.
I tried adding __extension__ around the macro and making it a statement expression that declares a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names, because those names are reserved and user code shouldn't use them, but I wasn't sure I wanted to make that claim.
The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.
I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.
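A simplified sketch of the double-expansion problem described above (macro name is hypothetical, assuming <immintrin.h>): the select-wrapped rounding-mode macro has to mention its (A) argument twice, once as an FMA operand and once as the blend passthru, so (A) gets macro-expanded twice.

#include <immintrin.h>

#define SKETCH_MASK_FMADD_ROUND_PD(A, U, B, C, R)                        \
  ((__m512d)__builtin_ia32_selectpd_512(                                 \
      (__mmask8)(U),                                                     \
      (__v8df)_mm512_fmadd_round_pd((A), (B), (C), (R)),                 \
      (__v8df)(__m512d)(A)))          /* (A) is expanded a second time */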
Reviewers: tkrupa, RKSimon, spatel, GBuella
Reviewed By: tkrupa
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D47724
llvm-svn: 334159
Adding __attribute__((aligned(32))) to __m256 breaks the implementation
of _mm256_loadu_ps on Windows. On Windows, alignment attributes have
higher precedence than packing attributes.
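A hedged sketch of why that matters (close to, but not necessarily identical to, the header; helper name is illustrative): _mm256_loadu_ps goes through a packed struct so the load is emitted with alignment 1, and an aligned(32) attribute on __m256 would win over the packing on Windows and re-align the member.

#include <immintrin.h>

static __inline__ __m256 __attribute__((__always_inline__, __target__("avx")))
sketch_loadu_ps(float const *__p)
{
  /* The packed struct is what makes this an unaligned (align 1) load. */
  struct __loadu_ps {
    __m256 __v;
  } __attribute__((__packed__, __may_alias__));
  return ((const struct __loadu_ps *)__p)->__v;
}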
We also might want to carefully consider the consequences of changing
our vector typedefs, since many users copy them and invent their own
new, non-Intel specific vector type names.
llvm-svn: 333958
This is more consistent with other usages of __builtin_shufflevector. Later optimization passes or codegen will detect the duplicate vector and replace it with undef. Using _mm_undefined just puts a zeroinitializer that still needs to be optimized out later.
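An illustrative helper (name and shuffle are only an example, assuming <immintrin.h>) showing the pattern: pass the same source vector for both shufflevector operands instead of an _mm_undefined_*() value.

#include <immintrin.h>

static __inline__ __m128 __attribute__((__always_inline__, __target__("avx")))
sketch_lower_half_ps(__m256 __a)
{
  /* Both operands are __a; the unreferenced upper half becomes undef
     during later optimization instead of a leftover zeroinitializer. */
  return (__m128)__builtin_shufflevector((__v8sf)__a, (__v8sf)__a, 0, 1, 2, 3);
}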
llvm-svn: 333944
One of the arguments was being expanded even though the passthru argument is unused due to the mask being all 1s. In that case the actual value doesn't matter, so we should use undef instead to avoid expanding the macro argument unnecessarily.
llvm-svn: 333865
This fixes two major problems:
- We were not capping vector alignment as desired on 32-bit ARM.
- We were using different alignments based on the AVX settings on
Intel, so we did not have a consistent ABI.
This is an ABI break, but we think we can get away with it because
vectors tend to be used mostly in inline code (which is why not having
a consistent ABI has not proven disastrous on Intel).
Intel's AVX types are specified as having 32-byte / 64-byte alignment,
so align them explicitly instead of relying on the base ABI rule.
Note that this sort of attribute is stripped from template arguments
in template substitution, so there's a possibility that code templated
over vectors will produce inadequately-aligned objects. The right
long-term solution for this is for alignment attributes to be
interpreted as true qualifiers and thus preserved in the canonical type.
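A sketch of the explicit alignment being described (typedef names are illustrative, not the in-tree definitions):

/* Spell the alignment out explicitly rather than relying on the base ABI
   rule for vector types. */
typedef float  sketch_m256  __attribute__((__vector_size__(32), __aligned__(32)));
typedef double sketch_m512d __attribute__((__vector_size__(64), __aligned__(64)));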
llvm-svn: 333791
The majority of the cases were correct. This fixes the few that weren't.
I also removed some superfluous parentheses in non-macros that confused my attempts at grepping for missing casts.
llvm-svn: 333615
I think this is a holdover from when we used to declare variables inside the macros, and it has been copied and pasted forward for years every time a new macro intrinsic gets added.
Interestingly, this caused some IRGen tests to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load.
It also removed a bogus error message on another test.
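A hedged before/after sketch of the macro shape this describes (names are hypothetical, assuming <immintrin.h>): a statement expression with a local variable forces a store and reload of the zero vector, while a plain expression returns the zeroinitializer directly.

#include <immintrin.h>

/* Old style: local variable inside a statement expression. */
#define SKETCH_OLD_MASKZ_MOV(U, A) __extension__ ({                          \
  __m512i __W = _mm512_setzero_si512();                                      \
  (__m512i)__builtin_ia32_selectd_512((__mmask16)(U), (__v16si)(__m512i)(A), \
                                      (__v16si)__W); })

/* New style: a single expression, no local variable. */
#define SKETCH_NEW_MASKZ_MOV(U, A)                                           \
  ((__m512i)__builtin_ia32_selectd_512((__mmask16)(U), (__v16si)(__m512i)(A), \
                                       (__v16si)_mm512_setzero_si512()))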
llvm-svn: 333613
Most of the original comments used C style /* */ comments, but some C++ // comments had snuck in over time.
Still need to convert all the doxygen comments, which is much harder to do.
llvm-svn: 333603
We had quite a few for different element sizes of integers sometimes with strange target features attached to them.
We only need a single version for each of __m128i, __m256i, and __m512i, with the target feature that first introduced those types.
llvm-svn: 333568
This patch replaces all packed (and scalar without rounding
mode) fused intrinsics with fmadd/fmaddsub variations.
Then fmadd/fmaddsub are lowered to native IR.
Patch by tkrupa
Reviewers: craig.topper, sroland, spatel, RKSimon
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47444
llvm-svn: 333555
Summary:
We only need to use 512-bit vectors all the way through for the v8i64 reductions, since those max instructions are new to avx512f and are only available in 512 bits until SKX.
For v16i32 and floating point we have legacy 128/256-bit instructions we can use.
I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization.
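A hedged sketch of the float max case (helper name is illustrative, not the header code, assuming <immintrin.h>): split with fully specified __builtin_shufflevector indices and finish with the legacy 256/128-bit max instructions rather than staying in 512 bits.

#include <immintrin.h>

static __inline__ float __attribute__((__always_inline__, __target__("avx512f")))
sketch_reduce_max_ps(__m512 __v)
{
  __m256 __lo = (__m256)__builtin_shufflevector((__v16sf)__v, (__v16sf)__v,
                                                0, 1, 2, 3, 4, 5, 6, 7);
  __m256 __hi = (__m256)__builtin_shufflevector((__v16sf)__v, (__v16sf)__v,
                                                8, 9, 10, 11, 12, 13, 14, 15);
  __m256 __t8 = _mm256_max_ps(__lo, __hi);                    /* 16 -> 8 lanes */
  __m128 __t4 = _mm_max_ps(_mm256_castps256_ps128(__t8),
                           _mm256_extractf128_ps(__t8, 1));   /* 8 -> 4 lanes */
  __m128 __t2 = _mm_max_ps(__t4, (__m128)__builtin_shufflevector(__t4, __t4, 2, 3, 0, 1));
  __m128 __t1 = _mm_max_ps(__t2, (__m128)__builtin_shufflevector(__t2, __t2, 1, 0, 3, 2));
  return __t1[0];
}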
Reviewers: RKSimon, spatel, GBuella
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D47401
llvm-svn: 333347
Previously we negated the whole vector after splatting infinity. But it's better to negate the infinity before splatting. This generates IR with the negate already folded into the infinity constant.
llvm-svn: 333062
I believe this is safe assuming the default FP environment. The conversion might be inexact, but it can never overflow the FP type, so this shouldn't be undefined behavior for the uitofp/sitofp instructions.
We already do something similar for scalar conversions.
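A minimal sketch of the same idea at the vector level (typedef and function names are illustrative): __builtin_convertvector from 8 x i32 to 8 x double emits a plain sitofp on the whole vector.

typedef int    sketch_v8si __attribute__((__vector_size__(32)));
typedef double sketch_v8df __attribute__((__vector_size__(64)));

/* Becomes a single vector sitofp instruction in IR. */
static __inline__ sketch_v8df sketch_cvtepi32_pd(sketch_v8si __a)
{
  return __builtin_convertvector(__a, sketch_v8df);
}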
Differential Revision: https://reviews.llvm.org/D46863
llvm-svn: 332882
As long as the destination type is a 256- or 128-bit vector with the same number of elements, we can use __builtin_convertvector to directly generate a trunc IR instruction, which will be handled natively by the backend.
Differential Revision: https://reviews.llvm.org/D46742
llvm-svn: 332266
If we're using the default rounding mode we can let __builtin_convertvector generate an fpext. This matches the 128-bit and 256-bit versions.
If we're using the version that takes an explicit rounding mode argument we would need to look at the immediate to see if it's CUR_DIRECTION.
llvm-svn: 332210
We can use direct C code for these that will use uitofp and insertelement instructions.
For the versions that take an explicit rounding mode we can't do this.
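A hedged sketch of that "direct C" pattern (helper name is illustrative, assuming <immintrin.h>): writing the scalar into lane 0 turns into uitofp plus insertelement in IR.

#include <immintrin.h>

static __inline__ __m128 __attribute__((__always_inline__, __target__("avx512f")))
sketch_cvtu32_ss(__m128 __A, unsigned int __B)
{
  __A[0] = __B;   /* uitofp of __B, then insertelement into lane 0 */
  return __A;
}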
llvm-svn: 332203
This is unnecessary for AVX512VL supporting CPUs like SKX. We can just emit a 128-bit masked load/store here no matter what. The backend will widen it to 512-bits on KNL CPUs.
Fixes the frontend portion of PR37386. Need to fix the backend to optimize the new sequences well.
llvm-svn: 331958
This is similar to the LLVM change https://reviews.llvm.org/D46290.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46320
llvm-svn: 331834
On AVX512F targets we'll produce an emulated sequence using 3 pmuludqs with shifts and adds. On AVX512DQ targets we'll use vpmullq.
Fixes PR37140.
llvm-svn: 330923
The unmasked versions already didn't have this restriction. I don't think gcc or icc limit these to 64-bit mode, so we shouldn't either.
llvm-svn: 330681
Summary:
kunpck intrinsics were removed in favor of native IR a few months ago. The implementation lowered them by operating on the integer types passed to the intrinsic and then just shifting, masking, and oring them together. A special X86 DAG combine was added to recognize this pattern and turn it into a concat_vector operation.
I think it makes more sense to keep the IR implementation closer to vector operations on vXi1, given that we expect these builtins to be used around other builtins that operate on k-registers, which we try to represent in IR with vXi1. InstCombine should be able to get rid of the bitcasts between integers and vXi1, leaving only the vector operations.
Reviewers: RKSimon, spatel, zvi, jina.nahias
Reviewed By: RKSimon
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D42016
llvm-svn: 322461
This patch, together with a matching llvm patch (https://reviews.llvm.org/D39720), implements the lowering of X86 kunpack intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D39719
Change-Id: Id5d3cb394ad33b98be79a6783d1d15569e2b798d
llvm-svn: 319777
Change the intrinsic header files to lower the test and testn intrinsics to IR code.
Removed the test and testn builtins from clang.
Differential Revision: https://reviews.llvm.org/D38737
llvm-svn: 318035
This patch, together with a matching llvm patch (https://reviews.llvm.org/D38671), implements the lowering of X86 shuffle i/f intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D38672
Change-Id: I9b3c2f2b34323bd9ccb21d0c1832f848b88ec047
llvm-svn: 318025
The __builtin_ia32_pbroadcastq512_mem_mask we were previously trying to use in 32-bit mode is not implemented in the x86 backend and causes isel to fail in release builds. In debug builds it fails even earlier during legalization with an llvm_unreachable.
While there, add the missing test case for this intrinsic for 64-bit mode.
This fixes PR34631. D37668 should be able to recover this for 32-bit mode soon. But I wanted to fix the crash ahead of that.
llvm-svn: 313392
Based off the Intel Intrinsics guide, we should expect a void const* argument.
Prevents 'passing 'const void *' to parameter of type 'void *' discards qualifiers' warnings.
Differential Revision: https://reviews.llvm.org/D37449
llvm-svn: 312523
Clang specifies a max type alignment of 16 bytes on darwin targets (annoyingly in the driver not via cc1), meaning that the builtin nontemporal stores don't correctly align the loads/stores to 32 or 64 bytes when required, resulting in lowering to temporal unaligned loads/stores.
This patch casts the vectors to explicitly aligned types prior to the load/store to ensure that the required alignment is respected.
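A hedged sketch of the cast described (close to, but not necessarily identical to, the header change; helper name is illustrative, assuming <immintrin.h>): give the non-temporal store an explicitly aligned vector type so the 32-byte alignment survives the 16-byte max-type-alignment cap.

#include <immintrin.h>

static __inline__ void __attribute__((__always_inline__, __target__("avx")))
sketch_stream_ps(float *__p, __m256 __a)
{
  typedef __v8sf __v8sf_aligned __attribute__((__aligned__(32)));
  __builtin_nontemporal_store((__v8sf_aligned)__a, (__v8sf_aligned *)__p);
}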
Differential Revision: https://reviews.llvm.org/D35996
llvm-svn: 309488
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.
LLVM companion patch: D31767.
Differential Revision: https://reviews.llvm.org/D31766
llvm-svn: 300326
This will allow the backend to constant fold these to generic shuffle vectors, like the 128-bit and 256-bit versions, without having to worry about handling masking.
llvm-svn: 289351
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.
This patch removes the clang builtins and their use in the headers - a future patch will deal with removing the llvm intrinsics.
This is an extension patch to D20528 which dealt with the equivalent sse/avx cases.
Differential Revision: https://reviews.llvm.org/D26686
llvm-svn: 287088
This is part of a set of changes to allow InstCombine in the backend to optimize variable shifts without having to know about masking.
llvm-svn: 286757
Summary: Inverting the mask argument does not reflect the intended semantics of the intrinsic.
Reviewers: igorb, delena
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D26019
llvm-svn: 286733
Unfortunately, the backend currently doesn't fold masks into the instructions correctly when they come from these shufflevectors. I'll work on that in a future commit.
llvm-svn: 285667
Unfortunately, the backend currently doesn't fold masks into the instructions correctly when they come from these shufflevectors. I'll work on that in a future commit.
llvm-svn: 285540
After LGTM and Check-all
Vector-reduction arithmetic accepts vectors as inputs and produces
scalars as outputs. This class of vector operation forms the basis
of many scientific computations. In vector-reduction arithmetic,
the evaluation of f is independent of the order of the input elements of V.
Reviewers: craig.topper, igorb
Differential Revision: https://reviews.llvm.org/D25988
llvm-svn: 285493
Summary: The preserved input should be the first argument and the vector inputs should be in the same order as the intrinsics it is used to implement.
Reviewers: igorb, delena
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D25902
llvm-svn: 285175
Committed after LGTM and check-all
Vector-reduction arithmetic accepts vectors as inputs and produces scalars as outputs.
This class of vector operation forms the basis of many scientific computations.
In vector-reduction arithmetic, the evaluation of f is independent of the order of the input elements of V.
Used the bisection method. At each step, we partition the vector from the previous
step in half, and the operation is performed on its two halves.
This takes log2(n) steps where n is the number of elements in the vector.
Reviewers: igorb, craig.topper
Differential Revision: https://reviews.llvm.org/D25527
llvm-svn: 285054
Committed after LGTM and check-all
Vector-reduction arithmetic accepts vectors as inputs and produces scalars as outputs.
This class of vector operation forms the basis of many scientific computations.
In vector-reduction arithmetic, the evaluation of f is independent of the order of the input elements of V.
Used the bisection method. At each step, we partition the vector from the previous
step in half, and the operation is performed on its two halves.
This takes log2(n) steps where n is the number of elements in the vector.
Differential Revision: https://reviews.llvm.org/D25527
llvm-svn: 284963
Add abs intrinsics that use native LLVM IR.
Change _mm512_mask[z]_and_epi{32|64} to use the select intrinsic.
Differential Revision: http://reviews.llvm.org/D21973
llvm-svn: 274542