llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	36ab775cc1	[X86] Use masked the masked scalar fma builtins to implement the default rounding version of the fma intrinsics. The rounding mode is checked in CGBuiltin.cpp to generate the correct intrinsic call. Making this switch switchs the masking to use the i8 bitcast to <8 x i1> and extract i1 version of the IR for the mask. Previously we ended up with a scalar 'and' plus an icmp. llvm-svn: 336637	2018-07-10 04:38:29 +00:00
Craig Topper	638426fc36	[X86] Add __builtin_ia32_selectss_128 and __builtin_ia32_selectsd_128 that is suitable for use in scalar mask intrinsics. This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)" and ternary operator used in some of intrinsics. The old sequence was lowered to a scalar and and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics. This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling. I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle. llvm-svn: 336622	2018-07-10 00:37:25 +00:00
Craig Topper	74c10e3236	[Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter. Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible. To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future. To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin. To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute. To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins. There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch. Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue. Differential Revision: https://reviews.llvm.org/D48617 llvm-svn: 336583	2018-07-09 19:00:16 +00:00
Craig Topper	0a485d13a4	[X86] Remove some unnecessarily escaped new lines from avx512fintrin.h llvm-svn: 336499	2018-07-07 22:03:19 +00:00
Craig Topper	3e720a302c	[X86] Fix a few intrinsics that were ignoring their rounding mode argument and hardcoded _MM_FROUND_CUR_DIRECTION internally. I believe these have been broken since their introduction into clang. I've enhanced the tests for these intrinsics to using a real rounding mode and checking all the intrinsic arguments instead of just the name. llvm-svn: 336498	2018-07-07 22:03:16 +00:00
Craig Topper	218da62091	[X86] Change _mm512_shuffle_pd and _mm512_shuffle_ps to use target specific shuffle builtins instead of generic __builtin_shufflevector. I added the builtins for 128, 256, and 512 bits recently but looks like I failed to convert to using the 512 bit one. llvm-svn: 336488	2018-07-07 17:03:34 +00:00
Craig Topper	5cbeeedd27	[X86] Fix various type mismatches in intrinsic headers and intrinsic tests that cause extra bitcasts to be emitted in the IR. Found via imprecise grepping of the -O0 IR. There could still be more bugs out there. llvm-svn: 336487	2018-07-07 17:03:32 +00:00
Craig Topper	10f20fc42b	[X86] Add missing scalar fma intrinsics with rounding, but no mask. We had the mask versions of the rounding intrinsics, but not one without masking. Also change the rounding tests to not use the CUR_DIRECTION rounding mode. llvm-svn: 336470	2018-07-06 22:08:43 +00:00
Craig Topper	0029470dde	[X86] Correct the width of mask arguments in intrinsic headers and tests. All of these found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there. Some of these were real bugs where we lost bits from the user input: _mm512_mask_broadcast_f32x8 _mm512_maskz_broadcast_f32x8 _mm512_mask_broadcast_i32x8 _mm512_maskz_broadcast_i32x8 _mm256_mask_cvtusepi16_storeu_epi8 llvm-svn: 336042	2018-06-30 06:05:17 +00:00
Craig Topper	0e9de769a0	[X86] Remove masking from the avx512 rotate builtins. Use a select builtin instead. llvm-svn: 336036	2018-06-30 01:32:14 +00:00
Justin Lebar	2a192abaec	[CUDA] Make __host__/__device__ min/max overloads constexpr in C++14. Summary: Tests in a separate change to the test-suite. Reviewers: rsmith, tra Subscribers: lahwaacz, sanjoy, cfe-commits Differential Revision: https://reviews.llvm.org/D48151 llvm-svn: 336026	2018-06-29 22:28:09 +00:00
Justin Lebar	5cb41c2acf	[CUDA] Make min/max shims host+device. Summary: Fixes PR37753: min/max can't be called from __host__ __device__ functions in C++14 mode. Testcase in a separate test-suite commit. Reviewers: rsmith Subscribers: sanjoy, lahwaacz, cfe-commits Differential Revision: https://reviews.llvm.org/D48036 llvm-svn: 336025	2018-06-29 22:27:56 +00:00
Craig Topper	8bf793fb35	[X86] Remove masking from the avx512 packed sqrt builtins. Use select builtins instead. llvm-svn: 335945	2018-06-29 05:43:33 +00:00
Craig Topper	1763dbb278	[X86] Correct the inline assembly implementations of __movsb/w/d/q and __stosw/d/q to mark registers/memory as modified The inline assembly for these didn't mark that edi, esi, ecx are modified by movs/stos instruction. It also didn't mark that memory is modified. This issue was reported to llvm-dev last year http://lists.llvm.org/pipermail/cfe-dev/2017-November/055863.html but no bug was ever filed. Differential Revision: https://reviews.llvm.org/D48448 llvm-svn: 335270	2018-06-21 18:56:30 +00:00
Craig Topper	b2431c6c33	[Intrinsics] Add/move some builtin declarations in intrin.h to get ms-intrinsics.c to not issue warnings ud2 and int2c were missing declarations entirely. And the bitscans were only under x86_64, but they seem to be in BuiltinsARM.def as well and are tested by ms_intrinsics.c Differential Revision: https://reviews.llvm.org/D48187 llvm-svn: 335259	2018-06-21 17:07:04 +00:00
Craig Topper	ddfe69cc99	[X86] Rewrite the add/mul/or/and reduction intrinsics to make better use of other intrinsics and remove undef shuffle indices. Similar to what was done to max/min recently. These already reduced the vector width to 256 and 128 bit as we go unlike the original max/min code. Differential Revision: https://reviews.llvm.org/D48346 llvm-svn: 335253	2018-06-21 16:41:28 +00:00
Craig Topper	2da60bc231	[X86] Remove masking from the 512-bit floating point max/min builtins. Use select in IR instead. llvm-svn: 335200	2018-06-21 05:01:01 +00:00
Craig Topper	d334615c23	[X86] Undefine _mm512_mask_reduce_operator macro in avx512fintrin.h before redefining it. llvm-svn: 335086	2018-06-20 00:31:39 +00:00
Craig Topper	61495b3650	Recommit r335070 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible."" Test has been updated to reflect the IRGen. llvm-svn: 335075	2018-06-19 21:00:30 +00:00
Craig Topper	79b13a0348	Revert r335070 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible." The test changes are failing the buildbot and its going to take me some time to fix it. llvm-svn: 335072	2018-06-19 19:37:07 +00:00
Craig Topper	873afd0262	[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible. We only need to use 512 bit vectors all the way through v8i64 reductions since those max instructions are new to avx512f and only available in 512 bits until SKX. For v16i32 and floating point we have legacy 128/256 bit instructions we can use. I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization. Differential Revision: https://reviews.llvm.org/D47401 llvm-svn: 335070	2018-06-19 19:13:54 +00:00
Craig Topper	31730ae761	[X86] Rename __builtin_ia32_pslldqi128 to __builtin_ia32_pslldqi128_byteshift and similar for other sizes. Remove the multiply by 8 from the header files. The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This creates a misleading error message when we check the range of the immediate to the builtin since the allowed range also got multiplied by 8. This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic. Fixes the remaining issue from PR37795. llvm-svn: 334773	2018-06-14 22:02:35 +00:00
Craig Topper	b521dc3acf	[X86] Add inline assembly versions of _InterlockedExchange_HLEAcquire/Release and _InterlockedCompareExchange_HLEAcquire/Release for MSVC compatibility. Clang/LLVM doesn't have a way to pass an HLE hint through to the X86 backend to emit HLE prefixed instructions. So this is a good short term fix. Differential Revision: https://reviews.llvm.org/D47672 llvm-svn: 334751	2018-06-14 18:43:52 +00:00
Tomasz Krupa	82aa42af49	[X86] Lowering Mask Scalar intrinsics to native IR (Clang part) Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls to native IR. Reviewers: craig.topper, RKSimon, spatel, sroland Reviewed By: craig.topper Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47979 llvm-svn: 334741	2018-06-14 17:36:23 +00:00
Craig Topper	2527c378c6	[X86] Remove masking from avx512vbmi2 concat and shift by immediate builtins. Use select builtins instead. llvm-svn: 334577	2018-06-13 07:19:28 +00:00
Craig Topper	91bbe98757	[X86] Remove masking from dbpsadbw builtins, use select builtin instead. llvm-svn: 334385	2018-06-11 06:18:29 +00:00
Craig Topper	3614b41a8e	[X86] Remove masking from the 512-bit packed floating point add/sub/mul/div builtins. Use select in IR instead. llvm-svn: 334359	2018-06-10 06:01:42 +00:00
Craig Topper	88097d9355	[X86] Add back some masked vector truncate builtins. Custom IRgen a a few others. I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl fature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl. By using special buitlins we can emit a select without using the 128/256-bit select builtins. llvm-svn: 334331	2018-06-08 21:50:08 +00:00
Craig Topper	5f50f33806	[X86] Fold masking into subvector extract builtins. I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features. The extract builtins are funny because the 512-bit versions return a 128 or 256 bit vector with masking even when avx512vl is not supported. llvm-svn: 334330	2018-06-08 21:50:07 +00:00
Craig Topper	03f4f04b91	[X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking. llvm-svn: 334311	2018-06-08 18:00:25 +00:00
Craig Topper	9d3962f4f1	[X86] Change immediate type for some builtins from char to int. These builtins are all handled by CGBuiltin.cpp so it doesn't much matter what the immediate type is, but int matches the intrinsic spec. llvm-svn: 334310	2018-06-08 18:00:22 +00:00
Craig Topper	422a1bbb84	[X86] Add builtins for shufps and shufpd to enable target feature and immediate range checking. llvm-svn: 334266	2018-06-08 07:18:33 +00:00
Craig Topper	03de166ccd	[X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking. llvm-svn: 334265	2018-06-08 06:13:16 +00:00
Craig Topper	573dab1553	[X86] Fix some typecasts in intrinsic headers that I messed up in r334261. This was caught by the Header tests, but not the CodeGen tests. llvm-svn: 334264	2018-06-08 04:09:14 +00:00
Craig Topper	3428beeb2f	[X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking. Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc. llvm-svn: 334261	2018-06-08 03:24:47 +00:00
Craig Topper	acf5601961	[X86] Add builtins for vpermilps/pd instructions to enable target feature checking. llvm-svn: 334256	2018-06-08 00:59:27 +00:00
Craig Topper	7d17d7278b	[X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range. llvm-svn: 334249	2018-06-08 00:00:21 +00:00
Craig Topper	9392136414	[X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shuff64x2 to enable target feature checking and immediate range checking. llvm-svn: 334244	2018-06-07 23:03:08 +00:00
Reid Kleckner	aa46ed9278	[MS] Re-add support for the ARM interlocked bittest intrinscs Adds support for these intrinsics, which are ARM and ARM64 only: _interlockedbittestandreset_acq _interlockedbittestandreset_rel _interlockedbittestandreset_nf _interlockedbittestandset_acq _interlockedbittestandset_rel _interlockedbittestandset_nf Refactor the bittest intrinsic handling to decompose each intrinsic into its action, its width, and its atomicity. llvm-svn: 334239	2018-06-07 21:39:04 +00:00
Craig Topper	e56819eb69	[X86] Add builtins for VALIGNQ/VALIGND to enable proper target feature checking. We still emit shufflevector instructions we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature. I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that. llvm-svn: 334237	2018-06-07 21:27:41 +00:00
Craig Topper	d3623155a2	[X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics. We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits. This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously. llvm-svn: 334208	2018-06-07 17:28:03 +00:00
Craig Topper	b92c77d176	[X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins. Summary: We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't. I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim. The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform. I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it. Reviewers: tkrupa, RKSimon, spatel, GBuella Reviewed By: tkrupa Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47724 llvm-svn: 334159	2018-06-07 02:46:02 +00:00
Artem Belevich	a6cd1a994c	[CUDA] Replace 'nv_weak' attributes in CUDA headers with 'weak'. Differential Revision: https://reviews.llvm.org/D47804 llvm-svn: 334108	2018-06-06 17:52:55 +00:00
Craig Topper	f3914b74c1	[X86] Add builtins for vector element insert and extract for different 128 and 256 bit vector types. Use them to implement the extract and insert intrinsics. Previously we were just using extended vector operations in the header file. This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load. By adding the builtins we can check for the index to be a constant and ensure its in range of the vector element count. User code still has the option to use extended vector operations themselves if they need non-constant indexing. llvm-svn: 334057	2018-06-06 00:24:55 +00:00
Craig Topper	9b0c61e9de	[X86] Mark all the builtins and intrinsics that require MMX and an SSE feature as requiring both mmx and the sse feature. Previously we only checked the sse feature, but this means that if you passed -mno-mmx, the builtins/intrinsics wouldn't be disabled in the frontend and would instead fail backend isel. llvm-svn: 333980	2018-06-05 03:12:14 +00:00
Reid Kleckner	1d9c249db5	Reimplement the bittest intrinsic family as builtins with inline asm We need to implement _interlockedbittestandset as a builtin for windows.h, so we might as well do the whole family. It reduces code duplication anyway. Fixes PR33188, a long standing bug in our bittest implementation encountered by Chakra. llvm-svn: 333978	2018-06-05 01:33:40 +00:00
Reid Kleckner	89fbd55145	Revert r333791 "Cap "voluntary" vector alignment at 16 for all Darwin platforms." Adding __attribute__((aligned(32))) to __m256 breaks the implementation of _mm256_loadu_ps on Windows. On Windows, alignment attributes have higher precedence than packing attributes. We also might want to carefully consider the consequences of changing our vector typedefs, since many users copy them and invent their own new, non-Intel specific vector type names. llvm-svn: 333958	2018-06-04 21:39:20 +00:00
Craig Topper	95ed88a1a9	[X86] Avoid passing _mm_undefined* to builtin_shufflevector if we are able to pass the first input a second time. This is more consistent with other usages of builtin_shufflevector. Later optimization passes or codegen will detect the duplicate vector and replace it with undef. Using _mm_undefined just puts a zeroinitializer that still needs to be optimized out later. llvm-svn: 333944	2018-06-04 19:28:09 +00:00
Craig Topper	ae5f0a8a78	[X86] Fix a couple places that were using macro arguments twice when of the usages could just be undefined. One of the arguments was being used when the passthru argument is unused due to the mask being all 1s. But in that case the actual value doesn't matter so we should use undef instead to avoid expanding the macro argument unnecessarily. llvm-svn: 333865	2018-06-04 02:56:18 +00:00
Craig Topper	1359a46c48	[X86] Remove superfluous escaped new lines from intrinsic files. llvm-svn: 333858	2018-06-03 23:31:01 +00:00
Craig Topper	b41c6b854b	[X86] Explicitly make the arguments to __slwpcb intrinsic 'void'. This is the correct way to say it takes no arguments in C. llvm-svn: 333855	2018-06-03 22:05:19 +00:00
Craig Topper	6fb26f93ef	[X86] Replace __builtin_ia32_vbroadcastf128_pd256 and __builtin_ia32_vbroadcastf128_ps256 with an unaligned load intrinsics and a __builtin_shufflevector call. llvm-svn: 333853	2018-06-03 19:42:59 +00:00
John McCall	280c656031	Cap "voluntary" vector alignment at 16 for all Darwin platforms. This fixes two major problems: - We were not capping vector alignment as desired on 32-bit ARM. - We were using different alignments based on the AVX settings on Intel, so we did not have a consistent ABI. This is an ABI break, but we think we can get away with it because vectors tend to be used mostly in inline code (which is why not having a consistent ABI has not proven disastrous on Intel). Intel's AVX types are specified as having 32-byte / 64-byte alignment, so align them explicitly instead of relying on the base ABI rule. Note that this sort of attribute is stripped from template arguments in template substitution, so there's a possibility that code templated over vectors will produce inadequately-aligned objects. The right long-term solution for this is for alignment attributes to be interpreted as true qualifiers and thus preserved in the canonical type. llvm-svn: 333791	2018-06-01 21:34:26 +00:00
Craig Topper	d521d16ba4	[X86] Rewrite avx512vbmi unmasked and maskz macro intrinsics to be wrappers around their __builtin function with appropriate arguments rather than just passing arguments to the masked intrinsic. This is more consistent with all of our other avx512 macro intrinsics. It also fixes a bad cast where an argument was casted to mmask8 when it should have been a mmask16. llvm-svn: 333778	2018-06-01 18:26:35 +00:00
Martin Storsjo	cad7a5f1aa	[X86] Remove leftover semicolons at end of macros This was missed in a few places in SVN r333613, causing compilation errors if these macros are used e.g. as parameter to a function. llvm-svn: 333734	2018-06-01 09:40:50 +00:00
Craig Topper	a6dd2faaea	[X86] Make 512-bit unmasked load/store builtins more like their 128/256-bit equivalents. Previously we were just passing -1 mask to the masked builtin. This changes it to the more generic way that the 128/256 bit use. llvm-svn: 333626	2018-05-31 05:02:08 +00:00
Tim Shen	f811de484c	[X86] Fix wrong intrinsic semantic. llvm-svn: 333617	2018-05-31 01:51:07 +00:00
Craig Topper	cbf3929bc9	[X86] Fix some places where macro arguments to intrinsics weren't cast to _m512(i\|d)/_m256(i\|d/_m128(i\|d) first. The majority of the cases were correct. This fixes the few that weren't. I also removed some superfluous parentheses in non-macros that confused by attempts at grepping for missing casts. llvm-svn: 333615	2018-05-31 01:24:40 +00:00
Craig Topper	c633867944	[X86] Remove __extension__ from macro intrinsics when its not needed. I think this is a holdover from when we used to declare variables inside the macros. And then its been copy and pasted forward for years every time a new macro intrinsic gets added. Interestingly this caused some tests for IRGen to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load. It also removed a bogus error message on another test. llvm-svn: 333613	2018-05-31 00:51:20 +00:00
Craig Topper	73d1d403e2	[X86] Use C style comments in intrinsic headers for overall consistency. Most of the origial comments used C style /* */ comments, but some C++ // comments had snuck in over time. Still need to convert all the doxygen comments. Which is much harder to do. llvm-svn: 333603	2018-05-30 22:33:21 +00:00
Craig Topper	63ec0ea7bc	[X86] Add __extension__ to a bunch of places in our intrinsic headers that fail if you run it through -pedantic -ansi. All of these are lines that create a 'compound literal' to concatenate elements together. llvm-svn: 333593	2018-05-30 21:08:27 +00:00
Craig Topper	c5ec55e921	[X86] Simplify the implementation of _mm_sqrt_ss, _mm_rcp_ss, and _mm_rsqrt_ss. We don't need the insertion back into the original vector at the end. The builtin already understands that. This is different than _mm_sqrt_sd which takes two arguments and we do need to insert. llvm-svn: 333572	2018-05-30 18:27:07 +00:00
Craig Topper	dff5b311af	[X86] Reduce the number of setzero intrinsics to just the set defined by the Intel Intrinsics Guide. We had quite a few for different element sizes of integers sometimes with strange target features attached to them. We only need a single version for each of _m128i, _m256i, and _m512i with the target feature that first introduced those types. llvm-svn: 333568	2018-05-30 18:02:11 +00:00
Craig Topper	819f2a20c3	[X86] Remove 'return' from a bunch of intrinsics that return void and use a builtin that returns void. Found by running the intrinsic headers through -pedantic -ansi. llvm-svn: 333563	2018-05-30 17:23:45 +00:00
Gabor Buella	70d8d51073	[X86] Lowering FMA intrinsics to native IR (Clang part) This patch replaces all packed (and scalar without rounding mode) fused intrinsics with fmadd/fmaddsub variations. Then fmadd/fmaddsub are lowered to native IR. Patch by tkrupa Reviewers: craig.topper, sroland, spatel, RKSimon Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D47444 llvm-svn: 333555	2018-05-30 15:27:49 +00:00
Hans Wennborg	64fcb04950	Add missing curly from r333509 llvm-svn: 333515	2018-05-30 08:05:24 +00:00
Craig Topper	f6e79c6d3f	[X86] Remove masking from the AVX512VNNI builtins. Use a select in IR instead. llvm-svn: 333509	2018-05-30 05:26:04 +00:00
Craig Topper	3d9305f28b	[X86] Fix the names of a bunch of icelake intrinsics. Mostly this fixes the names of all the 128-bit intrinsics to start with _mm_ instead of _mm128_ as is the convention and what the Intel docs say. This also fixes the name of the bitshuffle intrinsics to say epi64 for 128 and 256 bit versions. llvm-svn: 333497	2018-05-30 03:38:15 +00:00
Craig Topper	68a272d501	[X86] Merge the 3 different flavors of masked vpermi2var/vpermt2var builtins to a single version without masking. Use select builtins with appropriate operand instead. llvm-svn: 333387	2018-05-29 03:26:38 +00:00
Craig Topper	f99532faee	Revert r333347 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible." This wasn't supposed to be commited yet. llvm-svn: 333349	2018-05-26 18:57:41 +00:00
Craig Topper	387b1423db	[X86] Remove mask from avx512ifma builtins. Use a select instruction instead. This reduces from 12 builtins to 6 since we no longer need a mask and maskz version. llvm-svn: 333348	2018-05-26 18:55:26 +00:00
Craig Topper	e091523c82	[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible. Summary: We only need to use 512 bit vectors all the way through v8i64 reductions since those max instructions are new to avx512f and only available in 512 bits until SKX. For v16i32 and floating point we have legacy 128/256 bit instructions we can use. I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization. Reviewers: RKSimon, spatel, GBuella Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47401 llvm-svn: 333347	2018-05-26 18:55:24 +00:00
Gabor Buella	078bb99a90	[x86] invpcid intrinsic An intrinsic for an old instruction, as described in the Intel SDM. Reviewers: craig.topper, rnk Reviewed By: craig.topper, rnk Differential Revision: https://reviews.llvm.org/D47142 llvm-svn: 333256	2018-05-25 06:34:42 +00:00
Craig Topper	26df8c48da	[X86] Fix a bad cast in _mm512_mask_abs_epi32 and _mm512_maskz_abs_epi32. llvm-svn: 333211	2018-05-24 17:32:49 +00:00
Craig Topper	3e7d8dfae3	[X86] Move the include of clzerointrin.h from immintrin.h back to x86intrin.h. This is an AMD intrinsic not an Intel intrinsic so it shouldn't be in immintrin.h llvm-svn: 333124	2018-05-23 21:04:26 +00:00
Raphael Isemann	83bdfe5e51	[modules] Mark __wmmintrin_pclmul.h/__wmmintrin_aes.h as textual Summary: Since clang r332929 these two headers throw errors when included from somewhere else than their wrapper header. It seems marking them as textual is the best way to fix the builds. Fixes this new module build error: While building module '_Builtin_intrinsics' imported from ...: In file included from <module-includes>:2: In file included from lib/clang/7.0.0/include/immintrin.h:54: In file included from lib/clang/7.0.0/include/wmmintrin.h:29: lib/clang/7.0.0/include/__wmmintrin_aes.h:25:2: error: "Never use <__wmmintrin_aes.h> directly; include <wmmintrin.h> instead." #error "Never use <__wmmintrin_aes.h> directly; include <wmmintrin.h> instead." Reviewers: rsmith, v.g.vassilev, craig.topper Reviewed By: craig.topper Subscribers: craig.topper, cfe-commits Differential Revision: https://reviews.llvm.org/D47277 llvm-svn: 333123	2018-05-23 20:59:46 +00:00
Craig Topper	664af9bc34	[X86] Move all Intel defined intrinsic includes into immintrin.h This matches the Intel documentation which shows them available by importing immintrin.h. x86intrin.h also includes immintrin.h so anyone including x86intrin.h will still get them. This is different than gcc, but I don't think we were a perfect match there already. I'm unclear what gcc's policy is about how they choose which to add things to. Differential Revision: https://reviews.llvm.org/D47182 llvm-svn: 333110	2018-05-23 18:32:58 +00:00
Ekaterina Romanova	9b412153bf	[DOXYGEN] Formatting changes for better intrinsics documentation rendering (1) I added some \see cross-references to a few select intrinsics that are related (and have the same or similar semantics). (2) pmmintrin.h, smmintrin.h, xmmintrin.h have very few minor formatting changes. They make rendering of our intrinsics documentation better. llvm-svn: 333065	2018-05-23 06:33:22 +00:00
Craig Topper	9bed2e6953	[X86] Undef the vector reduction helper macros when we're done with them. These are implementation helper macros we shouldn't expose them to user code if we don't need to. llvm-svn: 333064	2018-05-23 06:31:36 +00:00
Craig Topper	39e0347e6a	[X86] In the floating point max reduction intrinsics, negate infinity before feeding it to set1. Previously we negated the whole vector after splatting infinity. But its better to negate the infinity before splatting. This generates IR with the negate already folded with the infinity constant. llvm-svn: 333062	2018-05-23 05:51:52 +00:00
Craig Topper	f2043b08b4	[X86] Remove mask argument from more builtins that are handled completely in CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead. llvm-svn: 333061	2018-05-23 04:51:54 +00:00
Craig Topper	25caca72f4	[X86] As mentioned in post-commit feedback in D47174, move the 128 bit f16c intrinsics into f16cintrin.h and remove __emmintrin_f16c.h These were included in emmintrin.h to match Intel Intrinsics Guide documentation. But this is because icc is capable of emulating them on targets that don't support F16C using library calls. Clang/LLVM doesn't have this emulation support. So it makes more sense to include them in immintrin.h instead. I've left a comment behind to hopefully deter someone from trying to move them again in the future. llvm-svn: 333033	2018-05-22 22:19:19 +00:00
Craig Topper	8e3689c066	[X86] Remove mask argument from some builtins that are handled completely in CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead. llvm-svn: 333027	2018-05-22 20:48:24 +00:00
Craig Topper	99be40c363	[X86] Another attempt at fixing the intrinsic module map for rr333014. llvm-svn: 333026	2018-05-22 20:48:20 +00:00
Craig Topper	1fceff99d8	[X86] Add two missing #endif directives to immintrin.h that should have been in r333014. llvm-svn: 333023	2018-05-22 20:33:04 +00:00
Craig Topper	a82ee182d4	[X86] Add __emmintrin_f16c.h to module map and CMakeLists. I missed this in r333014 llvm-svn: 333020	2018-05-22 20:19:05 +00:00
Craig Topper	34c8c0d858	[X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h Intel documents the 128-bit versions as being in emmintrin.h and the 256-bit version as being in immintrin.h. This patch makes a new __emmtrin_f16c.h to hold the 128-bit versions to be included from emmintrin.h. And makes the existing f16cintrin.h contain the 256-bit versions and include it from immintrin.h with an error if its included directly. Differential Revision: https://reviews.llvm.org/D47174 llvm-svn: 333014	2018-05-22 18:54:19 +00:00
Craig Topper	d97a95ae2c	[X86] Prevent inclusion of __wmmintrin_aes.h and __wmmintrin_pclmul.h without including wmmintrin.h llvm-svn: 332929	2018-05-22 02:02:13 +00:00
Craig Topper	842171de36	[X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics. I believe this is safe assuming default default FP environment. The conversion might be inexact, but it can never overflow the FP type so this shouldn't be undefined behavior for the uitofp/sitofp instructions. We already do something similar for scalar conversions. Differential Revision: https://reviews.llvm.org/D46863 llvm-svn: 332882	2018-05-21 20:19:17 +00:00
Craig Topper	092d42557b	[X86] Remove some preprocessor feature checks from intrinsic headers Summary: These look to be a couple things that weren't removed when we switched to target attribute. The popcnt makes including just smmintrin.h also include popcntintrin.h. The popcnt file itself already contains target attrributes. The prefetch ones are just wrappers around __builtin_prefetch which we have graceful fallbacks for in the backend if the exact instruction isn't available. So there's no reason to hide them. And it makes them available in functions that have the write target attribute but not a -march command line flag. Reviewers: echristo, RKSimon, spatel, DavidKreitzer Reviewed By: echristo Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47029 llvm-svn: 332830	2018-05-21 06:07:49 +00:00
Craig Topper	55b4067350	[X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead. Someday maybe we'll use selects for all the builtins. llvm-svn: 332825	2018-05-20 23:34:10 +00:00
Craig Topper	b809fc3d63	[X86] Fix a bad cast from mask16 to mask8 in _mm256_mask_cvtepi16_epi8 introduced in r332266. llvm-svn: 332738	2018-05-18 17:18:46 +00:00
Justin Lebar	c70121b99b	[CUDA] Make std::min/max work when compiling in C++14 mode with a C++11 stdlib. Reviewers: rsmith Subscribers: sanjoy, cfe-commits, tra Differential Revision: https://reviews.llvm.org/D46993 llvm-svn: 332619	2018-05-17 16:12:42 +00:00
Craig Topper	9d146bbaf7	[X86] Revert part of r332266: Use __builtin_convertvector to replace some of the avx512 truncate builtins. The masking doesn't work right in the backend for the ones that produce byte or word elements without avx512bw. llvm-svn: 332322	2018-05-15 03:17:52 +00:00
Craig Topper	25de41cfbc	[X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins. As long as the destination type is a 256 or 128 bit vector with the same number of elements we can use __builtin_convertvector to directly generate trunc IR instruction which will be handled natively by the backend. Differential Revision: https://reviews.llvm.org/D46742 llvm-svn: 332266	2018-05-14 17:50:40 +00:00
Craig Topper	8cb261e353	[X86] Use select instrution and fpextend in the implementation of _mm512_mask_cvtps_pd and _mm512_maskz_cvtps_pd. llvm-svn: 332213	2018-05-14 04:57:46 +00:00
Craig Topper	daaf105f86	[X86] Use __builtin_convertvector to implement _mm512_cvtps_pd. If we're using default rounding mode we can let __builtin_convertvector to generate an fpextend. This matches 128 and 256 bit. If we're using the version that takes an explicit rounding mode argument we would need to look at the immediate to see if its CUR_DIRECTION. llvm-svn: 332210	2018-05-14 04:05:06 +00:00
Craig Topper	6fa91254e4	[X86] Emit better code for _mm_cvtu32_sd, _mm_cvtu64_sd, _mm_cvtu32_ss, and _mm_cvtu64_ss. We can use direct C code for these that will use uitofp and insertelement instructions. For the versions that take an explicit rounding mode we can't do this. llvm-svn: 332203	2018-05-13 23:03:30 +00:00
Craig Topper	65ef3280b8	[X86] Fix the file header name on fmaintrin.h llvm-svn: 332108	2018-05-11 17:37:40 +00:00
Gabor Buella	9cd4f16601	[X86] Assume alignment of movdir64b dst argument Reviewers: craig.topper Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D46683 llvm-svn: 332091	2018-05-11 14:22:04 +00:00
Gabor Buella	3a7571259e	[X86] ptwrite intrinsic Reviewers: craig.topper, RKSimon Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D46540 llvm-svn: 331962	2018-05-10 07:28:54 +00:00
Craig Topper	74ac0eda68	[X86] Change the implementation of scalar masked load/store intrinsics to not use a 512-bit intermediate vector. This is unnecessary for AVX512VL supporting CPUs like SKX. We can just emit a 128-bit masked load/store here no matter what. The backend will widen it to 512-bits on KNL CPUs. Fixes the frontend portion of PR37386. Need to fix the backend to optimize the new sequences well. llvm-svn: 331958	2018-05-10 05:43:43 +00:00
Adrian Prantl	9fc8faf9e6	Remove \brief commands from doxygen comments. This is similar to the LLVM change https://reviews.llvm.org/D46290. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46320 llvm-svn: 331834	2018-05-09 01:00:01 +00:00
Gabor Buella	5e52fa9035	[x86] Introduce the encl[u\|s\|v] intrinsics Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D46435 llvm-svn: 331743	2018-05-08 07:12:34 +00:00
Gabor Buella	b0f310d51d	[x86] Introduce the pconfig intrinsic Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D46431 llvm-svn: 331740	2018-05-08 06:49:41 +00:00
Craig Topper	66ef4185bc	[X86] Make _mm256_gf2p8mul_epi8 require avx features since its 256 bits. Without this we throw an error on the header file instead of the user code when the right features aren't enabled in clang. Rename the other DEFAULT_FN_ATTRS defines to _Z for 512-bit since I used _Y for this case. llvm-svn: 331682	2018-05-07 21:47:11 +00:00
Craig Topper	934f86a848	[X86] Fix some inconsistent formatting in the first line of our intrinsics headers. Some were too long and some were too short. llvm-svn: 331559	2018-05-04 21:45:25 +00:00
Volodymyr Sapsai	2d77119f72	Revert "Emit an error when mixing <stdatomic.h> and <atomic>" It reverts r331378 as it caused test failures ThreadSanitizer-x86_64 :: Darwin/gcd-groups-destructor.mm ThreadSanitizer-x86_64 :: Darwin/libcxx-shared-ptr-stress.mm ThreadSanitizer-x86_64 :: Darwin/xpc-race.mm Only clang part of the change is reverted, libc++ part remains as is because it emits error less aggressively. llvm-svn: 331392	2018-05-02 19:52:07 +00:00
Volodymyr Sapsai	c0a278aada	Emit an error when mixing <stdatomic.h> and <atomic> Atomics in C and C++ are incompatible at the moment and mixing the headers can result in confusing error messages. Emit an error explicitly telling about the incompatibility. Introduce the macro `__ALLOW_STDC_ATOMICS_IN_CXX__` that allows to choose in C++ between C atomics and C++ atomics. rdar://problem/27435938 Reviewers: rsmith, EricWF, mclow.lists Reviewed By: mclow.lists Subscribers: jkorous-apple, christof, bumblebritches57, JonChesterfield, smeenai, cfe-commits Differential Revision: https://reviews.llvm.org/D45470 llvm-svn: 331378	2018-05-02 17:50:43 +00:00
Gabor Buella	a51e0c2243	[X86] directstore and movdir64b intrinsics Reviewers: spatel, craig.topper, RKSimon Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45984 llvm-svn: 331249	2018-05-01 10:05:42 +00:00
Craig Topper	e95bde33df	[X86] Add support for _mm512_mullox_epi64 and _mm512_mask_mullox_epi64 intrinsics to match icc. On AVX512F targets we'll produce an emulated sequence using 3 pmuludqs with shifts and adds. On AVX512DQ we'll use vpmulld. Fixes PR37140. llvm-svn: 330923	2018-04-26 05:38:39 +00:00
Artem Belevich	3cce307799	[CUDA] Enable CUDA compilation with CUDA-9.2 Differential Revision: https://reviews.llvm.org/D45827 llvm-svn: 330753	2018-04-24 18:23:19 +00:00
Craig Topper	5f1d10e26e	[X86] Add recently added intrinsic headers to the module map. llvm-svn: 330744	2018-04-24 17:40:49 +00:00
Craig Topper	bd16b11255	[X86] Consistently use double underscore at the beginning of the include guards in our intrinsic headers. Most files used double underscore, but a few used single. This converges them all to double. llvm-svn: 330743	2018-04-24 17:40:47 +00:00
Craig Topper	ce281a41b5	[X86] Remove '#ifdef __x86_64__' around mask_set1_epi64 intrinsics. The unmasked versions already didn't have this restrction. I don't think gcc or icc limit these to 64-bit mode so we shouldn't either. llvm-svn: 330681	2018-04-24 03:36:08 +00:00
Gabor Buella	eba6c42e66	[X86] WaitPKG intrinsics Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45254 llvm-svn: 330463	2018-04-20 18:44:33 +00:00
Artem Belevich	5832eb4cfd	[CUDA] added missing __ldg(const signed char *) Differential Revision: https://reviews.llvm.org/D45780 llvm-svn: 330280	2018-04-18 18:33:43 +00:00
Gabor Buella	b220dd2b6c	[X86] Introduce cldemote intrinsic Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45257 llvm-svn: 329993	2018-04-13 07:37:24 +00:00
Gabor Buella	e708a09e21	[X86] Introduce wbinvd intrinsic A previously missing intrinsic for an old instruction. Reviewers: craig.topper, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45311 llvm-svn: 329937	2018-04-12 18:42:02 +00:00
Gabor Buella	a052016ef2	[x86] wbnoinvd intrinsic The WBNOINVD instruction writes back all modified cache lines in the processor’s internal cache to main memory but does not invalidate (flush) the internal caches. Reviewers: craig.topper, zvi, ashlykov Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D43817 llvm-svn: 329848	2018-04-11 20:09:09 +00:00
Craig Topper	dcdac965f1	[X86] Fix typo in intrinsic header file __mask16->__mmask16 from r329775. llvm-svn: 329777	2018-04-11 05:17:14 +00:00
Craig Topper	2575454fe9	[X86] Replace 512-bit masked pmaddubsw and pmaddwd intrinsic with unmasked intrinsic and a select. This makes it consistent with the 128/256-bit functions. Someday maybe we'll have all the masking moved to selects. llvm-svn: 329775	2018-04-11 04:55:10 +00:00
Alexander Kornienko	2a8c18d991	Fix typos in clang Found via codespell -q 3 -I ../clang-whitelist.txt Where whitelist consists of: archtype cas classs checkk compres definit frome iff inteval ith lod methode nd optin ot pres statics te thru Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few files that have dubious fixes reverted.) Differential revision: https://reviews.llvm.org/D44188 llvm-svn: 329399	2018-04-06 15:14:32 +00:00
Douglas Yung	17d2ef90e0	[DOXYGEN] Fix doxygen and content issues in mmintrin.h - Fix instruction mappings/listings for various intrinsics This patch was made by Craig Flores Differential Revision: https://reviews.llvm.org/D41517 llvm-svn: 327090	2018-03-09 00:38:51 +00:00
Craig Topper	260ed8647a	[X86] Fix typo in cpuid.h, bit_AVX51SER->bit_AVX512ER. llvm-svn: 326807	2018-03-06 16:06:44 +00:00
Alexander Ivchenko	9d3b45301f	[x86][CET] Introduce _get_ssp, _inc_ssp intrinsics Summary: The _get_ssp intrinsic can be used to retrieve the shadow stack pointer, independent of the current arch -- in contract with the rdsspd and the rdsspq intrinsics. Also, this intrinsic returns zero on CPUs which don't support CET. The rdssp[d\|q] instruction is decoded as nop, essentially just returning the input operand, which is zero. Example result of compilation: ``` xorl %eax, %eax movl %eax, %ecx rdsspq %rcx # NOP when CET is not supported movq %rcx, %rax # return zero ``` Reviewers: craig.topper Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D43814 llvm-svn: 326689	2018-03-05 11:30:28 +00:00
Craig Topper	21f66a3f6b	[X86] Remove some masked cvt builtins that can be replaced with legacy sse/avx buiiltins and a select. llvm-svn: 326039	2018-02-24 18:55:13 +00:00
Craig Topper	5dc6ca8e5b	[X86] Remove __builtin_ia32_permvarsf256_mask and __builtin_ia32_permvarsi256_mask and use the avx2 unmasked versions and a select instead. llvm-svn: 326022	2018-02-24 06:46:42 +00:00
Artem Belevich	df38f155ec	[CUDA] Added missing functions. Initial commit missed sincos(float), llabs() and few atomics that we used to pull in from device_functions.hpp, which we no longer include. Differential Revision: https://reviews.llvm.org/D43602 llvm-svn: 325814	2018-02-22 18:40:52 +00:00
Artem Belevich	4dbea99137	[CUDA] Added missing __threadfence_system() function for CUDA9. llvm-svn: 325626	2018-02-20 21:25:30 +00:00
Craig Topper	0a70c3c7af	[X86] Remove mask from 512 bit pmulhrsw/pmulhw/pmulhuw builtins. We now use a vselect node in IR around an unmasked builtin. This makes it consistent with the 128 and 256 bit versions. llvm-svn: 325560	2018-02-20 07:28:18 +00:00
Ekaterina Romanova	f28751849e	[DOXYGEN] There was a request in the review D41507 to change the notation for hex numbers in doxygen documentation from <...>h to 0x<...>. Both of these notations were used in x86 intrinsics documentation. I promised to change them to 0x<...> for consistency. Differential Revision: https://reviews.llvm.org/D41888 llvm-svn: 325312	2018-02-16 03:11:35 +00:00
Artem Belevich	fbc56a904f	[CUDA] Added partial support for CUDA-9.1 Clang can use CUDA-9.1 now, though new APIs (are not implemented yet. The major change is that headers in CUDA-9.1 went through substantial changes that started in CUDA-9.0 which required substantial changes in the cuda compatibility headers provided by clang. There are two major issues: * CUDA SDK no longer provides declarations for libdevice functions. * A lot of device-side functions have become nvcc's builtins and CUDA headers no longer contain their implementations. This patch changes the way CUDA headers are handled if we compile with CUDA 9.x. Both 9.0 and 9.1 are affected. * Clang provides its own declarations of libdevice functions. * For CUDA-9.x clang now provides implementation of device-side 'standard library' functions using libdevice. This patch should not affect compilation with CUDA-8. There may be some observable differences for CUDA-9.0, though they are not expected to affect functionality. Tested: CUDA test-suite tests for all supported combinations of: CUDA: 7.0,7.5,8.0,9.0,9.1 GPU: sm_20, sm_35, sm_60, sm_70 Differential Revision: https://reviews.llvm.org/D42513 llvm-svn: 323713	2018-01-30 00:00:12 +00:00
Hiroshi Inoue	1019f8a98e	[NFC] fix trivial typos in comments "to to" -> "to" llvm-svn: 323627	2018-01-29 05:15:18 +00:00
Craig Topper	8cdb94901d	[X86] Add rdpid command line option and intrinsics. Summary: This patch adds -mrdpid/-mno-rdpid and the rdpid intrinsic. The corresponding LLVM commit has already been made. Reviewers: RKSimon, spatel, zvi, AndreiGrischenko Reviewed By: RKSimon Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D42272 llvm-svn: 323047	2018-01-20 18:36:52 +00:00
Abderrazek Zaafrani	ce8746d178	[AArch64] Add ARMv8.2-A FP16 scalar intrinsics https://reviews.llvm.org/D41792 llvm-svn: 323006	2018-01-19 23:11:18 +00:00
Douglas Yung	46474dae4d	[DOXYGEN] Fix doxygen and content issues in xmmintrin.h - Fix inaccurate instruction listings. - Fix small issues in _mm_getcsr and _mm_setcsr. - Fix description of NaN handling in comparison intrinsics. - Fix inaccurate description of _mm_movemask_pi8. - Fix inaccurate instruction mappings. - Fix typos. - Clarify wording on some descriptions. - Fix bit ranges in return value. - Fix typo in _mm_move_ms intrinsic instruction since it operates on singe-precision values, not double. - This patch was made by Craig Flores Differential Revision: https://reviews.llvm.org/D41523 llvm-svn: 322778	2018-01-17 22:53:15 +00:00
Craig Topper	f517f1a516	[X86] Implement old kunpck intrinsics using vector ops on vXi1 instead of integer shift/and/or Summary: kunpck intrinsics were removed in favor of native IR a few months ago. The implementation lowers them as by operation on the integer types passed to the intrinsic and then just shifting, masking, and oring them together. A special X86 DAG combine was added to recognize this patter and turn it into a concat_vector operation. I think it makes more sense to keep the IR implementation closer to vector operations on vXi1. Given that we expect these builtins to be used around other builtins that operate on k-registers which we try to represent in IR with vXi1. InstCombine should be able to get rid of the bitcasts between integers and vXi1 leaving only the vector operations. Reviewers: RKSimon, spatel, zvi, jina.nahias Reviewed By: RKSimon Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D42016 llvm-svn: 322461	2018-01-14 19:23:50 +00:00
Sven van Haastregt	774355e321	[OpenCL] Reorder the CLK_sRGBx/sRGBA defines, NFC Swap them so that all channel order defines are ordered according to their values. llvm-svn: 322278	2018-01-11 14:05:38 +00:00
Douglas Yung	7ff91421b4	[DOXYGEN] Fix doxygen and content issues in avxintrin.h - Fix incorrect wording in various intrinsic descriptions. Previously the descriptions used "low-order" and "high-order" when the intended meaning was "even-indexed" and "odd-indexed". - Fix a few typos and errors found during review. - Restore new line endings. This patch was made by Craig Flores llvm-svn: 322027	2018-01-08 21:21:17 +00:00
Douglas Yung	4c549c31bb	[DOXYGEN] Fix doxygen and content issues in smmintrin.h - Fix formatting issue due to hyphenated terms at line breaks. - Fix typo This patch was made by Craig Flores Differential Revision: https://reviews.llvm.org/D41520 llvm-svn: 321671	2018-01-02 20:45:29 +00:00
Douglas Yung	df1e9ef156	[DOXYGEN] Fix doxygen and content issues in pmmintrin.h - Fix incorrect wording in various intrinsic descriptions. Previously the descriptions used "low-order" and "high-order" when the intended meaning was "even-indexed" and "odd-indexed". This patch was made by Craig Flores Differential Revision: https://reviews.llvm.org/D41518 llvm-svn: 321670	2018-01-02 20:42:53 +00:00
Douglas Yung	0686df106c	[DOXYGEN] Fix doxygen and content issues in emmintrin.h - Fixed innaccurate instruction mappings for various intrinsics. - Fixed description of NaN handling in comparison intrinsics. - Unify description of _mm_store_pd1 to match _mm_store1_pd. - Fix incorrect wording in various intrinsic descriptions. Previously the descriptions used "low-order" and "high-order" when the intended meaning was "even-indexed" and "odd-indexed". - Fix typos. - Add missing italics command (\a) for params and fixed some parameter spellings. This patch was made by Craig Flores Differential Revision: https://reviews.llvm.org/D41516 llvm-svn: 321669	2018-01-02 20:39:29 +00:00
Coby Tayree	a09663a5c1	[x86][icelake][vbmi2] added vbmi2 feature recognition added intrinsics support for vbmi2 instructions _mm[128,256,512]_mask[z]_compress_epi[16,32] _mm[128,256,512]_mask_compressstoreu_epi[16,32] _mm[128,256,512]_mask[z]_expand_epi[16,32] _mm[128,256,512]_mask[z]_expandloadu_epi[16,32] _mm[128,256,512]_mask[z]_sh[l,r]di_epi[16,32,64] _mm[128,256,512]_mask_sh[l,r]dv_epi[16,32,64] matching a similar work on the backend (D40206) Differential Revision: https://reviews.llvm.org/D41557 llvm-svn: 321487	2017-12-27 11:25:07 +00:00
Coby Tayree	3d9c88cfec	[x86][icelake][vnni] added vnni feature recognition added intrinsics support for VNNI instructions _mm256_mask_dpbusd_epi32 _mm256_maskz_dpbusd_epi32 _mm256_dpbusd_epi32 _mm256_mask_dpbusds_epi32 _mm256_maskz_dpbusds_epi32 _mm256_dpbusds_epi32 _mm256_mask_dpwssd_epi32 _mm256_maskz_dpwssd_epi32 _mm256_dpwssd_epi32 _mm256_mask_dpwssds_epi32 _mm256_maskz_dpwssds_epi32 _mm256_dpwssds_epi32 _mm128_mask_dpbusd_epi32 _mm128_maskz_dpbusd_epi32 _mm128_dpbusd_epi32 _mm128_mask_dpbusds_epi32 _mm128_maskz_dpbusds_epi32 _mm128_dpbusds_epi32 _mm128_mask_dpwssd_epi32 _mm128_maskz_dpwssd_epi32 _mm128_dpwssd_epi32 _mm128_mask_dpwssds_epi32 _mm128_maskz_dpwssds_epi32 _mm128_dpwssds_epi32 _mm512_mask_dpbusd_epi32 _mm512_maskz_dpbusd_epi32 _mm512_dpbusd_epi32 _mm512_mask_dpbusds_epi32 _mm512_maskz_dpbusds_epi32 _mm512_dpbusds_epi32 _mm512_mask_dpwssd_epi32 _mm512_maskz_dpwssd_epi32 _mm512_dpwssd_epi32 _mm512_mask_dpwssds_epi32 _mm512_maskz_dpwssds_epi32 _mm512_dpwssds_epi32 matching a similar work on the backend (D40208) Differential Revision: https://reviews.llvm.org/D41558 llvm-svn: 321484	2017-12-27 10:37:51 +00:00
Coby Tayree	2268576fa0	[x86][icelake][bitalg] added bitalg feature recognition added intrinsics support for bitalg instructions _mm512_popcnt_epi16 _mm512_mask_popcnt_epi16 _mm512_maskz_popcnt_epi16 _mm512_popcnt_epi8 _mm512_mask_popcnt_epi8 _mm512_maskz_popcnt_epi8 _mm512_mask_bitshuffle_epi64_mask _mm512_bitshuffle_epi64_mask _mm256_popcnt_epi16 _mm256_mask_popcnt_epi16 _mm256_maskz_popcnt_epi16 _mm128_popcnt_epi16 _mm128_mask_popcnt_epi16 _mm128_maskz_popcnt_epi16 _mm256_popcnt_epi8 _mm256_mask_popcnt_epi8 _mm256_maskz_popcnt_epi8 _mm128_popcnt_epi8 _mm128_mask_popcnt_epi8 _mm128_maskz_popcnt_epi8 _mm256_mask_bitshuffle_epi32_mask _mm256_bitshuffle_epi32_mask _mm128_mask_bitshuffle_epi16_mask _mm128_bitshuffle_epi16_mask matching a similar work on the backend (D40222) Differential Revision: https://reviews.llvm.org/D41564 llvm-svn: 321483	2017-12-27 10:01:00 +00:00
Coby Tayree	cf96c876c6	[x86][icelake][vpclmulqdq] added vpclmulqdq feature recognition added intrinsics support for vpclmulqdq instructions _mm256_clmulepi64_epi128 _mm512_clmulepi64_epi128 matching a similar work on the backend (D40101) Differential Revision: https://reviews.llvm.org/D41573 llvm-svn: 321480	2017-12-27 09:00:31 +00:00
Coby Tayree	f4811ebc39	[x86][icelake][gfni] added gfni feature recognition added intrinsics support for gfni instructions _mm_gf2p8affineinv_epi64_epi8 _mm_mask_gf2p8affineinv_epi64_epi8 _mm_maskz_gf2p8affineinv_epi64_epi8 _mm256_gf2p8affineinv_epi64_epi8 _mm256_mask_gf2p8affineinv_epi64_epi8 _mm256_maskz_gf2p8affineinv_epi64_epi8 _mm512_gf2p8affineinv_epi64_epi8 _mm512_mask_gf2p8affineinv_epi64_epi8 _mm512_maskz_gf2p8affineinv_epi64_epi8 _mm_gf2p8affine_epi64_epi8 _mm_mask_gf2p8affine_epi64_epi8 _mm_maskz_gf2p8affine_epi64_epi8 _mm256_gf2p8affine_epi64_epi8 _mm256_mask_gf2p8affine_epi64_epi8 _mm256_maskz_gf2p8affine_epi64_epi8 _mm512_gf2p8affine_epi64_epi8 _mm512_mask_gf2p8affine_epi64_epi8 _mm512_maskz_gf2p8affine_epi64_epi8 _mm_gf2p8mul_epi8 _mm_mask_gf2p8mul_epi8 _mm_maskz_gf2p8mul_epi8 _mm256_gf2p8mul_epi8 _mm256_mask_gf2p8mul_epi8 _mm256_maskz_gf2p8mul_epi8 _mm512_gf2p8mul_epi8 _mm512_mask_gf2p8mul_epi8 _mm512_maskz_gf2p8mul_epi8 matching a similar work on the backend (D40373) Differential Revision: https://reviews.llvm.org/D41582 llvm-svn: 321477	2017-12-27 08:37:47 +00:00
Coby Tayree	a1e5f0c339	[x86][icelake][vaes] added vaes feature recognition added intrinsics support for vaes instructions, matching a similar work on the backend (D40078) _mm256_aesenc_epi128 _mm512_aesenc_epi128 _mm256_aesenclast_epi128 _mm512_aesenclast_epi128 _mm256_aesdec_epi128 _mm512_aesdec_epi128 _mm256_aesdeclast_epi128 _mm512_aesdeclast_epi128 llvm-svn: 321474	2017-12-27 08:16:54 +00:00
Artem Belevich	3cebc738b6	[CUDA] More fixes for __shfl_* intrinsics. * __shfl_{up,down}* uses unsigned int for the third parameter. * added [unsigned] long overloads for non-sync shuffles. Differential Revision: https://reviews.llvm.org/D41521 llvm-svn: 321326	2017-12-21 23:52:09 +00:00
Craig Topper	170de4b4ba	[X86] Allow _mm_prefetch (both the header implementation and the builtin) to accept bit 2 which is supposed to indicate the prefetched addresses will be written to Add the appropriate _MM_HINT_ET0/ET1 defines to match gcc. llvm-svn: 321325	2017-12-21 23:50:22 +00:00
Craig Topper	54b3f718e4	[X86] Add more CPUID bits to cpuid.h to match gcc and support icelake features. llvm-svn: 321129	2017-12-20 00:46:09 +00:00
Craig Topper	798f2c037c	[X86] Add the two files I forgot to commit in r320915. llvm-svn: 320916	2017-12-16 06:10:24 +00:00
Craig Topper	b846d1ff76	[X86] Add builtins and tests for 128 and 256 bit vpopcntdq. llvm-svn: 320915	2017-12-16 06:02:31 +00:00
Stephan Bergmann	feed26ff07	In stdbool.h, define bool, false, true only in gnu++98 GCC has meanwhile corrected that with the similar <https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=216679> "C++11 explicitly forbids macros for bool, true and false." Differential Revision: https://reviews.llvm.org/D40167 llvm-svn: 320135	2017-12-08 08:28:08 +00:00
Artem Belevich	a659d2590e	[NVPTX,CUDA] Added llvm.nvvm.fns intrinsic and matching __nvvm_fns builtin in clang. Differential Revision: https://reviews.llvm.org/D40872 llvm-svn: 319909	2017-12-06 17:50:05 +00:00
Artem Belevich	4631ef1e43	[CUDA] Added overloads for '[unsigned] long' variants of shfl builtins. Differential Revision: https://reviews.llvm.org/D40871 llvm-svn: 319908	2017-12-06 17:40:35 +00:00
Jina Nahias	eb0829155f	[x86][AVX512] Lowering kunpack intrinsics to LLVM IR This patch, together with a matching llvm patch (https://reviews.llvm.org/D39720), implements the lowering of X86 kunpack intrinsics to IR. Differential Revision: https://reviews.llvm.org/D39719 Change-Id: Id5d3cb394ad33b98be79a6783d1d15569e2b798d llvm-svn: 319777	2017-12-05 15:42:47 +00:00
Shoaib Meenai	669cae1f28	[clang] Use add_llvm_install_targets Use this function to create the install targets rather than doing so manually, which gains us the `-stripped` install targets to perform stripped installations. Differential Revision: https://reviews.llvm.org/D40675 llvm-svn: 319489	2017-11-30 22:35:02 +00:00
Artem Belevich	05914bf482	[CUDA] Tweak CUDA wrappers to make cuda-9 work with libc++ CUDA-9 headers check for specific libc++ version and ifdef out some of the definitions we need if LIBCPP_VERSION >= 3800. Differential Revision: https://reviews.llvm.org/D40198 llvm-svn: 319485	2017-11-30 22:22:21 +00:00
Alexey Sotkin	b833bf6ae1	[OpenCL] Add extensions cl_intel_subgroups and cl_intel_subgroups_short Reviewers: yaxunl, Anastasia, bader Reviewed By: Anastasia, bader Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D39936 llvm-svn: 319011	2017-11-27 09:14:17 +00:00
Oren Ben Simhon	fec21ec0c6	Control-Flow Enforcement Technology - Shadow Stack and Indirect Branch Tracking support (Clang side) Shadow stack solution introduces a new stack for return addresses only. The stack has a Shadow Stack Pointer (SSP) that points to the last address to which we expect to return. If we return to a different address an exception is triggered. This patch includes shadow stack intrinsics as well as the corresponding CET header. It includes CET clang flags for shadow stack and Indirect Branch Tracking. For more information, please see the following: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf Differential Revision: https://reviews.llvm.org/D40224 Change-Id: I79ad0925a028bbc94c8ecad75f6daa2f214171f1 llvm-svn: 318995	2017-11-26 12:34:54 +00:00
Craig Topper	9e032ed55a	[X86] Use separate builtins for fma4 scalar intrinsics. Use negations to remove some of the scalar fma3 builtins. fma4 instructions zero the upper bits of the xmm register. fma3 instructions leave the bits unmodified. This requires separate builtins for the different semantics. While we're cleaning up the scalar builtins this also removes the fma3 fmsub/fnmadd/fnmsub builtins by using negates in the header file. llvm-svn: 318985	2017-11-25 19:32:12 +00:00
Justin Lebar	370c766e40	[CUDA] Remove implementations of nexttoward. Summary: __builtin_nexttoward lowers to a libcall, e.g. nexttowardf(), that CUDA does not have. Rather than try to implement it, we simply remove these functions -- nvcc doesn't support them either, and nextafter, which does work, does essentially the same thing on GPUs, because GPUs don't have long double. Reviewers: tra Subscribers: cfe-commits, sanjoy Differential Revision: https://reviews.llvm.org/D40152 llvm-svn: 318494	2017-11-17 01:15:43 +00:00
Uriel Korach	5b2b71d909	[X86] test/testn intrinsics lowering to IR. clang side Change Header files of the intrinsics for lowering test and testn intrinsics to IR code. Removed test and testn builtins from clang Differential Revision: https://reviews.llvm.org/D38737 llvm-svn: 318035	2017-11-13 12:50:52 +00:00
Jina Nahias	dca979194d	[x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IR This patch, together with a matching llvm patch (https://reviews.llvm.org/D38671), implements the lowering of X86 shuffle i/f intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38672 Change-Id: I9b3c2f2b34323bd9ccb21d0c1832f848b88ec047 llvm-svn: 318025	2017-11-13 09:15:31 +00:00
Justin Lebar	7c56dfe441	[CUDA] Fix std::min on device side to return the min, not the max. Summary: How embarrassing. This is tested in the test-suite -- fix to come there in a separate patch. Reviewers: tra Subscribers: sanjoy, cfe-commits Differential Revision: https://reviews.llvm.org/D39817 llvm-svn: 317961	2017-11-11 01:25:44 +00:00
Craig Topper	b3d447356f	[X86] Reduce the number of FMA builtins needed by the frontend by adding negates to operands of the fmadd and fmaddsub builtins. The backend should be able to combine the negates to create fmsub, fnmadd, and fnmsub. faddsub converting to fsubadd still needs work I think, but should be very doable. This matches what we already do for the masked builtins. This only covers the packed builtins. Scalar builtins will be done after FMA4 is fixed. llvm-svn: 317873	2017-11-10 05:20:32 +00:00
Craig Topper	e5b84ec2a1	[X86] Rename the VEX scalar fma builtins to end with a '3' to match gcc I think we need to use different builtins for the FMA4 instructions since those instructions zero the upper bits and FMA3 instructions pass the bits through. So this moves the existing builtins to be the FMA3 versions. New versions will be added for FMA4. llvm-svn: 317766	2017-11-09 04:10:46 +00:00
Craig Topper	57f96ac6dc	[X86] Replace the mask cmpeq/cmple/cmplt/cmpgt/cmpge/cmpneq intrinsics with macros that just pass the right comparison predicate value to the regular cmp intrinsic. Remove mask cmpeq/cmpgt builtins that are now unused. This shortens the intrinsic headers a little and allows us to get rid of the cmpeq and cmpgt handling from CGBuiltin.cpp. llvm-svn: 317506	2017-11-06 21:00:49 +00:00
Jina Nahias	48e298b8c4	lowering broadcastm Change-Id: I0661abea3e3742860e0a03ff9e4fcdc367eff7db llvm-svn: 317456	2017-11-06 07:04:12 +00:00
Martin Storsjo	5aa613ed2f	[Headers] Fix typoed __ARM_DWARF_EH__ ifdefs These typos appeared in SVN r309226 and r309327. llvm-svn: 316149	2017-10-19 07:40:45 +00:00
Craig Topper	89cd7533f7	[X86] Add CLWB intrinsic. clang part Reviewers: RKSimon, zvi, igorb Reviewed By: RKSimon Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D38781 llvm-svn: 315607	2017-10-12 18:57:15 +00:00
Craig Topper	189576f80e	[X86] Correct type for argument to clflushopt intrinsic. Summary: According to Intel docs this should take void const . We had char. The lack of const is the main issue. Reviewers: RKSimon, zvi, igorb Reviewed By: igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38782 llvm-svn: 315470	2017-10-11 16:06:08 +00:00
Jonas Hahnfeld	f21a60233c	[CUDA] Fix name of __activemask() The name has two underscores in the official CUDA documentation: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-vote-functions Differential Revision: https://reviews.llvm.org/D38468 llvm-svn: 314691	2017-10-02 17:50:11 +00:00
Artem Belevich	93e33f8fb3	[CUDA] Work around conflicting function definitions in CUDA-9 headers. Differential Revision: https://reviews.llvm.org/D38326 llvm-svn: 314334	2017-09-27 19:07:15 +00:00
Artem Belevich	bab95c7087	[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins. Differential Revision: https://reviews.llvm.org/D38191 llvm-svn: 314223	2017-09-26 17:07:23 +00:00
Justin Lebar	d31d5e6aa2	Revert "[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.", rL314135. Causing assertion failures on macos: > Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"), > function getOperand, file > /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h, > line 835. http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/ llvm-svn: 314142	2017-09-25 19:41:56 +00:00
Artem Belevich	9941ee9529	[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins. Differential Revision: https://reviews.llvm.org/D38191 llvm-svn: 314135	2017-09-25 18:53:57 +00:00
Artem Belevich	4d80105792	[CUDA] Fix names of __nvvm_vote* intrinsics. Also fixed a syntax error in activemask(). Differential Revision: https://reviews.llvm.org/D38188 llvm-svn: 314129	2017-09-25 17:55:26 +00:00
Jina Nahias	123c599a0f	fixing a bug in mask[z]_set1 intrinsic Differential Revision: https://reviews.llvm.org/D38231 Change-Id: I80bbff9cbe93e4be54d8a761ef9723edf3f57c57 llvm-svn: 314102	2017-09-25 13:38:08 +00:00
Artem Belevich	b542f1f3df	[CUDA] Fixed order of words in the names of shfl builtins. Differential Revision: https://reviews.llvm.org/D38147 llvm-svn: 313899	2017-09-21 18:46:39 +00:00
Artem Belevich	42960b4188	[NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins. Differential Revision: https://reviews.llvm.org/D38148 llvm-svn: 313898	2017-09-21 18:44:49 +00:00
Artem Belevich	4654dc89be	[NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins. Differential Revision: https://reviews.llvm.org/D38090 llvm-svn: 313820	2017-09-20 21:23:07 +00:00
Jina Nahias	3ad702a1ed	Lowering Mask Set1 intrinsics to LLVM IR This patch, together with a matching llvm patch (https://reviews.llvm.org/D37669), implements the lowering of X86 mask set1 intrinsics to IR. Differential Revision: https://reviews.llvm.org/D37668 llvm-svn: 313624	2017-09-19 11:00:27 +00:00
Craig Topper	04370d3a82	[X86] Disable _mm512_maskz_set1_epi64 intrinsic on 32-bit targets to prevent a backend isel failure. The __builtin_ia32_pbroadcastq512_mem_mask we were previously trying to use in 32-bit mode is not implemented in the x86 backend and causes isel to fail in release builds. In debug builds it fails even earlier during legalization with an llvm_unreachable. While there add the missing test case for this intrinsic for this for 64-bit mode. This fixes PR34631. D37668 should be able to recover this for 32-bit mode soon. But I wanted to fix the crash ahead of that. llvm-svn: 313392	2017-09-15 20:27:59 +00:00
Artem Belevich	9d0052160f	[CUDA] Work around a new quirk in CUDA9 headers. In CUDA-9 some of device-side math functions that we need are conditionally defined within '#if _GLIBCXX_MATH_H'. We need to temporarily undo the guard around inclusion of math_functions.hpp. Differential Revision: https://reviews.llvm.org/D37906 llvm-svn: 313369	2017-09-15 17:30:53 +00:00
Martin Storsjo	0fd7c5ccd6	[Headers] Fix the return type of _InterlockedCompareExchange_rel This was a typo in SVN r282447, where it was added. llvm-svn: 313232	2017-09-14 07:04:59 +00:00
Sjoerd Meijer	c05609ca36	This adds the _Float16 preprocessor macro definitions. Differential Revision: https://reviews.llvm.org/D34695 llvm-svn: 313152	2017-09-13 15:23:19 +00:00
Yael Tsafrir	23e7733230	[X86] Lower _mm[256\|512]_[mask[z]]_avg_epu[8\|16] intrinsics to native llvm IR Differential Revision: https://reviews.llvm.org/D37562 llvm-svn: 313011	2017-09-12 07:46:32 +00:00
Artem Belevich	8af4e23d1e	[CUDA] Added rudimentary support for CUDA-9 and sm_70. For now CUDA-9 is not included in the list of CUDA versions clang searches for, so the path to CUDA-9 must be explicitly passed via --cuda-path=. On LLVM side NVPTX added sm_70 GPU type which bumps required PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment. Differential Revision: https://reviews.llvm.org/D37576 llvm-svn: 312734	2017-09-07 18:14:32 +00:00
Justin Lebar	3310888aec	[CUDA] Add device overloads for non-placement new/delete. Summary: Tests have to live in the test-suite, and so will come in a separate patch. Fixes PR34360. Reviewers: tra Subscribers: llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D37539 llvm-svn: 312681	2017-09-07 00:37:20 +00:00
Simon Pilgrim	1ba2bf2162	[X86][AVX512] _mm512_stream_load_si512 should take a void const* argument (PR33977) Based off the Intel Intrinsics guide, we should expect a void const* argument. Prevents 'passing 'const void ' to parameter of type 'void ' discards qualifiers' warnings. Differential Revision: https://reviews.llvm.org/D37449 llvm-svn: 312523	2017-09-05 10:06:41 +00:00
Craig Topper	5ece4cfe1e	[X86] Implement broadcastf32x2 and broadcasti32x2 intrinsics using __builtin_shufflevector instead builtins This patch implements the broadcastf32x2/broadcasti32x2 intrinsics using __builtin_shufflevector. Differential Revision: https://reviews.llvm.org/D37287 llvm-svn: 312135	2017-08-30 16:15:12 +00:00
Saleem Abdulrasool	65101adb16	Headers: explicitly specify double-word alignment GCC will interpret `__attribute__((__aligned__))` as 8-byte alignment on ARM, but clang will not. Explicitly specify the alignment. This mirrors the declaration in libunwind. llvm-svn: 311576	2017-08-23 16:57:55 +00:00
Saleem Abdulrasool	75cfabef35	Headers: give _Unwind_Control_Block double-word alignment The C++ ABI requires that the exception object (which under AEABI is the `_Unwind_Control_Block`) is double-word aligned. The attribute was applied to the `_Unwind_Exception` type, but not the `_Unwind_Control_Block`. This should fix the libunwind test for the alignment of the exception type. llvm-svn: 311563	2017-08-23 15:35:33 +00:00
Yaxun Liu	a3c3d7b442	[OpenCL] Remove extra select functions from opencl-c.h OpenCL spec v2.0 s6.13.6: gentype select (gentype a, gentype b, igentype c) gentype select (gentype a, gentype b, ugentype c) igentype and ugentype must have the same number of elements and bits as gentype. Differential Revision: https://reviews.llvm.org/D36259 llvm-svn: 310160	2017-08-05 02:23:47 +00:00
Yaxun Liu	39195062c2	Add OpenCL 2.0 atomic builtin functions as Clang builtin OpenCL 2.0 atomic builtin functions have a scope argument which is ideally represented as synchronization scope argument in LLVM atomic instructions. Clang supports translating Clang atomic builtin functions to LLVM atomic instructions. However it currently does not support synchronization scope of LLVM atomic instructions. Without this, users have to use LLVM assembly code to implement OpenCL atomic builtin functions. This patch adds OpenCL 2.0 atomic builtin functions as Clang builtin functions, which supports generating LLVM atomic instructions with synchronization scope operand. Currently only constant memory scope argument is supported. Support of non-constant memory scope argument will be added later. Differential Revision: https://reviews.llvm.org/D28691 llvm-svn: 310082	2017-08-04 18:16:31 +00:00
Bruno Cardoso Lopes	d89a1eb4fb	[Headers][Darwin] Allow #include_next<float.h> to work on Darwin prior to 10.7 This fixes PR31504 and it's a follow up from adding #include_next<float.h> for Darwin in r289018. rdar://problem/29856682 llvm-svn: 309752	2017-08-01 22:10:36 +00:00
Simon Pilgrim	c14865c0c5	[X86][AVX] Ensure vector non-temporal load/store intrinsics force pointer alignment (PR33830) Clang specifies a max type alignment of 16 bytes on darwin targets (annoyingly in the driver not via cc1), meaning that the builtin nontemporal stores don't correctly align the loads/stores to 32 or 64 bytes when required, resulting in lowering to temporal unaligned loads/stores. This patch casts the vectors to explicitly aligned types prior to the load/store to ensure that the require alignment is respected. Differential Revision: https://reviews.llvm.org/D35996 llvm-svn: 309488	2017-07-29 15:33:34 +00:00
Simon Pilgrim	0b37ffbbf9	Strip trailing whitespace. NFCI. llvm-svn: 309383	2017-07-28 14:01:51 +00:00
Saleem Abdulrasool	b5eca2f9a2	Headers: fix _Unwind_{G,S}etGR for non-EHABI targets The EHABI definition was being inlined into the users even when EHABI was not in use. Adjust the condition to ensure that the right version is defined. llvm-svn: 309327	2017-07-27 21:56:25 +00:00
Saleem Abdulrasool	9c13bbe953	Headers: improve ARM EHABI coverage of unwind.h Ensure that we define the `_Unwind_Control_Block` structure used on ARM EHABI targets. This is needed for building libc++abi with the unwind.h from the resource dir. A minor fallout of this is that we needed to create a typedef for _Unwind_Exception to work across ARM EHABI and non-EHABI targets. The structure definitions here are based originally on the documentation from ARM under the "Exception Handling ABI for the ARM® Architecture" Section 7.2. They are then adjusted to more closely reflect the definition in libunwind from LLVM. Those changes are compatible in layout but permit easier use in libc++abi and help maintain compatibility between libunwind and the compiler provided definition. llvm-svn: 309226	2017-07-26 22:55:23 +00:00
Mandeep Singh Grang	79249e1be7	[clang] Add ARM64 support to armintr.h for MSVC compatibility Summary: This fixes compiling with headers from the Windows SDK for ARM64. Reviewers: compnerd, ruiu, mstorsjo Reviewed By: compnerd, mstorsjo Subscribers: mgorny, aemerson, javed.absar, kristof.beyls, llvm-commits, cfe-commits Differential Revision: https://reviews.llvm.org/D35862 llvm-svn: 309081	2017-07-26 05:29:40 +00:00
Ulrich Weigand	6af2559562	[SystemZ] Add support for IBM z14 processor (3/3) This patch updates the vecintrin.h header file to provide the new set of high-level vector built-in functions. This matches the updated definition implemented by other compilers for the platform, indicated by the pre-defined macro __VEC__ == 10302. Note that some of the new functions (notably those involving the vector float data type) are only available with -march=z14 (indicated by __ARCH__ == 12). llvm-svn: 308199	2017-07-17 17:47:35 +00:00
Ekaterina Romanova	03ecd774ba	[DOXYGEN] Corrected typos and incorrect parameters description. Corrected several typos and incorrect parameters description that Sony 's techinical writer found during review. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 307838	2017-07-12 20:18:55 +00:00
Zvi Rackover	064f00061b	X86 Intrinsics: _bit_scan_forward should not be under #ifdef __RDRND__ Summary: The _bit_scan_forward and _bit_scan_reverse intrinsics were accidentally masked under the preprocessor checks that prune intrinsics definitions for the benefit of faster compile-time on Windows. This patch moves the definitons out of that region. Fixes pr33722 Reviewers: craig.topper, aaboud, thakis Reviewed By: craig.topper Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D35184 llvm-svn: 307524	2017-07-10 07:13:56 +00:00
Craig Topper	b2f8b311d1	[X86] Add more feature flag bit defines to cpuid.h for gcc compatibility. llvm-svn: 307507	2017-07-09 17:43:11 +00:00
Craig Topper	f6e8408a11	[X86] Add __get_cpuid_count to cpuid.h. Update __get_cpuid to check the maximum level support before accessing the leaf. Rename level to leaf everywhere. This matches gcc behavior. llvm-svn: 307506	2017-07-09 17:43:10 +00:00
Ekaterina Romanova	cb3603a4eb	[DOXYGEN] Corrected several typos and incorrect parameters description that Sony's techinical writer found during review. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 304840	2017-06-06 22:58:01 +00:00
Benjamin Kramer	c796245431	[PPC] Make altivec conversion function macros. The second argument must be a constant, otherwise instruction selection will fail. always_inline is not enough for isel to always fold everything away at -O0. Sadly the overloading turned this into a big macro mess. Fixes PR33212. llvm-svn: 304205	2017-05-30 11:37:29 +00:00
Oren Ben Simhon	140c1fb9ec	[X86] Adding avx512_vpopcntdq feature set and its intrinsics AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch represents the Clang side of the addition of six intrinsics for two new machine instructions (vpopcntd and vpopcntq). It also includes the addition of the new feature set. Differential Revision: https://reviews.llvm.org/D33170 llvm-svn: 303857	2017-05-25 13:44:11 +00:00
Tony Jiang	9aa2c0383d	[PowerPC] Implement vec_xxsldwi builtin. The vec_xxsldwi builtin is missing from altivec.h. This has been requested by developers working on libvpx for VP9 support for Google. The patch fixes PR: https://bugs.llvm.org/show_bug.cgi?id=32653 Differential Revision: https://reviews.llvm.org/D33236 llvm-svn: 303766	2017-05-24 15:54:13 +00:00
Tony Jiang	bbc48e9164	[PowerPC] Implement vec_xxpermdi builtin. The vec_xxpermdi builtin is missing from altivec.h. This has been requested by developers working on libvpx for VP9 support for Google. The patch fixes PR: https://bugs.llvm.org/show_bug.cgi?id=32653 Differential Revision: https://reviews.llvm.org/D33053 llvm-svn: 303760	2017-05-24 15:13:32 +00:00
Ekaterina Romanova	bfc1e3a84e	(1) Fixed mismatch in intrinsics names in declarations and in doxygen comments. (2) Removed uncessary anymore \c commands, since the same effect will be achived by <c> ... </c> sequence. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 303228	2017-05-17 01:46:11 +00:00
Ekaterina Romanova	1d4a0f270c	[DOXYGEN] Minor improvements in doxygen comments. Separated very long brief sections into two sections. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 303031	2017-05-15 03:25:04 +00:00
Egor Churaev	44800c5aba	[OpenCL] Added checking OpenCL version for cl_khr_mipmap_image built-ins Reviewers: Anastasia, cfe-commits Reviewed By: Anastasia Subscribers: bader, yaxunl Differential Revision: https://reviews.llvm.org/D32897 llvm-svn: 302630	2017-05-10 08:23:01 +00:00
Simon Pilgrim	073c4e66b0	[X86][LWP] Remove MSVC LWP intrinsics stubs. Now provided in lwpintrin.h llvm-svn: 302559	2017-05-09 17:50:16 +00:00
Simon Pilgrim	7855510ae3	[X86][LWP] Removing LWP todo comment. NFCI. LWP / lwpintrin.h is now supported llvm-svn: 302557	2017-05-09 17:43:16 +00:00
Simon Pilgrim	3511348dbb	[X86][LWP] Add clang support for LWP instructions. This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Differential Revision: https://reviews.llvm.org/D32770 llvm-svn: 302418	2017-05-08 12:09:45 +00:00
Sam Parker	b9ea36f9c1	[ARM] ACLE Chapter 9 intrinsics Implemented the remaining integer data processing intrinsics from the ARM ACLE v2.1 spec, such as parallel arithemtic and DSP style multiplications. Differential Revision: https://reviews.llvm.org/D32282 llvm-svn: 302131	2017-05-04 08:37:59 +00:00
Simon Pilgrim	96d02f5503	[X86][AVX] Added support for _mm256_zext* helper intrinsics (PR32839) llvm-svn: 301749	2017-04-29 17:17:06 +00:00
Ekaterina Romanova	ea8702d393	[DOXYGEN] Minor improvements in doxygen comments. - I removed doxygen comments for the intrinsics that "alias" the other existing documented intrinsics and that only sligtly differ in spelling (single underscores vs. double underscores). #define _tzcnt_u16(a) (__tzcnt_u16((a))) It will be very hard to keep the documentation for these "aliases" in sync with the documentation for the intrinsics they alias to. Out of sync documentation will be more confusing than no documentation. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 301652	2017-04-28 16:45:39 +00:00
Simon Pilgrim	99ed27053d	[X86][SSE] Add _mm_set_pd1 (PR32827) Matches _mm_set_ps1 implementation llvm-svn: 301637	2017-04-28 10:28:32 +00:00
Duncan P. N. Exon Smith	e77a3aff6f	Headers: Make the type of SIZE_MAX the same as size_t size_t is usually defined as unsigned long, but on 64-bit platforms, stdint.h currently defines SIZE_MAX using "ull" (unsigned long long). Although this is the same width, it doesn't necessarily have the same alignment or calling convention. It also triggers printf warnings when using the format flag "%zu" to print SIZE_MAX. This changes SIZE_MAX to reuse the compiler-provided __SIZE_MAX__, and provides similar fixes for the other integers: - INTPTR_MIN - INTPTR_MAX - UINTPTR_MAX - PTRDIFF_MIN - PTRDIFF_MAX - INTMAX_MIN - INTMAX_MAX - UINTMAX_MAX - INTMAX_C() - UINTMAX_C() ... and fixes the typedefs for intptr_t and uintptr_t to use __INTPTR_TYPE__ and __UINTPTR_TYPE__ instead of int32_t, effectively reverting r89224, r89226, and r89237 (r89221 already having been effectively reverted). We can probably also kill __INTPTR_WIDTH__, __INTMAX_WIDTH__, and __UINTMAX_WIDTH__ in a follow-up, but I was hesitant to delete all the per-target CHECK lines in this commit since those might serve their own purpose. rdar://problem/11811377 llvm-svn: 301593	2017-04-27 21:49:45 +00:00
Eric Fiselier	56be04284f	Use __CLANG_ATOMIC_TYPE_LOCK_FREE macros in `stdatomic.h` Summary: This patch makes the header `stdatomic.h` work when `-fms-compatibility` is specified. Reviewers: rsmith Reviewed By: rsmith Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D32322 llvm-svn: 300919	2017-04-20 23:07:38 +00:00
Ekaterina Romanova	0a40d67b20	[DOXYGEN] Minor improvements in doxygen comments. - To be consistent with the rest of the intrinsics headers, I removed the tags <i> .. </i> for marking instruction names in italics in in smmintrin.h. - Formatting changes to fit into 80 characters. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 300578	2017-04-18 19:44:07 +00:00
Simon Pilgrim	9f6e79c5e4	[X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (clang) MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics. LLVM companion patch: D31767. Differential Revision: https://reviews.llvm.org/D31766 llvm-svn: 300326	2017-04-14 15:05:57 +00:00
Sanjay Patel	bd0d0068ef	[x86] fix AVX FP cmp intrinsic documentation (PR28110) This copies the text used in the #define statements to the code comments. The conflicting text comes from AMD manuals, but those are wrong. Sadly, that FP cmp text has not been updated even after some docs were updated for Zen: http://support.amd.com/en-us/search/tech-docs ( AMD64 Architecture Programmer's Manual Volume 4 ) See PR28110 for more discussion: https://bugs.llvm.org/show_bug.cgi?id=28110 Differential Revision: https://reviews.llvm.org/D31428 llvm-svn: 300068	2017-04-12 15:19:08 +00:00
Hans Wennborg	5c3c51fe05	Implement _interlockedbittestandset as a builtin It's used by MS headers in VS 2017 without including intrin.h, so we can't implement it in the header anymore. Differential Revision: https://reviews.llvm.org/D31736 llvm-svn: 299782	2017-04-07 16:41:47 +00:00
Craig Topper	01bba17819	Recommit r299321 '[X86] Add __extension__ to f16c macro intrinsics to suppress warnings about compound literals when compiled for with earlier language standards enabled.' The bot didn't recover after the revert. So it looks like this wasn't the issue. llvm-svn: 299397	2017-04-03 22:59:30 +00:00
Craig Topper	27b71e5b1b	Revert r299321 '[X86] Add __extension__ to f16c macro intrinsics to suppress warnings about compound literals when compiled for with earlier language standards enabled.' to see if recovers a fuzzer bot. llvm-svn: 299382	2017-04-03 19:43:47 +00:00
Craig Topper	bf82498301	[AVX-512] Fix a couple more intrinsic macros I missed in r299346. llvm-svn: 299347	2017-04-03 03:51:57 +00:00
Craig Topper	ac9959eb53	[AVX-512] Fix some intrinsic macros that use the wrong macro parameter names and don't have parentheses around them. Thanks to Matthew Barr for reporting this issue. llvm-svn: 299346	2017-04-03 03:41:29 +00:00
Craig Topper	ce272ae2c5	[X86] Add __extension__ to f16c macro intrinsics to suppress warnings about compound literals when compiled for with earlier language standards enabled. Fixes PR32491. llvm-svn: 299321	2017-04-02 03:02:53 +00:00
Hans Wennborg	043f402586	[X86] Implement __readgsqword (and the rest) as builtins (PR32373) It seems MS headers have started using __readgsqword, and since it's used in a header that doesn't include intrin.h, we can't implement it as an inline function anymore. That was already the case for __readfsdword, which Saleem added support for in r220859. This patch reuses that codegen to implement all of __read[fg]s{byte,word,dword,qword}. Differential Revision: https://reviews.llvm.org/D31248 llvm-svn: 298538	2017-03-22 19:13:13 +00:00
Ekaterina Romanova	6a5702a093	[DOXYGEN] Improvements to smmintrin.h and emmintrin.h intrinsics. I made some small changes in smmintrin.h and emmintrin.h intrinsics. - changed some regular comments '//' into doxygen-style comments '///' where necessary - removed some trailing spaces in doxygen comments. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 298371	2017-03-21 13:34:06 +00:00
Simon Pilgrim	60e924985c	[X86][AVX512] Add _mm512_cvtsd_f64 and _mm512_cvtss_f32 intrinsics (PR32305) Differential Revision: https://reviews.llvm.org/D31155 llvm-svn: 298364	2017-03-21 12:46:13 +00:00
Eric Christopher	5ba576ffe6	Fix parsing of htmxlintrin.h in C++ mode - Fix a variable naming mismatch - Fix gcc extension pointer arithmetic on void to cast to char *. - Test that the header (and htmintrin.h) parse. llvm-svn: 298318	2017-03-20 22:31:33 +00:00
Anastasia Stulova	bb27dfe049	[OpenCL] Fix extension guards for atomic functions Review: D30830 Patch by James Price! llvm-svn: 298256	2017-03-20 15:02:54 +00:00
Igor Breger	f050b797ac	[X86][AVX512][Clang][Intrinsics] Adding missing intrinsics to Clang . Summary: Adding missing intrinsics : _mm512_set_epi16, _mm512_set_epi8, _mm512_permutevar_epi32 _mm512_mask_permutevar_epi32 Reviewers: zvi, guyblank, eladcohen, craig.topper Reviewed By: craig.topper Subscribers: craig.topper, cfe-commits Differential Revision: https://reviews.llvm.org/D31034 llvm-svn: 298208	2017-03-19 08:27:16 +00:00
Craig Topper	6afc436a78	[AVX-512] Change the input type for some load intrinsics to take void type like the spec (and the test cases say). llvm-svn: 298042	2017-03-17 05:59:25 +00:00
Craig Topper	2e5058c403	[AVX-512] Add missing typecasts and parentheses to _mm512_mask_i64gather_ps. My macro cleanup script I used on the others last year must have missed it. llvm-svn: 298040	2017-03-17 05:14:37 +00:00
Bruno Cardoso Lopes	ae1249e4f2	[Headers] Reapply: Add #include_next for tgmath.h on Darwin Reapply r289181 but rename the include guard to avoid conflict with the one from Darwin. Allow darwin to provide additional definitions and implementation specifc values for tgmath.h on Apple platforms. rdar://problem/19019845 llvm-svn: 298013	2017-03-16 23:19:00 +00:00
Egor Churaev	60c30ae1f1	[OpenCL] Implement as_type operator as alias of __builtin_astype. Reviewers: Anastasia Reviewed By: Anastasia Subscribers: cfe-commits, yaxunl, bader Differential Revision: https://reviews.llvm.org/D28136 llvm-svn: 297947	2017-03-16 12:15:10 +00:00
Reid Kleckner	b04cb9ab7a	[MS] Add support for __ud2 and __int2c MSVC intrinsics This was requested in PR31958 and elsewhere. llvm-svn: 297057	2017-03-06 19:43:16 +00:00
Oren Ben Simhon	259b091669	[X86] DAZ Macros Relocation The DAZ feature introduces the denormal zero support for x86. Currently the definitions are located under SSE3 header, however there are some SSE2 targets that support the feature as well. Differential Revision: https://reviews.llvm.org/D30194 llvm-svn: 296296	2017-02-26 11:58:15 +00:00
Simon Pilgrim	a81d45a1ba	[X86][XOP] Fix type conversion warning in vpcmov generic implementations. llvm-svn: 295584	2017-02-18 23:47:34 +00:00
Craig Topper	117892098a	[X86] Replace XOP vpcmov builtins with native vector logical operations. llvm-svn: 295570	2017-02-18 21:15:30 +00:00
Ekaterina Romanova	ff266f5236	Added doxygen comments to smmintrin.h's intrinsics. Note: The doxygen comments are automatically generated based on Sony's intrinsic s document. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 295404	2017-02-17 02:49:50 +00:00
Anastasia Stulova	58984e7087	[OpenCL] Correct ndrange_t implementation Removed ndrange_t as Clang builtin type and added as a struct type in the OpenCL header. Use type name to do the Sema checking in enqueue_kernel and modify IR generation accordingly. Review: D28058 Patch by Dmitry Borisenkov! llvm-svn: 295311	2017-02-16 12:27:47 +00:00
Craig Topper	f0d1147fae	[AVX-512] Replace 512-bit masked packss/packus builtins and replace with new unmasked builtins. These new unmasked builtins will enable us to easily support optimizing these builtins in InstCombine in the backend. llvm-svn: 295291	2017-02-16 06:32:07 +00:00
Reid Kleckner	2a02c2e331	Fix some warnings in intrin.h llvm-svn: 295082	2017-02-14 18:38:19 +00:00
Reid Kleckner	04f9f91da6	[MS] Implement the __fastfail intrinsic as a builtin __fastfail terminates the process immediately with a special system call. It does not run any process shutdown code or exception recovery logic. Fixes PR31854 llvm-svn: 294606	2017-02-09 18:31:06 +00:00
Craig Topper	4574226c3f	[X86] Clzero flag addition and inclusion under znver1 1. Adds the command line flag for clzero. 2. Includes the clzero flag under znver1. 3. Defines the macro for clzero. 4. Adds a new file which has the intrinsic definition for clzero instruction. Patch by Ganesh Gopalasubramanian with some additional tests from me. Differential revision: https://reviews.llvm.org/D29386 llvm-svn: 294559	2017-02-09 06:10:14 +00:00
Ekaterina Romanova	ae7b82eaf8	Doxygen comments for prfchwintrin.h Added doxygen comments to prfchwintrin.h's intrinsics. Note: The doxygen comments are automatically generated based on Sony's intrinsic s document. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 293745	2017-02-01 07:37:40 +00:00
Anastasia Stulova	d1f390ef99	[OpenCL] Diagnose write_only image3d when extension is disabled Prior to OpenCL 2.0, image3d_t can only be used with the write_only access qualifier when the cl_khr_3d_image_writes extension is enabled, see e.g. OpenCL 1.1 s6.8b. Require the extension for write_only image3d_t types and guard uses of write_only image3d_t in the OpenCL header. Patch by Sven van Haastregt! Review: https://reviews.llvm.org/D28860 llvm-svn: 293050	2017-01-25 12:18:50 +00:00
Paul Robinson	a363d14538	Guard __gnuc_va_list typedef. Differential Revision: http://reviews.llvm.org/D28620 llvm-svn: 292819	2017-01-23 19:09:21 +00:00
Tim Shen	867be0d14c	[Altivec] Change vec_sl to a << (b % (sizeof(a) * 8)) For a << b (as original vec_sl does), if b >= sizeof(a) * 8, the behavior is undefined. However, Power instructions do define the behavior, which is equivalent to a << (b % (sizeof(a) * 8)). This patch changes altivec.h to use a << (b % (sizeof(a) * 8)), to ensure the consistent semantic of the instructions. Then it combines the generated multiple instructions back to a single shift. This patch handles left shift only. Right shift, on the other hand, is more complicated, considering arithematic/logical right shift. Differential Revision: https://reviews.llvm.org/D28037 llvm-svn: 292659	2017-01-20 22:05:33 +00:00
Craig Topper	367c86ddbe	[AVX-512] Replace subvector broadcast builtins with shufflevectors and selects. Verified that the backend codegens this equally well. llvm-svn: 292329	2017-01-18 02:17:10 +00:00
Ekaterina Romanova	2e041c9c20	[DOXYGEN] Documentation for the newly added x86 intrinsics. Added doxygen comments for the newly added intrinsics in avxintrin.h, namely _mm256_cvtsd_f64, _mm256_cvtsi256_si32 and _mm256_cvtss_f32 Added doxygen comments for the new intrinsics in emmintrin.h, namely _mm_loadu_si64 and _mm_load_sd. Explicit parameter names were added for _mm_clflush and _mm_setcsr The rest of the changes are editorial, removing trailing spaces at the end of the lines. Differential Revision: https://reviews.llvm.org/D28503 llvm-svn: 291876	2017-01-13 01:14:08 +00:00
Tony Jiang	974e4c7899	[PowerPC] Fix the wrong implementation of builtin vec_rlnm. llvm-svn: 291702	2017-01-11 20:59:42 +00:00
Sean Fertile	96d9e0ec05	Add vec_insert4b and vec_extract4b functions to altivec.h Add builtins for the functions and custom codegen mapping the builtins to their corresponding intrinsics and handling the endian related swapping. https://reviews.llvm.org/D26546 llvm-svn: 291179	2017-01-05 21:43:30 +00:00
Justin Lebar	b8f7a3b8b1	[CUDA] Rename keywords used in macro so they don't conflict with MSVC. Summary: MSVC seems to use "__in" and "__out" for its own purposes, so we have to pick different names in this macro. Reviewers: tra Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D28325 llvm-svn: 291138	2017-01-05 16:54:11 +00:00
Justin Lebar	11d5116904	[CUDA] Don't define functions that the CUDA headers themselves define on Windows. Reviewers: tra Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D28324 llvm-svn: 291137	2017-01-05 16:53:55 +00:00
Justin Lebar	1863d611f8	[Windows] Remove functions in intrin.h that are defined in Builtin.def. Summary: These duplicate declarations cause a problem for CUDA compiles on Windows. All implicitly-defined functions are host+device, and this applies to the declarations in Builtin.def. But then when we see the declarations in intrin.h, they have no attributes, so are host-only functions. This is an error. (A better fix might be to make these builtins host-only, but that is a much bigger change.) Reviewers: rnk Subscribers: cfe-commits, echristo Differential Revision: https://reviews.llvm.org/D28317 llvm-svn: 291128	2017-01-05 16:51:37 +00:00
Artem Belevich	60f25f70c8	[CUDA] Pre-include sm_60 and sm_61 headers. CUDA-8.0 comes with new headers which nvcc pre-includes via cuda_runtime.h Clang now makes them available as well. Differential Revision: https://reviews.llvm.org/D28301 llvm-svn: 290982	2017-01-04 18:39:29 +00:00
Ekaterina Romanova	c9ed514632	[DOXYGEN] Improved doxygen comments for xmmintrin.h intrinsics. Added \n commands to insert a line breaks where necessary, since one long line of documentation is nearly unreadable. Formatted comments to fit into 80 chars. In some cases added \a command in front of the parameter names to display them in italics. llvm-svn: 290619	2016-12-27 18:53:29 +00:00
Craig Topper	70536f4e47	[AVX-512] Replace masked 512-bit pmuldq and pmuludq builtins with the newly added unmasked versions and selects. llvm-svn: 290580	2016-12-27 04:04:57 +00:00
Craig Topper	32866ab800	Revert r290574 "foo" This was supposed to be merged with another commit with a real commit message. Sorry. llvm-svn: 290579	2016-12-27 04:03:29 +00:00
Craig Topper	c5ab78d4c3	Revert r290575 "[AVX-512] Replace masked 512-bit pmuldq and pmuludq builtins with the newly added unmasked versions and selects." I failed to merge this with r290574. llvm-svn: 290578	2016-12-27 04:03:25 +00:00
Craig Topper	6ad5bcc8ac	[AVX-512] Replace masked 512-bit pmuldq and pmuludq builtins with the newly added unmasked versions and selects. llvm-svn: 290575	2016-12-27 03:46:16 +00:00
Craig Topper	39b9e32493	foo llvm-svn: 290574	2016-12-27 03:46:13 +00:00
Ekaterina Romanova	dffe45b3e6	[DOXYGEN] Improved doxygen comments for x86 intrinsics. Improved doxygen comments for the following intrinsics headers: __wmmintrin_pclmul.h, bmiintrin.h, emmintrin.h, f16cintrin.h, immintrin.h, mmintrin.h, pmmintrin.h, tmmintrin.h Added \n commands to insert a line breaks where necessary, since one long line of documentation is nearly unreadable. Formatted comments to fit into 80 chars. In some cases added \a command in front of the parameter names to display them in italics. llvm-svn: 290561	2016-12-27 00:49:38 +00:00
Marina Yatsina	c42fd03bf8	[inline-asm]No error for conflict between inputs\outputs and clobber list According to extended asm syntax, a case where the clobber list includes a variable from the inputs or outputs should be an error - conflict. for example: const long double a = 0.0; int main() { char b; double t1 = a; __asm__ ("fucompp": "=a" (b) : "u" (t1), "t" (t1) : "cc", "st", "st(1)"); return 0; } This should conflict with the output - t1 which is st, and st which is st aswell. The patch fixes it. Commit on behald of Ziv Izhar. Differential Revision: https://reviews.llvm.org/D15075 llvm-svn: 290539	2016-12-26 12:23:42 +00:00
Ekaterina Romanova	16166a4d71	[DOXYGEN] Improved doxygen comments for tmmintrin.h intrinsics. Added \n commands to insert a line breaks where necessary to make the documentation more readable. Formatted comments to fit into 80 chars. llvm-svn: 290458	2016-12-23 23:36:26 +00:00
Ekaterina Romanova	6de0cd870b	[DOXYGEN] Improved doxygen comments for tmmintrin.h intrinsics. Tagged parameter names with \a doxygen command to display parameters in italics. Added \n commands to insert a line break to make the documentation more readable. Formatted comments to fit into 80 chars. llvm-svn: 290455	2016-12-23 22:47:16 +00:00
Yaxun Liu	5b74665a41	Recommit r289979 [OpenCL] Allow disabling types and declarations associated with extensions Fixed undefined behavior due to cast integer to bool in initializer list. llvm-svn: 290056	2016-12-18 05:18:55 +00:00
Yaxun Liu	35f6d66b0d	Revert r289979 due to regressions llvm-svn: 289991	2016-12-16 21:23:55 +00:00
Yaxun Liu	2e8331cab6	[OpenCL] Allow disabling types and declarations associated with extensions Added a map to associate types and declarations with extensions. Refactored existing diagnostic for disabled types associated with extensions and extended it to declarations for generic situation. Fixed some bugs for types associated with extensions. Allow users to use pragma to declare types and functions for supported extensions, e.g. #pragma OPENCL EXTENSION the_new_extension_name : begin // declare types and functions associated with the extension here #pragma OPENCL EXTENSION the_new_extension_name : end Differential Revision: https://reviews.llvm.org/D21698 llvm-svn: 289979	2016-12-16 19:22:08 +00:00
Bruno Cardoso Lopes	88458c31e7	Revert "[Headers] Add #include_next for tgmath.h on Darwin" Reverts r289181: it's currently breaking modules using simd.h in 10.12 SDK. This reverts commit 6e73e3464e96a4e00492c24aa790d36e1adb5702. llvm-svn: 289487	2016-12-12 23:06:58 +00:00
Craig Topper	678b07fe3c	[AVX-512] Remove masking from 512-bit vpermil builtins. The backend now has versions without masking so wrap it with select. This will allow the backend to constant fold these to generic shuffle vectors like 128-bit and 256-bit without having to working about handling masking. llvm-svn: 289351	2016-12-11 01:26:52 +00:00
Craig Topper	cdd3603c04	[AVX-512] Remove masking from 512-bit pshufb builtin. The backend now has a version without masking so wrap it with select. This will allow the backend to constant fold these to generic shuffle vectors like 128-bit and 256-bit without having to working about handling masking. llvm-svn: 289345	2016-12-10 23:09:52 +00:00
Craig Topper	5391c98341	[AVX-512] Remove 128/256-bit masked vpermilvar builtins and replace with select and the avx unmasked builtins. llvm-svn: 289338	2016-12-10 20:27:39 +00:00
Ekaterina Romanova	0c1c3bbc78	[DOXYGEN] Improved doxygen comments for x86 intrinsics headers. Tagged instruction names with <c> INSTR_NAME </c> to display them in typewriter font. In the past, \c command was used, unfortunately it applied to only one word. <c> .. </c> has the same meaning, but applies to all words in between the tags. llvm-svn: 289249	2016-12-09 18:35:50 +00:00
Bruno Cardoso Lopes	052e6ddf27	[Headers] Add #include_next for tgmath.h on Darwin Allow darwin to provide additional definitions and implementation specifc values for tgmath.h on Apple platforms. rdar://problem/19019845 llvm-svn: 289181	2016-12-09 03:30:46 +00:00
Ekaterina Romanova	08da283295	[DOXYGEN] Improved doxygen comments for xmmintrin.h intrinsics. Tagged parameter names with \a doxygen command to display parameters in italics. Formatted comments to fit into 80 chars. llvm-svn: 289159	2016-12-08 23:58:39 +00:00
Ekaterina Romanova	3494a597e9	[DOXYGEN] Improved doxygen comments. Improved doxygen comments for fxsrintrin.h and mmintrin.h intrinsics by taagging parameter names with \a doxygen command to display parameters in italics. Formatted comments to fit into 80 chars. llvm-svn: 289154	2016-12-08 23:32:07 +00:00
Ekaterina Romanova	797b0ebf2d	[DOXYGEN] Improved doxygen comments for emmintrin.h intrinsics. Tagged parameter names with \a doxygen command to display parameters in italics. Formatted comments to fit into 80 chars. llvm-svn: 289116	2016-12-08 22:10:51 +00:00
Ekaterina Romanova	a8fde7ce8b	[DOXYGEN] Improved doxygen comments. Improved doxygen comments for __wmmintrin_pclmul.h and ammintrin.h intrinsics by taagging parameter names with \a doxygen command to display parameters in italics. Formatted comments to fit into 80 chars. llvm-svn: 289083	2016-12-08 17:57:23 +00:00
Ekaterina Romanova	d6042197db	[DOXYGEN] Improved doxygen comments for avxintrin.h intrinsics. Tagged parameter names with \a doxygen command to display them in italics. Formatted comments to fit into 80 chars. llvm-svn: 289022	2016-12-08 04:09:17 +00:00
Bruno Cardoso Lopes	d93779da15	[Headers] Enable #include_next<float.h> on Darwin Allows darwin targets to provide additional definitions and implementation specifc values for float.h rdar://problem/21961491 llvm-svn: 289018	2016-12-08 02:13:56 +00:00
Ekaterina Romanova	4c77e8940e	[DOXYGEN] Updated instruction names corresponding to avxintrin.h intrinsics. Documentation for some of the avxintrin.h's intrinsics errorneously said that non VEX-prefixed instructions could be generated. This was fixed. I tried several different solutions to achieve pretty printing of unordered lists (nested and non-nested) in param sections in doxygen. llvm-svn: 287990	2016-11-26 19:38:19 +00:00
Ehsan Amiri	85f5bfcf0d	[PPC] support for arithmetic builtins in the FE (commit again after fixing the buildbot failures) This adds various overloads of the following builtins to altivec.h: vec_neg vec_nabs vec_adde vec_addec vec_sube vec_subec vec_subc Note that for vec_sub builtins on 32 bit integers, the semantics is similar to what ISA describes for instructions like vsubecuq that work on quadwords: the first operand is added to the one's complement of the second operand. (As opposed to two's complement which I expected). llvm-svn: 287872	2016-11-24 12:40:04 +00:00
Ehsan Amiri	9cce1ee88c	[PPC] revert r287795 A test that passed locally is failing on one of the build bots. llvm-svn: 287796	2016-11-23 18:55:17 +00:00
Ehsan Amiri	9b91cfa0b0	[PPC] support for arithmetic builtins in the FE (commit again after fixing the buildbot failures) This adds various overloads of the following builtins to altivec.h: vec_neg vec_nabs vec_adde vec_addec vec_sube vec_subec vec_subc Note that for vec_sub builtins on 32 bit integers, the semantics is similar to what ISA describes for instructions like vsubecuq that work on quadwords: the first operand is added to the one's complement of the second operand. (As opposed to two's complement which I expected). llvm-svn: 287795	2016-11-23 18:36:29 +00:00
Ehsan Amiri	ac10595b0d	[PPC] Reverting r287772 Due to buildbot failure, I revert. Will recommit after investigation. llvm-svn: 287775	2016-11-23 16:56:03 +00:00
Ehsan Amiri	5ea1054dab	[PPC] support for arithmetic builtins in the FE This adds various overloads of the following builtins to altivec.h: vec_neg vec_nabs vec_adde vec_addec vec_sube vec_subec vec_subc Note that for vec_sub builtins on 32 bit integers, the semantics is similar to what ISA describes for instructions like vsubecuq that work on quadwords: the first operand is added to the one's complement of the second operand. (As opposed to two's complement which I expected). llvm-svn: 287772	2016-11-23 16:32:05 +00:00
Craig Topper	6aefe00ccf	[X86] Replace valignd/q builtins with appropriate __builtin_shufflevector. llvm-svn: 287733	2016-11-23 01:47:12 +00:00
Ekaterina Romanova	bf667b21ac	Add doxygen comments to immintrin.h's intrinsics. The doxygen comments are automatically generated based on Sony's intrinsics docu ment. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. This patch was internally reviewed by Charles Li. llvm-svn: 287483	2016-11-20 08:35:05 +00:00

... 4 5 6 7 8 ...

1667 Commits