llvm-project

Commit Graph

Author	SHA1	Message	Date
Nemanja Ivanovic	85a28dcc5d	[PPC] Implement vector reverse elements builtins (vec_reve) This patch corresponds to review https://reviews.llvm.org/D25906. Committing on behalf of Tony Jiang. llvm-svn: 285218	2016-10-26 18:25:45 +00:00
Craig Topper	f202365910	[AVX-512] Fix the operand order for all calls to __builtin_ia32_vfmaddss3_mask. Summary: The preserved input should be the first argument and the vector inputs should be in the same order as the intrinsics it is used to implement. Reviewers: igorb, delena Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D25902 llvm-svn: 285175	2016-10-26 05:35:38 +00:00
Yaxun Liu	a49bd14843	[OpenCL] Add missing atom_xor for 64 bit to opencl-c.h Differential Revision: https://reviews.llvm.org/D25954 llvm-svn: 285125	2016-10-25 21:37:05 +00:00
Michael Zuckerman	facb37cabf	[X86][AVX512][Clang][Intrinsics][reduce] Adding missing reduce (Operators: +,*,&&,\|\|) intrinsics to Clang Committed after LGTM and check-all Vector-reduction arithmetic accepts vectors as inputs and produces scalars as outputs. This class of vector operation forms the basis of many scientific computations. In vector-reduction arithmetic, the evaluation off is independent of the order of the input elements of V. Used bisection method. At each step, we partition the vector with previous step in half, and the operation is performed on its two halves. This takes log2(n) steps where n is the number of elements in the vector. Reviwer: 1. igorb 2. craig.topper Differential Revision: https://reviews.llvm.org/D25527 llvm-svn: 285054	2016-10-25 07:56:04 +00:00
Michael Zuckerman	33bd5b235b	revert r284963 because new test file is failing in some OS. test/CodeGen/avx512-reduceIntrin.c llvm-svn: 284967	2016-10-24 11:30:23 +00:00
Michael Zuckerman	98cb041891	[X86][AVX512][Clang][Intrinsics][reduce] Adding missing reduce (Operators: +,*,&&,\|\|) intrinsics to Clang Committed after LGTM and check-all Vector-reduction arithmetic accepts vectors as inputs and produces scalars as outputs. This class of vector operation forms the basis of many scientific computations. In vector-reduction arithmetic, the evaluation off is independent of the order of the input elements of V. Used bisection method. At each step, we partition the vector with previous step in half, and the operation is performed on its two halves. This takes log2(n) steps where n is the number of elements in the vector. Differential Revision: https://reviews.llvm.org/D25527 llvm-svn: 284963	2016-10-24 10:53:20 +00:00
Craig Topper	eee7c0520c	[AVX-512] Replace masked 128/256-bit byte, word, and dword min/max builtins with selects and the older unmasked builtins. llvm-svn: 284954	2016-10-23 23:57:30 +00:00
Craig Topper	0c5da26572	[AVX-512] Replace 512-bit pmovzx/sx builtins with native IR. llvm-svn: 284936	2016-10-23 07:35:47 +00:00
Craig Topper	4ef879ac2c	[AVX-512] Remove masked 128/256-bit packss/packus builtins and replace with selects and the older unmasked builtins. llvm-svn: 284935	2016-10-23 07:35:39 +00:00
Ekaterina Romanova	06477bf035	Add more doxygen comments to emmintrin.h's intrinsics. With this patch, all intrinsics in this file (with an exception of a handful of a recently added ones) will be documented. I will send out a patch for 4 missining intrisics later. The doxygen comments are automatically generated based on Sony's intrinsics document. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. This patch was internally reviewed by Yunzhong Gao. llvm-svn: 284934	2016-10-23 07:30:50 +00:00
Craig Topper	4d63dfc286	[AVX-512] Replace masked 128/256-bit pavg builtins and replace with select and older unmasked builtins. llvm-svn: 284929	2016-10-22 21:24:56 +00:00
Craig Topper	622c63614d	[AVX-512] Replace masked 128/256-bit saturating add/sub builtins with select and older unmasked builtins. llvm-svn: 284928	2016-10-22 21:24:52 +00:00
Craig Topper	11dda92405	[AVX-512] Replace masked 128/256-bit vpmovzx/vpmovsx builtins with native IR. llvm-svn: 284927	2016-10-22 21:24:48 +00:00
Craig Topper	eb1c0afa90	[AVX-512] Remove masked 128/256-bit pshufb builtins. Replace with a select and the older unmaksed builtins. llvm-svn: 284925	2016-10-22 21:24:42 +00:00
Craig Topper	78a9c40326	[AVX-512] Remove builtins for 128/256-bit pabsb/pabsw. We can use a select and the older non-masked versions instead. llvm-svn: 284924	2016-10-22 21:24:38 +00:00
Craig Topper	c2c7e42bfe	[AVX-512] Add typecasts to alignr intrinsics that were modified in r284920. llvm-svn: 284923	2016-10-22 21:24:34 +00:00
Craig Topper	f6373bc6fd	[AVX-512] Remove masked 128/256-bit palignr builtins. We can just use a select in the header file with the older unmasked versions instead. llvm-svn: 284920	2016-10-22 18:32:33 +00:00
Ekaterina Romanova	493091fdef	Add more doxygen comments to emmintrin.h's intrinsics. With this patch, 75% of the intrinsics in this file will be documented now. The patches for the rest of the intrisics in this file will be send out later. The doxygen comments are automatically generated based on Sony's intrinsics document. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. This patch was internally reviewed by Yunzhong Gao. llvm-svn: 284754	2016-10-20 17:59:15 +00:00
Albert Gutowski	1deab38717	Implement __stosb intrinsic as a volatile memset Summary: We need `__stosb` to be an intrinsic, because SecureZeroMemory function uses it without including intrin.h. Implementing it as a volatile memset is not consistent with MSDN specification, but it gives us target-independent IR while keeping the most important properties of `__stosb`. Reviewers: rnk, hans, thakis, majnemer Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D25334 llvm-svn: 284253	2016-10-14 17:33:05 +00:00
Albert Gutowski	5e08df0266	Add 64-bit MS _Interlocked functions as builtins again Summary: Previously global 64-bit versions of _Interlocked functions broke buildbots on i386, so now I'm adding them as builtins for x86-64 and ARM only (should they be also on AArch64? I had problems with testing it for AArch64, so I left it) Reviewers: hans, majnemer, mstorsjo, rnk Subscribers: cfe-commits, aemerson Differential Revision: https://reviews.llvm.org/D25576 llvm-svn: 284172	2016-10-13 22:35:07 +00:00
Albert Gutowski	397d81bb9a	Implement MS _ReturnAddress and _AddressOfReturnAddress intrinsics Reviewers: rnk, thakis, majnemer, hans Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D25540 llvm-svn: 284131	2016-10-13 16:03:42 +00:00
Yunzhong Gao	d9fa56a4fb	[NFC] Fixing the description for _mm_store_ps and _mm_store_ps1. It seems that the doxygen description of these two intrinsics were swapped by mistake. llvm-svn: 284080	2016-10-12 23:27:27 +00:00
Albert Gutowski	2a0621e58a	Implement MS _BitScan intrinsics Summary: _BitScan intrinsics (and some others, for example _Interlocked and _bittest) are supposed to work on both ARM and x86. This is an attempt to isolate them, avoiding repeating their code or writing separate function for each builtin. Reviewers: hans, thakis, rnk, majnemer Subscribers: RKSimon, cfe-commits, aemerson Differential Revision: https://reviews.llvm.org/D25264 llvm-svn: 284060	2016-10-12 22:01:05 +00:00
Yunzhong Gao	c37e2231ad	[NFC] Trial change to remove a redundant blank line. llvm-svn: 284033	2016-10-12 19:33:33 +00:00
Justin Lebar	49ec14692a	[CUDA] Re-land support for <complex> (r283683 and r283680). These were reverted in r283753 and r283747. The first patch added a header to the root 'Headers' install directory, instead of into 'Headers/cuda_wrappers'. This was fixed in the second patch, but by then the damage was done: The bad header stayed in the 'Headers' directory, continuing to break the build. We reverted both patches in an attempt to fix things, but that still didn't get rid of the header, so the Windows boostrap build remained broken. It's probably worth fixing up our cmake logic to remove things from the install dirs, but in the meantime, re-land these patches, since we believe they no longer have this bug. llvm-svn: 283907	2016-10-11 17:36:03 +00:00
Albert Gutowski	fcea61c563	Implement MS read/write barriers and __faststorefence intrinsic Reviewers: hans, rnk, majnemer Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D25442 llvm-svn: 283793	2016-10-10 19:40:51 +00:00
Albert Gutowski	7216f17653	Implement __emul, __emulu, _mul128 and _umul128 MS intrinsics Reviewers: rnk, thakis, majnemer, hans Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D25353 llvm-svn: 283785	2016-10-10 18:09:27 +00:00
Nico Weber	21b9c7a6dc	Revert r283683 because r283680 got reverted. llvm-svn: 283753	2016-10-10 14:20:35 +00:00
Nico Weber	67dd74ef89	Revert r283680. Breaks bootstrap builds on (at least) Windows: In file included from D:\buildslave\clang-x64-ninja-win7\llvm\lib\Support\Allocator.cpp:14: In file included from D:\buildslave\clang-x64-ninja-win7\llvm\include\llvm/Support/Allocator.h:24: In file included from D:\buildslave\clang-x64-ninja-win7\llvm\include\llvm/ADT/SmallVector.h:20: In file included from D:\buildslave\clang-x64-ninja-win7\llvm\include\llvm/Support/MathExtras.h:19: D:\buildslave\clang-x64-ninja-win7\stage1.install\bin\..\lib\clang\4.0.0\include\algorithm(63,8) : error: unknown type name '__device__' inline __device__ const __T & llvm-svn: 283747	2016-10-10 14:10:00 +00:00
Justin Lebar	3b593f56fc	[CUDA] Don't install cuda_wrappers/{algorithm,complex} into the main include dir. This is obviously wrong -- if we do this, then all compiles will pick up these wrappers, which is not what we want. llvm-svn: 283683	2016-10-09 00:27:39 +00:00
Justin Lebar	d3c5d2a4de	[CUDA] Support <complex> and std::min/max on the device. Summary: We do this by wrapping <complex> and <algorithm>. Tests are in the test-suite. Reviewers: tra Subscribers: jhen, beanz, cfe-commits, mgorny Differential Revision: https://reviews.llvm.org/D24979 llvm-svn: 283680	2016-10-08 22:16:12 +00:00
Justin Lebar	2dfbe9a3b4	[CUDA] Rename cuda_builtin_vars.h to __clang_cuda_builtin_vars.h. Summary: This matches the idiom we use for our other CUDA wrapper headers. Reviewers: tra Subscribers: beanz, mgorny, cfe-commits Differential Revision: https://reviews.llvm.org/D24978 llvm-svn: 283679	2016-10-08 22:16:08 +00:00
Justin Lebar	e9eb792a0f	[CUDA] Declare our __device__ math functions in the same inline namespace as our standard library. Summary: Currently we declare our inline __device__ math functions in namespace std. But libstdc++ and libc++ declare these functions in an inline namespace inside namespace std. We need to match this because, in a later patch, we want to get e.g. <complex> to use our device overloads, and it only will if those overloads are in the right inline namespace. Reviewers: tra Subscribers: cfe-commits, jhen Differential Revision: https://reviews.llvm.org/D24977 llvm-svn: 283678	2016-10-08 22:16:03 +00:00
Michael Zuckerman	9e43ccfe68	[Clang][AVX512][BuiltIn]Adding missing intrinsics move_{sd\|ss} to clang Differential Revision: http://reviews.llvm.org/D21021 llvm-svn: 283314	2016-10-05 12:56:06 +00:00
Albert Gutowski	f3a0bce155	Separate builtins for x84-64 and i386; implement __mulh and __umulh Summary: We need x86-64-specific builtins if we want to implement some of the MS intrinsics - winnt.h contains definitions of some functions for i386, but not for x86-64 (for example _InterlockedOr64), which means that we cannot treat them as builtins for both i386 and x86-64, because then we have definitions of builtin functions in winnt.h on i386. Reviewers: thakis, majnemer, hans, rnk Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D24598 llvm-svn: 283264	2016-10-04 22:29:49 +00:00
Craig Topper	c4a8228bcc	[AVX-512] Use native IR for masked 512-bit add/sub/mul/div ps/pd intrinsics when rounding mode isn't used. llvm-svn: 283073	2016-10-02 17:43:00 +00:00
Artem Belevich	d4d9dc8252	[CUDA] Added support for CUDA-8 Differential Revision: https://reviews.llvm.org/D24946 llvm-svn: 282610	2016-09-28 17:47:40 +00:00
Martin Storsjo	963f75efc2	[Headers] Replace stray indentation with tabs with spaces. NFC. This matches the rest of the surrounding file. llvm-svn: 282569	2016-09-28 09:34:51 +00:00
Ayman Musa	17a2819b05	Update to commit r282488, fix the buildboot failure. llvm-svn: 282492	2016-09-27 15:37:31 +00:00
Ayman Musa	2e250e8845	[avx512] Add aliases to some missing avx512 intrinsics. Differential Revision:https: //reviews.llvm.org/D24961 llvm-svn: 282488	2016-09-27 14:06:32 +00:00
Nemanja Ivanovic	10e2b5dcaa	[Power9] Builtins for ELF v.2 ABI conformance - front end portion This patch corresponds to review: https://reviews.llvm.org/D24397 It adds the __POWER9_VECTOR__ macro and the -mpower9-vector option along with a number of altivec.h functions (refer to the code review for a list). llvm-svn: 282481	2016-09-27 10:45:22 +00:00
Saleem Abdulrasool	eae64f8a62	headers: add missing Windows ARM Interlocked intrinsics On ARM, there are multiple versions of each of the intrinsics, with acquire/relaxed/release barrier semantics. The newly added ones are provided as inline functions here instead of builtins, since they should only be available on certain archs (arm/aarch64). This is necessary in order to compile C++ code for ARM in MSVC mode. Patch by Martin Storsjö! llvm-svn: 282447	2016-09-26 22:12:43 +00:00
Simon Dardis	3d9c763816	[mips] MSA intrinsics header file This patch adds the msa.h header file containing the shorter names for the MSA instrinsics, e.g. msa_sll_b for builtin_msa_sll_b. Reviewers: vkalintiris, zoran.jovanovic Differential Review: https://reviews.llvm.org/D24674 llvm-svn: 281975	2016-09-20 15:07:36 +00:00
Justin Lebar	e3612a039f	[CUDA] Make __clang_cuda_cmath.h compatible with libc++. Summary: We need to add a bunch more "using"s, which weren't necessary with libstdc++. Once this is in I can check in a test to the test-suite. Reviewers: tra Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D24588 llvm-svn: 281544	2016-09-14 21:50:14 +00:00
Albert Gutowski	727ab8a803	Add some MS aliases for existing intrinsics Reviewers: thakis, compnerd, majnemer, rsmith, rnk Subscribers: alexshap, cfe-commits Differential Revision: https://reviews.llvm.org/D24330 llvm-svn: 281540	2016-09-14 21:19:43 +00:00
Albert Gutowski	fc19fa3721	Temporary fix for MS _Interlocked intrinsics llvm-svn: 281401	2016-09-13 21:51:37 +00:00
Albert Gutowski	9918cb6573	Reverse commit 281375 (breaks building Chromium) llvm-svn: 281399	2016-09-13 21:24:51 +00:00
Albert Gutowski	ce7a9a47b2	Add bunch of _Interlocked builtins Reviewers: compnerd, thakis, Prazek, majnemer, rnk Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D24153 llvm-svn: 281378	2016-09-13 19:43:33 +00:00
Albert Gutowski	ae3fb3113f	Add some MS aliases for existing intrinsics Reviewers: thakis, compnerd, majnemer, rsmith, rnk Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D24330 llvm-svn: 281375	2016-09-13 19:26:42 +00:00
Albert Gutowski	b6a11acb53	Implement MS _rot intrinsics Reviewers: thakis, Prazek, compnerd, rnk Subscribers: majnemer, cfe-commits Differential Revision: https://reviews.llvm.org/D24311 llvm-svn: 280997	2016-09-08 22:32:19 +00:00
Reid Kleckner	5de2bcdcf6	Add MS __nop intrinsic to intrin.h Summary: There was no definition for __nop function - added inline assembly. Patch by Albert Gutowski! Reviewers: rnk, thakis Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D24286 llvm-svn: 280826	2016-09-07 16:55:12 +00:00
Craig Topper	2dfab63bb3	[AVX-512] Remove 128-bit and 256-bit masked floating point add/sub/mul/div builtins and replace with native operations. We can't do the 512-bit ones because they take a rounding mode argument that we can't represent. llvm-svn: 280635	2016-09-04 18:30:17 +00:00
Elad Cohen	fb6358d2b5	[Modules] Add 'freestanding' to the 'requires-declaration' feature-list. This adds support for modules that require (non-)freestanding environment, such as the compiler builtin mm_malloc submodule. Differential Revision: https://reviews.llvm.org/D23871 llvm-svn: 280613	2016-09-04 06:00:42 +00:00
Joerg Sonnenberger	b50b2fac9f	Trailing dot that shouldn't have been committed. llvm-svn: 280609	2016-09-04 00:51:02 +00:00
Joerg Sonnenberger	82216f0faa	PR 27200: Fix names of the atomic lock-free macros. llvm-svn: 280607	2016-09-04 00:44:10 +00:00
Craig Topper	f43e4a1728	[AVX-512] Remove masked integer mullo builtins and replace with native IR. llvm-svn: 280597	2016-09-03 19:19:49 +00:00
Craig Topper	0e18976b8d	[AVX-512] Remove masked integer add/sub builtins and replace with native IR. llvm-svn: 280596	2016-09-03 18:29:35 +00:00
Craig Topper	a815f488d5	[AVX-512] Implement masked floating point logical operations with native IR and remove the builtins. llvm-svn: 280197	2016-08-31 05:38:58 +00:00
Craig Topper	d0681d528d	[X86] Use v2i64 vectors to implement _mm_and/andn/or/xor_pd. These will be reused when removing some builtins from avx512vldqintrin.h and this will make the tests for that change show a better number of vector elements. llvm-svn: 280196	2016-08-31 05:38:55 +00:00
Bruno Cardoso Lopes	6736e199c7	[Modules] Add 'gnuinlineasm' to the 'requires-declaration' feature-list. This adds support for modules that require (no-)gnu-inline-asm environment, such as the compiler builtin cpuid submodule. This is the gnu-inline-asm variant of https://reviews.llvm.org/D23871 Differential Revision: https://reviews.llvm.org/D23905 rdar://problem/26931199 llvm-svn: 280159	2016-08-30 21:25:42 +00:00
Alexey Bader	b5d90e57dc	[OpenCL] Make is_valid_event, create_user_event overloadable. Summary: Make is_valid_event and create_user_event overloadable like other built-ins. Patch by Evgeniy Tyurin. Reviewers: bader, yaxunl Subscribers: Anastasia, cfe-commits Differential Revision: https://reviews.llvm.org/D23914 llvm-svn: 280097	2016-08-30 14:42:54 +00:00
Asaf Badouh	356bb76809	[X86][AVX512F] minor fix of the parameter names add "__" prefix Bug 28842 https://llvm.org/bugs/show_bug.cgi?id=29040 Differential Revision: https://reviews.llvm.org/D23753 llvm-svn: 279392	2016-08-21 07:56:47 +00:00
Justin Lebar	cb20a09f54	[CUDA] Improve handling of math functions. Summary: A bunch of related changes here to our CUDA math headers. - The second arg to nexttoward is a double (well, technically, long double, but we don't have that), not a float. - Add a forward-declare of llround(float), which is defined in the CUDA headers. We need this for the same reason we need most of the other forward-declares: To prevent a constexpr function in our standard library from becoming host+device. - Add nexttowardf implementation. - Pull "foobarf" functions defined by the CUDA headers in the global namespace into namespace std. This lets you do e.g. std::sinf. - Add overloads for math functions accepting integer types. This lets you do e.g. std::sin(0) without having an ambiguity between the overload that takes a float and the one that takes a double. With these changes, we pass testcases derived from libc++ for cmath and math.h. We can check these testcases in to the test-suite once support for CUDA lands there. Reviewers: tra Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D23627 llvm-svn: 279140	2016-08-18 20:43:13 +00:00
Yaxun Liu	3317446301	[OpenCL] AMDGPU: Add extensions cl_amd_media_ops and cl_amd_media_ops2 Differential Revision: https://reviews.llvm.org/D23322 llvm-svn: 278851	2016-08-16 20:49:49 +00:00
Reid Kleckner	66e7717b46	Revert "[X86] Add xgetbv/x[X86] Add xgetbv xsetbv intrinsics to non-windows platforms" This reverts commit r278783. It breaks usage of _xgetbv on Windows. llvm-svn: 278814	2016-08-16 16:04:14 +00:00
Marina Yatsina	197b65f833	[X86] Add xgetbv/x[X86] Add xgetbv xsetbv intrinsics to non-windows platforms commit on behalf of guyblank Differential Revision: https://reviews.llvm.org/D21959 llvm-svn: 278783	2016-08-16 08:13:36 +00:00
Lama Saba	5d01f224cf	[X86][AVX512] lower __mm512_andnot_ps/__mm512_andnot_pd to IR Differential revision: https://reviews.llvm.org/D23262 llvm-svn: 278209	2016-08-10 10:34:45 +00:00
Justin Lebar	2ef3dabd45	[CUDA] Add __device__ overloads for placement new and delete. Summary: Previously these sort of worked because they didn't end up resulting in calls at the ptx layer. But I'm adding stricter checks that break placement new without these changes. Reviewers: tra Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D23239 llvm-svn: 278194	2016-08-10 01:09:14 +00:00
Asaf Badouh	2f344b788c	[AVX512] integer comparisions enumeration. fix Bug 28842 https://llvm.org/bugs/show_bug.cgi?id=28842 Differential Revision: https://reviews.llvm.org/D22212 llvm-svn: 277955	2016-08-07 10:43:04 +00:00
Saleem Abdulrasool	afdef205d8	Headers: Add ARM support to intrin.h for MSVC compatibility This fixes compiling with headers from the Windows SDK for ARM, where the YieldProcessor function (in winnt.h) refers to _ARM_BARRIER_ISHST. The actual MSVC armintr.h contains a lot more definitions, but this is enough to build code that uses the Windows SDK but doesn't use ARM intrinsics directly. An alternative would to just keep the addition to intrin.h (to include armintr.h), but not actually ship armintr.h, instead having clang's intrin.h include armintr.h from MSVC's include directory. (That one works fine with clang, at least for building code that uses the Windows SDK.) Patch by Martin Storsjö! llvm-svn: 277928	2016-08-06 17:58:24 +00:00
Yaxun Liu	c489e39eca	[OpenCL] Remove extra native_ functions from opencl-c.h There should be no native_ builtin functions with double type arguments. Patch by Aaron En Ye Shi. Differential Revision : https://reviews.llvm.org/D23071 llvm-svn: 277754	2016-08-04 19:30:54 +00:00
Dimitry Andric	f8099f256d	Add more gcc compatibility names to clang's cpuid.h Summary: Some cpuid bit defines are named slightly different from how gcc's cpuid.h calls them. Define a few more compatibility names to appease software built for gcc: * `bit_PCLMUL` alias of `bit_PCLMULQDQ` * `bit_SSE4_1` alias of `bit_SSE41` * `bit_SSE4_2` alias of `bit_SSE42` * `bit_AES` alias of `bit_AESNI` * `bit_CMPXCHG8B` alias of `bit_CX8` While here, add the misssing 29th bit, `bit_F16C` (which is how gcc calls this bit). Reviewers: joerg, rsmith Subscribers: bruno, cfe-commits Differential Revision: https://reviews.llvm.org/D22010 llvm-svn: 277307	2016-07-31 20:23:23 +00:00
Eric Christopher	b638558e12	Remove unused variable. Fixes PR28761. llvm-svn: 277221	2016-07-29 22:11:11 +00:00
Yaxun Liu	c944e65a24	[OpenCL] Added CLK_ABGR definition for get_image_channel_order return value Added CLK_ABGR definition for get_image_channel_order return value inside opencl-c.h file. Patch by Aaron En Ye Shi. Differential Revision: https://reviews.llvm.org/D22767 llvm-svn: 277179	2016-07-29 17:50:10 +00:00
Craig Topper	351ed42795	[X86] Block pbroadcastq instructions on 32-bit targets instead of pbroadcastb. Thanks to Simon Pilgrim for catching the mistake. llvm-svn: 276564	2016-07-24 14:58:06 +00:00
Ekaterina Romanova	a84c24f39c	Add doxygen comments to emmintrin.h's intrinsics. Only around 50% of the intrinsics in this file are documented now. The patches for the rest of the intrisics in this file will be send out later. The doxygen comments are automatically generated based on Sony's intrinsics docu ment. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. This patch was internally reviewed by Paul Robinson. llvm-svn: 276499	2016-07-22 23:49:37 +00:00
Craig Topper	45db56c375	[X86] Add missing __x86_64__ qualifiers on a bunch of intrinsics that assume 64-bit GPRs are available. Usages of these intrinsics in a 32-bit build results in assertions in the backend. llvm-svn: 276249	2016-07-21 07:38:39 +00:00
Simon Pilgrim	e3b9ee0645	[X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead. It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match). This patch changes both scalar and packed versions back to using x86-specific builtins. It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding. Differential Revision: https://reviews.llvm.org/D22105 llvm-svn: 276102	2016-07-20 10:18:01 +00:00
Asaf Badouh	a0b6f8fb56	[X86][AVX512F] minor fix of the parameter names add "__" prefix llvm-svn: 275384	2016-07-14 08:40:30 +00:00
Michael Zuckerman	3378653f8d	[Clang][AVX512] Making cosmetic changes llvm-svn: 275169	2016-07-12 12:42:27 +00:00
Craig Topper	4d61a3c2d8	[AVX512] Replace masked AND/OR/XOR intrinsics with native code and remove the builtins. llvm-svn: 275049	2016-07-11 06:14:18 +00:00
Craig Topper	6e76fb61a7	[X86] Use __butilin_shufflevector for 512-bit shufps intrinsics. llvm-svn: 275012	2016-07-10 05:57:21 +00:00
Craig Topper	95b61b0544	[X86] Use __builtin_ia32_vec_ext_v4hi and __builtin_ia32_vec_set_v4hi to implement pextrw/pinsertw MMX intrinsics instead of trying to use native IR. Without this we end up generating code that doesn't use mmx registers and probably doesn't work well with other mmx intrinsics. llvm-svn: 274968	2016-07-09 05:30:41 +00:00
Justin Bogner	2d5de7e568	NVPTX: Use the nvvm builtins to read SRegs rather than the legacy ptx ones The ptx spellings were removed from LLVM in r274769. llvm-svn: 274770	2016-07-07 16:41:08 +00:00
Justin Bogner	2f8de9fb4f	NVPTX: Rename __builtin_ptx_shfl -> __nvvm_shfl To match "NVPTX: Make the llvm.nvvm.shfl intrinsics and builtin names consistent" in LLVM. llvm-svn: 274663	2016-07-06 19:52:32 +00:00
Michael Zuckerman	b920665493	[Clang][Feature] Adding CLFLUSHOPT feature and intrinsic to clang Differential Revision: http://reviews.llvm.org/D21792 llvm-svn: 274559	2016-07-05 15:56:03 +00:00
Simon Pilgrim	f5a8837e1b	[X86][AVX512] Converted the VBROADCAST intrinsics to generic IR llvm-svn: 274544	2016-07-05 12:59:33 +00:00
Asaf Badouh	136332888a	[X86][AVX512F] add float/double abs intrinsics add abs intrinsics that use native LLVM-IR. change _mm512_mask[z]_and_epi{32\|64} to use select intrinsic Differential Revision: http://reviews.llvm.org/D21973 llvm-svn: 274542	2016-07-05 12:24:14 +00:00
Asaf Badouh	f9cdb8de7a	[AVX512] minor fix in sqrt{ss\|sd} intrinsics arguments Differential Revision: http://reviews.llvm.org/D21988 llvm-svn: 274541	2016-07-05 11:36:21 +00:00
Anastasia Stulova	db7a31cce7	[OpenCL] An implementation of device side enqueue (DSE) from OpenCL v2.0 s6.13.17. - Added new Builtins: enqueue_kernel, get_kernel_work_group_size and get_kernel_preferred_work_group_size_multiple. These Builtins use custom check to diagnose parameters of the passed Blocks i. e. variable number of 'local void*' type params, and check different overloads specified in Table 6.31 of OpenCL v2.0. - IR is generated as an internal library call for each OpenCL Builtin, reusing ObjC Block implementation. Review: http://reviews.llvm.org/D20249 llvm-svn: 274540	2016-07-05 11:31:24 +00:00
Michael Zuckerman	a72b49efe4	ntrinsics _mm256_permutexvar_epi64 doesn't accept three parameters as specify bellow. I deleted the extra mask parameter. __m256i _mm256_permutexvar_epi64 (__m256i idx, __m256i a) #include "immintrin.h" Instruction: vpermq CPUID Flags: AVX512VL + AVX512F Description Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and store the results in dst. Operation FOR j := 0 to 3 i := j64 id := idx[i+1:i]64 dst[i+63:i] := a[id+63:id] ENDFOR dst[MAX:256] := 0 dst[MAX:256] := 0 (From: Intel intrinsics guide) llvm-svn: 274539	2016-07-05 11:30:31 +00:00
Michael Zuckerman	7dac6fbdf8	[Clang][BuiltIn][AVX512] adding _mm{\|256\|512}_mask_cvt{s\|us\|}epi16_storeu_epi8 intrinsics Differential Revision: http://reviews.llvm.org/D21729 llvm-svn: 274532	2016-07-05 08:08:01 +00:00
Craig Topper	2a383c9273	[X86] Use undefined instead of setzero in shufflevector based intrinsics when the second source is unused. Rewrite immediate extractions in shuffle intrinsics to be in ((c >> x) & y) form instead of ((c & z) >> x). This way only x varies between each use instead of having to vary x and z. llvm-svn: 274525	2016-07-04 22:18:01 +00:00
Simon Pilgrim	427154db2a	[X86][AVX512] Converted the VSHUFPD intrinsics to generic IR llvm-svn: 274523	2016-07-04 21:30:47 +00:00
Simon Pilgrim	30db811526	[X86][AVX512] Converted the VPERMPD/VPERMQ intrinsics to generic IR llvm-svn: 274502	2016-07-04 13:34:44 +00:00
Simon Pilgrim	17388f2569	[X86][AVX512] Converted the VPERMILPD/VPERMILPS intrinsics to generic IR llvm-svn: 274492	2016-07-04 11:06:15 +00:00
Simon Pilgrim	275d721485	[X86][AVX512] Converted the MOVDDUP/MOVSLDUP/MOVSHDUP masked intrinsics to generic IR llvm companion patch imminent llvm-svn: 274442	2016-07-02 17:16:25 +00:00
Craig Topper	b3a4477b13	[X86] Replace 128-bit and 256 masked vpermilps/vpermilpd builtins with native IR. llvm-svn: 274425	2016-07-02 05:36:43 +00:00
Michael Zuckerman	3f316abdce	[Clang][Intrinsics][AVX512][BuiltIn] adding intrinsics for vrangesd instruction set Differential Revision: http://reviews.llvm.org/D21734 llvm-svn: 274218	2016-06-30 08:05:46 +00:00
Alexey Bader	e5b3aebfb5	[OpenCL] Add attribute 'pure' to read_image built-in functions to enable optimizations. Reviewers: Anastasia, yaxunl Subscribers: pekka.jaaskelainen, pxli168, cfe-commits Differential Revision: http://reviews.llvm.org/D21795 llvm-svn: 274122	2016-06-29 12:30:26 +00:00
David Majnemer	2916a612cd	[intrin.h] Certain _Interlocked intrinsics return the old value This fixes PR28326. llvm-svn: 273986	2016-06-28 02:54:43 +00:00
Asaf Badouh	57819aa185	[X86] add _mm_loadu_si64 Differential Revision: http://reviews.llvm.org/D21504 llvm-svn: 273812	2016-06-26 13:51:54 +00:00
Craig Topper	50e3dfe9d0	[X86] Fix pslldq/psrldq intrinsics to not fail compilation with immediates larger than 16. This was accidentally broken in r272246. llvm-svn: 273775	2016-06-25 07:31:14 +00:00
Craig Topper	79f53ca0b5	[AVX512] Replace masked unpack builtins with shufflevector and selects. llvm-svn: 273533	2016-06-23 06:36:42 +00:00
Michael Zuckerman	716859aa64	[Clang][bmi][intrinsics] Adding _mm_tzcnt_64 _mm_tzcnt_32 intrinsics to clang. Differential Revision: http://reviews.llvm.org/D21373 llvm-svn: 273401	2016-06-22 12:32:43 +00:00
Craig Topper	9ce3ddf2e6	[AVX512] Use a __v8hi vector inside of _mm_setzero_hi to match its name. Probably no real functional change. llvm-svn: 273389	2016-06-22 06:36:23 +00:00
Craig Topper	08181f795f	[AVX512] Fix _mm_setzero_di to not require avx512vl since its used by the avx512dqintrin.h. Also update the avx512dq test to not enable avx512vl feature so we can ensure correct dependencies. llvm-svn: 273388	2016-06-22 06:36:21 +00:00
Craig Topper	c89dda5938	[AVX512] Add missing typecasts to intrinsics. llvm-svn: 273386	2016-06-22 06:36:16 +00:00
Craig Topper	879b0978f4	[AVX512] Move the 128-bit and 256-bit lzcnt intrinsics to avx512vlcdintrin.h where they belong. llvm-svn: 273249	2016-06-21 06:53:58 +00:00
Yaxun Liu	143f083e4b	[OpenCL] Include opencl-c.h by default as a clang module Include opencl-c.h by default as a module to utilize the automatic AST caching mechanism of clang modules. Add an option -finclude-default-header to enable default header for OpenCL, which is off by default. Differential Revision: http://reviews.llvm.org/D20444 llvm-svn: 273191	2016-06-20 19:26:00 +00:00
Zvi Rackover	453d734201	[X86] _MM_ALIGN16 attribute support for non-windows targets Summary: This patch adds support for the _MM_ALIGN16 attribute on non-windows targets. This aligns Clang with ICC which supports the attribute on all targets. Fixes PR28056 Reviewers: aaboud, echristo, cfe-commits, mkuper Subscribers: zvi, mehdi_amini Projects: #clang-c Differential Revision: http://reviews.llvm.org/D21173 llvm-svn: 273095	2016-06-18 20:01:07 +00:00
Saleem Abdulrasool	5065d8cfc9	Headers: wordsmith error message Use the marketing name for the MSVC release as pointed out by Nico Weber! llvm-svn: 272979	2016-06-17 00:27:02 +00:00
Saleem Abdulrasool	13f3baf572	Headers: tweak for MSVC[<1800] Earlier versions of MSVC did not include inttypes.h. Ensure that we dont try to include_next on those releases. llvm-svn: 272741	2016-06-15 00:28:15 +00:00
Hans Wennborg	f8b91f8336	s/Intrin.h/intrin.h/, trying to fix the build after r272701 llvm-svn: 272702	2016-06-14 20:14:24 +00:00
Nico Weber	73384a8f76	Rename Intrin.h to intrin.h, that's how all the documentation calls it. llvm-svn: 272701	2016-06-14 19:54:40 +00:00
Michael Zuckerman	c49f6ce3e1	[Clang][avx512][Intrinsics] adding prefetch gather intrinsics Differential Revision: http://reviews.llvm.org/D21322 llvm-svn: 272667	2016-06-14 13:45:17 +00:00
Michael Zuckerman	223676d2cc	[Clang][AVX512][intrinsics] Adding missing intrinsics div_pd and div_ps Differential Revision: http://reviews.llvm.org/D20626 llvm-svn: 272658	2016-06-14 12:38:58 +00:00
David Majnemer	d423574fde	[immintrin] Reimplement _bit_scan_{forward,reverse} There is no need to use a target-specific intrinsic to implement _bit_scan_forward or _bit_scan_reverse, reimplementing them using generic intrinsics makes it more likely that the middle end will understand what's going on. llvm-svn: 272564	2016-06-13 17:26:16 +00:00
Asaf Badouh	880f0c252b	[X86][AVX512F] bugfix - sqrtps should get __mask16 as mask parameter CR: Michael Zuckerman llvm-svn: 272549	2016-06-13 15:15:57 +00:00
Simon Pilgrim	beca5f295c	[Clang][X86] Convert non-temporal store builtins to generic __builtin_nontemporal_store in headers We can now use __builtin_nontemporal_store instead of target specific builtins for naturally aligned nontemporal stores which avoids the need for handling in CGBuiltin.cpp The scalar integer nontemporal (unaligned) store builtins will have to wait as __builtin_nontemporal_store currently assumes natural alignment and doesn't accept the 'packed struct' trick that we use for normal unaligned load/stores. The nontemporal loads require further backend support before we can safely convert them to __builtin_nontemporal_load Differential Revision: http://reviews.llvm.org/D21272 llvm-svn: 272540	2016-06-13 09:57:52 +00:00
Craig Topper	fc07498e4a	[AVX512] Masked pcmpeqd, pcmpeqq, pcmpgtd, and pcmpgtq don't require avx512bw, just avx512vl. llvm-svn: 272532	2016-06-13 04:15:11 +00:00
Craig Topper	7cc9263ec2	[AVX512] Implement masked and 512-bit pshufd intrinsics directly with __builtin_shufflevector and __builtin_ia32_select. llvm-svn: 272467	2016-06-11 12:50:19 +00:00
Craig Topper	26d5b87316	[X86] Add explicit typecasts to some intrinsics. llvm-svn: 272466	2016-06-11 12:50:12 +00:00
Craig Topper	68738332b8	[AVX512] Implement 512-bit and masked shufflelo and shufflehi intrinsics directly with __builtin_shufflevector and __builtin_ia32_select. Also improve the formatting of the AVX2 version. llvm-svn: 272452	2016-06-11 03:31:13 +00:00
Craig Topper	d4273a425e	[AVX512] Add _mm512_bsrli_epi128 and _mm512_bslli_epi128 intrinsics. llvm-svn: 272451	2016-06-11 03:31:07 +00:00
Ekaterina Romanova	71a68c928a	Add doxygen comments to mmintrin.h's intrinsics. The doxygen comments are automatically generated based on Sony's intrinsics docu ment. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 272350	2016-06-10 00:10:40 +00:00
Justin Lebar	4fb5711751	[CUDA] Implement __shfl* intrinsics in clang headers. Summary: Clang changes to make use of the LLVM intrinsics added in D21160. Reviewers: tra Subscribers: jholewinski, cfe-commits Differential Revision: http://reviews.llvm.org/D21162 llvm-svn: 272299	2016-06-09 20:04:57 +00:00
Craig Topper	2769bb5753	[X86] Handle AVX2 pslldqi and psrldqi intrinsics shufflevector creation directly in the header file instead of in CGBuiltin.cpp. Simplify the sse2 equivalents as well. llvm-svn: 272246	2016-06-09 05:15:12 +00:00
Craig Topper	3a0c7260f4	[X86] Add void to the argument list of intrinsics that don't take arguments since empty argument list mean something else in C. llvm-svn: 272244	2016-06-09 05:14:28 +00:00
Igor Breger	aadb876200	[AVX512] Emit select instruction instead of using x86 specific instrinsics. This will allow us to remove the x86 instrinics from the backend. Differential Revision: http://reviews.llvm.org/D21060 llvm-svn: 272141	2016-06-08 13:59:20 +00:00
Michael Zuckerman	c4ae8537cf	[Clang][AVX512][BUILTIN]Adding intrinsics for range_round_{sd\|ss} Differential Revision: http://reviews.llvm.org/D21002 llvm-svn: 272123	2016-06-08 08:19:27 +00:00
Ekaterina Romanova	50e94a3b34	Add doxygen comments to xmmintrin.h's intrinsics. Only half of the intrinsics in this file is documented here. The patch for the o ther half will be sent out later. The doxygen comments are automatically generated based on Sony's intrinsics docu ment. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 272121	2016-06-08 07:34:31 +00:00
Craig Topper	f3efec65bb	[AVX512] Reformat macro intrinsics, ensure arguments have proper typecasts, ensure result is typecasted back to the generic types. llvm-svn: 272119	2016-06-08 06:08:07 +00:00
Craig Topper	605894985f	[X86] Put parentheses around macro arguments in intrinsics. llvm-svn: 272118	2016-06-08 06:08:04 +00:00
Michael Zuckerman	96d0399658	[clang][AVX512][Intrinsics] Adding intrinsics reduce_[round]_{ss\|sd} to clang Differential Revision: http://reviews.llvm.org/D21014 llvm-svn: 272012	2016-06-07 14:00:20 +00:00
Michael Zuckerman	1a7889f203	Fixing problem with rsqrt28_sd maskz_rsqrt28_sd mapped to mask_rsqrt28_sd and not to the maskz. llvm-svn: 271836	2016-06-05 15:57:49 +00:00
Michael Zuckerman	95721ac863	[Clang][AVX512]Adding set4 intrinsics Differential Revision: http://reviews.llvm.org/D20866 llvm-svn: 271835	2016-06-05 15:43:30 +00:00
Michael Zuckerman	f36f6eb036	[Clang][AVX512][Intrinsics] Adding two definitions _mm512_setzero and _mm512_setzero_epi32 Differential Revision: http://reviews.llvm.org/D20871 llvm-svn: 271832	2016-06-05 15:12:52 +00:00
Craig Topper	6a77b62640	[X86] Use unsigned types for vector arithmetic in intrinsics to avoid undefined behavior for signed integer overflow. This is really only needed for addition, subtraction, and multiplication, but I did the bitwise ops too for overall consistency. Clang currently doesn't set NSW for signed vector operations so the undefined behavior shouldn't happen today. llvm-svn: 271778	2016-06-04 05:43:41 +00:00
Craig Topper	406d5cdf7c	[AVX512] Remove space in -1 constants. NFC llvm-svn: 271777	2016-06-04 05:43:37 +00:00
Asaf Badouh	89f657611c	[X86][AVX512] add intrinsics of Scalar FP to integer Differential Revision: http://reviews.llvm.org/D20861 llvm-svn: 271499	2016-06-02 08:11:35 +00:00
Michael Zuckerman	9e7d0a98fa	[Clang][AVX512][INTRINSICS] adding round cvt and fix regular cvtps_ph Differential Revision: http://reviews.llvm.org/D20870 llvm-svn: 271498	2016-06-02 07:44:08 +00:00
Simon Pilgrim	00880511b1	[X86][SSE] Replace (V)CVTTPS2DQ and VCVTTPD2DQ truncating (round to zero) f32/f64 to i32 with generic IR (clang) The 'cvtt' truncation (round to zero) conversions can be safely represented as generic __builtin_convertvector (fptosi) calls instead of x86 intrinsics. We already do this (implicitly) for the scalar equivalents. Note: I looked at updating _mm_cvttpd_epi32 as well but this still requires a lot more backend work to correctly lower (both for debug and optimized builds). Differential Revision: http://reviews.llvm.org/D20859 llvm-svn: 271436	2016-06-01 21:46:51 +00:00
Michael Zuckerman	6170c15fc6	[Clang][Intrinsics][avx512] Continue Adding round cvt to clang And remove trailing spaces in intrinsic f test Differential Revision: http://reviews.llvm.org/D20810 llvm-svn: 271398	2016-06-01 14:41:41 +00:00
Michael Zuckerman	e54093fcc0	Adding front-end support to several intrinsics (bit scanning, conversion and state reading intrinsics) Adding LLVM front-end support to two intrinsics dealing with bit scan: _bit_scan_forward and _bit_scan_reverse. Their functionality is as described in Intel intrinsics guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_bit_scan_forward&expand=371,370 https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_bit_scan_reverse&expand=371,370 Furthermore, adding clang front-end support to these conversion intrinsics: _mm256_cvtsd_f64, _mm256_cvtsi256_si32 and _mm256_cvtss_f32. Finally, adding tests to all of the above, as well as to the state reading intrinsics _rdpmc and _rdtsc. Their functionality is also specified in the Intel intrinsics guide. Commit on behalf of Omer Paparo Bivas llvm-svn: 271387	2016-06-01 12:21:00 +00:00
Michael Zuckerman	e6aa66a53d	[Clang][Intrinsics][avx512] Adding round intrinsics fot max/min/sqrt instruction set to clang Differential Revision: http://reviews.llvm.org/D20812 llvm-svn: 271373	2016-06-01 08:34:03 +00:00
Michael Zuckerman	c301c194ec	[Clang][Intrinsics][avx512] Adding round roundscale to clang Differential Revision: http://reviews.llvm.org/D20815 llvm-svn: 271368	2016-06-01 07:35:44 +00:00
Michael Zuckerman	186d86738d	[Clang][Intrinsics][avx512] Adding round cvt to clang Differential Revision: http://reviews.llvm.org/D20790 llvm-svn: 271265	2016-05-31 11:27:34 +00:00
Craig Topper	74b5948f39	[X86] Use unaligned load intrinsics to implement other intrinsics instead of manually creating the unaligned load. llvm-svn: 271250	2016-05-31 05:49:13 +00:00
Simon Pilgrim	645e1ad33a	[X86][SSE] _mm_store1_ps/_mm_store1_pd should require an aligned pointer According to the gcc headers, intel intrinsics docs and msdn codegen the _mm_store1_pd (and its _mm_store_pd1 equivalent) should use an aligned pointer - the clang headers are the only implementation I can find that assume non-aligned stores (by storing with _mm_storeu_pd). Additionally, according to the intel intrinsics docs and msdn codegen the _mm_store1_ps (_mm_store_ps1) requires a similarly aligned pointer. This patch raises the alignment requirements to match the other implementations by calling _mm_store_ps/_mm_store_pd instead. I've also added the missing _mm_store_pd1 intrinsic (which maps to _mm_store1_pd like _mm_store_ps1 does to _mm_store1_ps). As a followup I'll update the llvm fast-isel tests to match this codegen. Differential Revision: http://reviews.llvm.org/D20617 llvm-svn: 271218	2016-05-30 17:55:25 +00:00
Justin Lebar	720f8da33a	[CUDA] Fix order of vectorized ldg intrinsics' elements. Summary: The order is [x, y, z, w], not [w, x, y, z]. Subscribers: cfe-commits, tra Differential Revision: http://reviews.llvm.org/D20794 llvm-svn: 271215	2016-05-30 17:12:55 +00:00
Craig Topper	09175dab31	[X86] Replace unaligned store builtins in SSE/AVX intrinsic files with code that will compile to a native unaligned store. Remove the builtins since they are no longer used. Intrinsics will be removed from llvm in a future commit. llvm-svn: 271214	2016-05-30 17:10:30 +00:00
Michael Zuckerman	9fcf3552ad	[Clang][avx512][builtin] Adding missing intrinsics for cvt Differential Revision: http://reviews.llvm.org/D20618 llvm-svn: 271205	2016-05-30 13:22:12 +00:00
Yaxun Liu	e8f49b9db7	[OpenCL] Add the default header file opencl-c.h for OpenCL C language OpenCL has large number of "builtin" functions ("builtin" in the sense of OpenCL spec) which are defined in header files. To compile OpenCL kernels using these builtin functions, a header file is needed. This header file is based on the Khronos implementation (https://github.com/KhronosGroup/SPIR/blob/spirv-1.0/lib/Headers/opencl.h) with heavy refactoring. Re-commit after fixing failures on ppc64/systemz etc. Differential Revision: http://reviews.llvm.org/D18369 llvm-svn: 271197	2016-05-30 02:22:28 +00:00
Simon Pilgrim	6d1a0c4c75	[X86][SSE] Make unsigned integer vector types generally available As discussed on http://reviews.llvm.org/D20684, move the unsigned integer vector types used for zero extension to make them available for general use. llvm-svn: 271187	2016-05-29 18:49:08 +00:00
Yaxun Liu	898eb39bfc	Revert r271136 [OpenCL] Add the default header file opencl-c.h for OpenCL C language due to build failure on ppc64/hexagon/systemz. llvm-svn: 271144	2016-05-28 19:50:40 +00:00
Yaxun Liu	e54d7c44d0	[OpenCL] Add the default header file opencl-c.h for OpenCL C language OpenCL has large number of "builtin" functions ("builtin" in the sense of OpenCL spec) which are defined in header files. To compile OpenCL kernels using these builtin functions, a header file is needed. This header file is based on the Khronos implementation (https://github.com/KhronosGroup/SPIR/blob/spirv-1.0/lib/Headers/opencl.h) with heavy refactoring. Differential Revision: http://reviews.llvm.org/D18369 llvm-svn: 271136	2016-05-28 19:09:01 +00:00
Simon Pilgrim	91b77ceaed	[X86][SSE] Replace VPMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (clang) The VPMOVSX and (V)PMOVZX sign/zero extension intrinsics can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics. This patch removes the clang builtins and their use in the sse2/avx headers - a companion patch will remove/auto-upgrade the llvm intrinsics. Note: We already did this for SSE41 PMOVSX sometime ago. Differential Revision: http://reviews.llvm.org/D20684 llvm-svn: 271106	2016-05-28 08:12:45 +00:00
Ekaterina Romanova	5a7f09c5af	Clean up: remove trailing spaces in x86 intrinsic headers. Differential Revision: http://reviews.llvm.org/D20614 llvm-svn: 271077	2016-05-28 00:18:59 +00:00
Ahmed Bougacha	5aa0ab3869	[Headers] Remove redundant typedef. NFC. llvm-svn: 271022	2016-05-27 17:57:23 +00:00
Craig Topper	32578b7dcf	[AVX512][Builtin] Fix palignr intrinsic for avx512vlbw. The immediate should not be multiplied by 8. The 512-bit version was fixed recently but this was missed. llvm-svn: 270970	2016-05-27 06:59:39 +00:00
David Majnemer	b2c5720bfd	[Intrin.h] Sort the __read[fg]s intrinsics No functional change is intended. llvm-svn: 270952	2016-05-27 02:06:14 +00:00
Michael Zuckerman	22c47e606a	Adding missing _mm512_castsi512_si256 intrinsic. llvm-svn: 270851	2016-05-26 14:32:11 +00:00
Michael Zuckerman	eb5f178c4b	Fix instrinsics names: _mm128_cmp_ps_mask-->_mm_cmp_ps_mask _mm128_mask_cmp_ps_mask-->_mm_mask_cmp_ps_mask _mm128_cmp_pd_mask-->_mm_cmp_pd_mask _mm128_mask_cmp_pd_mask-->_mm_mask_cmp_pd_mask llvm-svn: 270830	2016-05-26 08:10:12 +00:00
Michael Zuckerman	6f08cebf36	[Clang][AVX512][BUILTIN] Adding intrinsics for set1 Differential Revision: http://reviews.llvm.org/D20562 llvm-svn: 270825	2016-05-26 06:54:52 +00:00
Michael Zuckerman	efbf3f108e	[Clang][AVX512][Builtin] Fix palignr intrinsics header Differential Revision: http://reviews.llvm.org/D20620 llvm-svn: 270707	2016-05-25 15:05:03 +00:00
Michael Zuckerman	d5cc6cd262	[Clang][AVX512][BUILTIN] Add missing intrinsics for cast Differential Revision: http://reviews.llvm.org/D20523 llvm-svn: 270699	2016-05-25 14:04:21 +00:00
Eric Christopher	d83af71b3a	Make the altivec intrinsics that require immediate constant propagation macros rather than functions. Unfortunately couldn't come up with a simple testcase that didn't need code generation to verify what was going on. llvm-svn: 270625	2016-05-24 22:25:06 +00:00
Simon Pilgrim	90770c7c76	[X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR Both the (V)CVTDQ2PD(Y) (i32 to f64) and (V)CVTPS2PD(Y) (f32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen. This patch removes the clang builtins and their use in the sse2/avx headers - a future patch will deal with removing the llvm intrinsics, but that will require a bit more work. Differential Revision: http://reviews.llvm.org/D20528 llvm-svn: 270499	2016-05-23 22:13:02 +00:00
Justin Lebar	91f6f07bb8	[CUDA] Add -fcuda-approx-transcendentals flag. Summary: This lets us emit e.g. sin.approx.f32. See http://docs.nvidia.com/cuda/parallel-thread-execution/#floating-point-instructions-sin Reviewers: rnk Subscribers: tra, cfe-commits Differential Revision: http://reviews.llvm.org/D20493 llvm-svn: 270484	2016-05-23 20:19:56 +00:00
Michael Zuckerman	f86eb71616	[clang][AVX512][Builtin] adding missing intrinsics for vpmultishiftqb{128\|256\|512} instruction set . Differential Revision: http://reviews.llvm.org/D20521 llvm-svn: 270441	2016-05-23 15:04:39 +00:00
Michael Zuckerman	e6542002fc	[Clang][AVX512][BUILTIN]adding missing intrinsics for movdaq instruction set Differential Revision: http://reviews.llvm.org/D20514 llvm-svn: 270401	2016-05-23 08:01:48 +00:00
Simon Pilgrim	28666ce778	[X86][AVX] Ensure zero-extension of _mm256_extract_epi8 and _mm256_extract_epi16 Ensure _mm256_extract_epi8 and _mm256_extract_epi16 zero extend their i8/i16 result to i32. This matches _mm_extract_epi8 and _mm_extract_epi16. Fix for PR27594 Differential Revision: http://reviews.llvm.org/D20468 llvm-svn: 270330	2016-05-21 21:14:35 +00:00
Richard Smith	b391930bbf	Re-alphabetize this file list. llvm-svn: 270170	2016-05-20 01:07:10 +00:00
Richard Smith	f5c3a63c28	Revert incorrect module map changes in r269907 and replace them with the appropriate changes. llvm-svn: 270169	2016-05-20 01:06:47 +00:00
Justin Lebar	2e4ecfdebe	[CUDA] Implement __ldg using intrinsics. Summary: Previously it was implemented as inline asm in the CUDA headers. This change allows us to use the [addr+imm] addressing mode when executing ld.global.nc instructions. This translates into a 1.3x speedup on some benchmarks that call this instruction from within an unrolled loop. Reviewers: tra, rsmith Subscribers: jhen, cfe-commits, jholewinski Differential Revision: http://reviews.llvm.org/D19990 llvm-svn: 270150	2016-05-19 22:49:13 +00:00
Michael Zuckerman	178113e8cc	[Clang][AVX512][intrinsics] continue completing missing set intrinsics Differential Revision: http://reviews.llvm.org/D20160 llvm-svn: 270047	2016-05-19 12:07:49 +00:00
Michael Zuckerman	2cacc35343	[Clang][AVX512] completing missing intrinsics [pandnd]. Differential Revision: http://reviews.llvm.org/D20101 llvm-svn: 269939	2016-05-18 15:25:53 +00:00
Ashutosh Nema	51c9dd0081	Add new intrinsic support for MONITORX and MWAITX instructions Summary: MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT pair while adding a timer function, such that another termination of the MWAITX instruction occurs when the timer expires. The presence of the MONITORX and MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29. The MONITORX and MWAITX instructions are intercepted by the same bits that intercept MONITOR and MWAIT. MONITORX instruction establishes a range to be monitored. MWAITX instruction causes the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. Opcode of MONITORX instruction is "0F 01 FA". Opcode of MWAITX instruction is "0F 01 FB". These opcode information is used in adding tests for the disassembler. These instructions are enabled for AMD's bdver4 architecture. Patch by Ganesh Gopalasubramanian! Reviewers: echristo, craig.topper Subscribers: RKSimon, joker.eph, llvm-commits, cfe-commits Differential Revision: http://reviews.llvm.org/D19796 llvm-svn: 269907	2016-05-18 11:56:23 +00:00
Craig Topper	8c18e1120d	[AVX512] Add parentheses around macro arguments in AVX512F intrinsics. Remove leading underscores from macro argument names. Add explicit typecasts to all macro arguments and return values. And finally reformat after all the adjustments. This is a mostly mechanical change accomplished with a script. I tried to split out any changes to the typecasts that already existed into separate commits. llvm-svn: 269746	2016-05-17 04:41:50 +00:00
Craig Topper	d266188540	[AVX512] Add parentheses around macro arguments in AVX512VL intrinsics. Remove leading underscores from macro argument names. Add explicit typecasts to all macro arguments and return values. And finally reformat after all the adjustments. This is a mostly mechanical change accomplished with a script. I tried to split out any changes to the typecasts that already existed into separate commits. llvm-svn: 269745	2016-05-17 04:41:48 +00:00
Craig Topper	f2e67a03fe	[AVX512] Add parentheses around macro arguments in AVX512VLDQ intrinsics. Remove leading underscores from macro argument names. Add explicit typecasts to all macro arguments and return values. And finally reformat after all the adjustments. This is a mostly mechanical change accomplished with a script. I tried to split out any changes to the typecasts that already existed into separate commits. llvm-svn: 269744	2016-05-17 04:41:46 +00:00
Craig Topper	1a15b6aff2	[AVX512] Add parentheses around macro arguments in AVX512VLBW intrinsics. Remove leading underscores from macro argument names. Add explicit typecasts to all macro arguments and return values. And finally reformat after all the adjustments. This is a mostly mechanical change accomplished with a script. I tried to split out any changes to the typecasts that already existed into separate commits. llvm-svn: 269743	2016-05-17 04:41:42 +00:00
Craig Topper	8e95bb99fe	[AVX512] Add parentheses around macro arguments in AVX512PF intrinsics. Remove leading underscores from macro argument names. Add explicit typecasts to all macro arguments and return values. And finally reformat after all the adjustments. This is a mostly mechanical change accomplished with a script. I tried to split out any changes to the typecasts that already existed into separate commits. llvm-svn: 269742	2016-05-17 04:41:40 +00:00
Craig Topper	0bb4664a88	[AVX512] Add parentheses around macro arguments in AVX512ER intrinsics. Remove leading underscores from macro argument names. Add explicit typecasts to all macro arguments and return values. And finally reformat after all the adjustments. This is a mostly mechanical change accomplished with a script. I tried to split out any changes to the typecasts that already existed into separate commits. llvm-svn: 269741	2016-05-17 04:41:38 +00:00
Craig Topper	41ad25a0f9	[AVX512] Add parentheses around macro arguments in AVX512DQ intrinsics. Remove leading underscores from macro argument names. Add explicit typecasts to all macro arguments and return values. And finally reformat after all the adjustments. This is a mostly mechanical change accomplished with a script. I tried to split out any changes to the typecasts that already existed into separate commits. llvm-svn: 269740	2016-05-17 04:41:36 +00:00
Craig Topper	709235674b	[AVX512] Add parentheses around macro arguments in AVX512BW intrinsics. Remove leading underscores from macro argument names. Add explicit typecasts to all macro arguments and return values. And finally reformat after all the adjustments. This is a mostly mechanical change accomplished with a script. I tried to split out any changes to the typecasts that already existed into separate commits. llvm-svn: 269739	2016-05-17 04:41:33 +00:00
Craig Topper	58187d33b7	[AVX512] Correct types for scalar double precision FMA intrinsics and single precision getexp intrinsics. llvm-svn: 269737	2016-05-17 04:41:29 +00:00
Craig Topper	cd45b1a7c7	[X86] Add a few missing typecasts to intrinsics. Found by playing with -fno-lax-vector-conversions on the builtin tests. llvm-svn: 269734	2016-05-17 03:42:31 +00:00
Craig Topper	3007cde8c5	[AVX512] _m512_setzero_qi/hi should return __m512i. llvm-svn: 269733	2016-05-17 03:42:25 +00:00
Craig Topper	f6d024edff	[AVX512] Fix odd formatting in intrinsic header. llvm-svn: 269732	2016-05-17 03:42:15 +00:00
Ekaterina Romanova	1168fdc9df	Doxygen comments for avxintrin.h. Added doxygen comments to avxintrin.h's intrinsics. As of now, only around 50% of the intrinsics in this file are documented here. The patches for the other half will be sent out later. Updated bmiintrin.h to fix an incorrect section name. Updated f16cintrin.h to fix incorect parameter names. The doxygen comments are automatically generated based on Sony's intrinsics document. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 269718	2016-05-16 22:54:45 +00:00
Michael Zuckerman	bf05a4589e	[Clang][AVX512] completing missing intrinsics for [vpabs] instruction set Differential Revision: http://reviews.llvm.org/D20069 llvm-svn: 269680	2016-05-16 18:57:24 +00:00
Nico Weber	379a1952b3	[ms] Reintroduce feature guards in intrinsic headers in Microsoft mode Visual Studio's C++ standard library headers include intrin.h, so the intrinsic headers get included a lot more often in Microsoft mode than elsewhere. The AVX512 intrinsics are a lot of code (0.7 MB, causing 30% compile time overhead for small programs including e.g. <string> and 6% compile time overhead for larger projects like e.g. v8). Since multiversioning can't be relied on in Microsoft mode (cl.exe doesn't support it), having faster compiles seems like the much better tradeoff until we have a better intrinsic story going forward (which we'll need for e.g. PR19898). Actually using intrinsics on Windows already requires the right /arch: settings, so this patch should have no big behavior change. See also thread "The intrinsics headers (especially avx512) are too big. What to do about it?" on cfe-dev. http://reviews.llvm.org/D20291 llvm-svn: 269675	2016-05-16 18:14:07 +00:00
Michael Zuckerman	cb85677471	[Clang][AVX512] completing missing intrinsics [vsqrt\|vrsqrt\|vrcp14 ]. Differential Revision: http://reviews.llvm.org/D20068 llvm-svn: 269649	2016-05-16 11:42:01 +00:00
Craig Topper	1aa231e3aa	[X86] Add typecasts to remove most assumptions about what __m128i/__m256i is defined as. Add similar typecasts for the fp types as well. llvm-svn: 269632	2016-05-16 06:38:42 +00:00
Craig Topper	9c6c85f1ad	[AVX512] Add typecasts to some intrinsics to avoid doing operations on the __m512/__m512i/__m512d types. llvm-svn: 269631	2016-05-16 06:38:36 +00:00
Craig Topper	91f23d900f	[X86] Remove bad cast from the 'int' return type of __builtin_ia32_kortestchi to '__mask16' before return in an 'int' intrinsic. llvm-svn: 269621	2016-05-16 01:09:16 +00:00
Craig Topper	7d00d2031d	[AVX512] Fix bad typecasts on return value for 512-bit integer byte/word compare builtins. llvm-svn: 269620	2016-05-16 00:51:06 +00:00
Craig Topper	dca1f230ae	[AVX512] Add intrinsics for 512-bit insertf32x8/insertf32x4/inserti32x4. llvm-svn: 269617	2016-05-15 21:26:20 +00:00

... 2 3 4 5 6 ...

1222 Commits