llvm-project

Commit Graph

Author	SHA1	Message	Date
Adam Nemet	286ae08e7d	Implement AVX1 vbroadcast intrinsics with vector initializers These intrinsics are special because they directly take a memory operand (AVX2 adds the register counterparts). Typically, other non-memop intrinsics take registers and then it's left to isel to fold memory operands. In order to LICM intrinsics directly reading memory, we require that no stores are in the loop (LICM) or that the folded load accesses constant memory (MachineLICM). When neither is the case we fail to hoist a loop-invariant broadcast. We can work around this limitation if we expose the load as a regular load and then just implement the broadcast using the vector initializer syntax. This exposes the load to LICM and other optimizations. At the IR level this is translated into a series of insertelements. The sequence is already recognized as a broadcast so there is no impact on the quality of codegen. _mm256_broadcast_pd and _mm256_broadcast_ps are not updated by this patch because right now we lack the DAG-combiner smartness to recover the broadcast instructions. This will be tackled in a follow-on. There will be completing changes on the LLVM side to remove the LLVM intrinsics and to auto-upgrade bitcode files. Fixes <rdar://problem/16494520> llvm-svn: 209846	2014-05-29 20:47:29 +00:00
Filipe Cabecinhas	5d289b48b1	Patched clang to emit x86 blends as shufflevectors. Summary: Most of the clang header patch by Simon Pilgrim @ SCEE. Also fixed (or added) clang tests for these intrinsics. LLVM tests to make sure we get the blend instruction out of these shufflevectors are at http://reviews.llvm.org/D3600 Reviewers: eli.friedman, craig.topper, rafael Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D3601 llvm-svn: 208664	2014-05-13 02:37:02 +00:00
Manman Ren	c94122e05b	Intrinsics: fix extract & insert when index is out of bound. Now, all extract & insert intrinsics should have the correct and operation to ignore higher bits. rdar://15250497 llvm-svn: 193267	2013-10-23 20:33:14 +00:00
Craig Topper	c5244512c8	Use a shuffle with undef elements instead of inserting 0s in the 128-bit to 256-bit casting intrinsics to improve performance. Thanks to Katya Romanova for identifying this issue. llvm-svn: 187716	2013-08-05 06:17:21 +00:00
Richard Smith	49e56440f9	Add missing include guards into headers in lib/Headers. While it may appear that these headers should not be included more than once, they are in fact included twice when building our builtins module (in order for it to generate submodules for them), and without this, any modular build enabling AVX and including any builtin header fails. Testing this is tricky because including any of these headers in a modular build is liable to fail, due to unrelated builtin headers in the same module including headers which might not be available on the system running the tests. Suggestions on that front are welcome (but we're getting close to being able to run a buildbot that has modules enabled for all tests, which would nicely solve the testing problem). llvm-svn: 186275	2013-07-14 05:41:45 +00:00
Reid Kleckner	7ab75b3f68	Avoid names like __in that conflict with SAL in builtin headers Microsoft's Source Annotation Language (SAL) defines a bunch of keywords for annotating the inputs and outputs of functions. Empty definitions for the keywords are provided by <stdlib.h> -> <crtdefs.h> -> <sal.h>. This makes it basically impossible to include MSVC's stdlib.h and Clang's *mmintrin.h headers at the same time if they have variables named __in. As a workaround, I've renamed those variables. This fixes the Modules/compiler_builtins.m test which was XFAILed, presumably due to this conflict. llvm-svn: 179860	2013-04-19 17:00:14 +00:00
David Blaikie	5bb700360c	Readd an open paren that was lost while reformatting code. llvm-svn: 172669	2013-01-16 23:13:42 +00:00
David Blaikie	3302f2bd46	PR14964: intrinsic headers using non-reserved identifiers Several of the intrinsic headers were using plain non-reserved identifiers. C++11 17.6.4.3.2 [global.names] p1 reserves names containing a double underscore or beginning with an underscore followed by an uppercase letter for any use. I think I got them all, but open to being corrected. For the most part I didn't bother updating function-like macro parameter names because I don't believe they're subject to any such collision - though some function-like macros already follow this convention (I didn't update them in part because the churn was more significant as several function-like macros use the double underscore prefixed version of the same name as a parameter in their implementation) llvm-svn: 172666	2013-01-16 23:08:36 +00:00
Craig Topper	26e74e50b6	Convert vperm2f128 and vperm2i128 intrinsics back to using llvm intrinsics. Unfortunately, these instructions have behavior that can't be modeled with shuffle vector. llvm-svn: 154906	2012-04-17 05:16:56 +00:00
Chad Rosier	2c5154224b	Fix the signatures for the _mm256_storeu2_* intrinsics. PR12532 llvm-svn: 154591	2012-04-12 16:29:08 +00:00
Craig Topper	678a53c350	Fix shuffle vector calculation for mm_permute_ps. Fixes PR 12401. llvm-svn: 153724	2012-03-30 05:09:18 +00:00
Chad Rosier	f8df4f4e3b	[avx] Define the _mm256_loadu2_xxx and _mm256_storeu2_xxx intrinsics. From the Intel Optimization Reference Manual, Section 11.6.2. When data cannot be aligned or alignment is not known, 16-byte memory accesses may provide better performance. rdar://11076953 llvm-svn: 153091	2012-03-20 16:40:00 +00:00
Craig Topper	e5ea3b0239	Remove vperm2f* and vperm2i builtins. Same effect can be achieved with builtin_shufflevector. llvm-svn: 150064	2012-02-08 07:33:36 +00:00
Craig Topper	fec9f8edb7	Remove vpermilp* builtins. Same effect can be achieved with builtin_shufflevector. llvm-svn: 150056	2012-02-08 05:16:54 +00:00
Craig Topper	9e9301a83a	Represent 256-bit unaligned loads natively and remove the builtins. Similar change was made for 128-bit versions a while back. llvm-svn: 148919	2012-01-25 04:26:17 +00:00
Craig Topper	9f00948a82	Add AVX2 permute intrinsics. Also add parentheses on some macro arguments in other intrinsic headers. llvm-svn: 147241	2011-12-24 07:55:14 +00:00
Chad Rosier	7caca84ce4	Fix _mm_permute_ps and _mm256_permute_ps AVX intrinsics to use "I" (ICE) markings. Fix avxintrin.h to take them into account. Part of rdar://10595450 llvm-svn: 146810	2011-12-17 01:51:05 +00:00
Chad Rosier	93375d5fa5	Revert r146797, which was a partial revert of r146791; It was correct in the first place. The permutevar_* (note the var) intrinsics use ymm/mem. llvm-svn: 146807	2011-12-17 01:39:56 +00:00
Chad Rosier	0adfe7aa2f	Fix _mm256_extractf128_* AVX intrinsics to use "I" (ICE) markings. Fix avxintrin.h to take them into account. Part of rdar://10595450 llvm-svn: 146804	2011-12-17 01:22:27 +00:00
Chad Rosier	3648646b2b	Partial revert of r146791; vpermilps/vpermilpd instructions accept ymm/mem/imm8. llvm-svn: 146797	2011-12-17 00:50:42 +00:00
Chad Rosier	060d03be1c	Fix _mm256_round_pd, _mm256_round_ps, _mm_permute_pd and _mm256_permute_pd AVX intrinsics to use "I" (ICE) markings. Fix avxintrin.h to take them into account. Part of rdar://10595450 llvm-svn: 146791	2011-12-17 00:15:26 +00:00
Chad Rosier	33d22d8def	Fix vinsertf128_* AVX intrinsics to use "I" (ICE) markings. Fix avxintrin.h to take them into account. rdar://10590282 llvm-svn: 146758	2011-12-16 21:40:31 +00:00
Chad Rosier	9138fea25e	Fix vperm2f128_* AVX intrinsics to use "I" (ICE) markings. Fix avxintrin.h to take them into account. rdar://10576962 llvm-svn: 146757	2011-12-16 21:07:34 +00:00
Eli Friedman	f16beb3942	Fix some additional x86 intrinsics to use "I" (ICE) markings. Fix *mmintrin.h to take them into account. <rdar://problem/10341145> llvm-svn: 144246	2011-11-10 00:11:13 +00:00
Bob Wilson	c9b97cc1da	Fix vector macros to correctly check argument types. <rdar://problem/10261670> llvm-svn: 143792	2011-11-05 06:08:06 +00:00
Bruno Cardoso Lopes	7a98a7e681	Fix _mm256_shuffle_ps mask! Example, for mask=203, Instead of: <i32 3, i32 2, i32 8, i32 11, i32 3, i32 6, i32 12, i32 15> generate: <i32 3, i32 2, i32 8, i32 11, i32 7, i32 6, i32 12, i32 15> llvm-svn: 138411	2011-08-23 23:29:45 +00:00
John McCall	91a528841b	Implement the AVX cmp builtins as macros instead of static inlines. Patch by Syoyo Fujita! Reviewed by Chris Lattner! Checked in by me! llvm-svn: 128984	2011-04-06 03:37:51 +00:00
Benjamin Kramer	6f35f3cd80	Disallow direct inclusion of avxintrin.h. Users should include immintrin.h instead. This matches GCC's behavior. llvm-svn: 111692	2010-08-20 23:00:03 +00:00
Bruno Cardoso Lopes	8c333153e0	Fix define inserting a comma :) llvm-svn: 110839	2010-08-11 18:45:43 +00:00
Bruno Cardoso Lopes	65954ffc69	Remove 256-bit cast built-ins and make the AVX intrinsic call llvm __builtin_shufflevector with the appropriate arguments llvm-svn: 110771	2010-08-11 02:14:38 +00:00
Bruno Cardoso Lopes	a4f1930b75	Remove 256-bit unpack built-ins and make the AVX intrinsic call llvm __builtin_shufflevector with the appropriate arguments llvm-svn: 110768	2010-08-11 01:43:24 +00:00
Bruno Cardoso Lopes	e712a135b7	Remove 256-bit shuffle built-ins and make the AVX intrinsic call llvm __builtin_shufflevector with the appropriate arguments llvm-svn: 110766	2010-08-11 01:17:34 +00:00
Bruno Cardoso Lopes	3d3fc1d075	Make replicate intrinsics use shufflevector instead of dup builtins, also remove the dup builtins llvm-svn: 110646	2010-08-10 02:23:54 +00:00
Bruno Cardoso Lopes	3d19889ca8	Fix AVX 256-bit intrinsics headers by using the right cast type while dealing with logical ops llvm-svn: 110389	2010-08-05 23:04:58 +00:00
Bruno Cardoso Lopes	fc2320fd73	Logical AVX intrinsics can be matched directly, no need to use builtins here. llvm-svn: 110271	2010-08-04 22:56:42 +00:00
Bruno Cardoso Lopes	7c4b513a3f	Add AVX intrinsics header llvm-svn: 110253	2010-08-04 22:03:36 +00:00

36 Commits
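
The mask fix in commit 7a98a7e681 (2011-08-23) can be checked by hand. The following is a minimal sketch, not the header's actual macro: the function name `shuffle_ps_256_indices` is hypothetical, and the lane layout (indices 0-7 for the first operand, 8-15 for the second, with the upper 128-bit lane repeating the lower lane's selection offset by 4) is an assumption inferred from the expected vector quoted in that commit message.

```python
def shuffle_ps_256_indices(mask):
    """Return the 8 shufflevector indices for an 8-bit shuffle mask.

    Assumed layout: indices 0-7 select from the first operand,
    indices 8-15 select from the second operand.
    """
    # Split the 8-bit mask into four 2-bit selectors.
    sel = [(mask >> shift) & 0x3 for shift in (0, 2, 4, 6)]

    # Lower 128-bit lane: two elements from the first operand,
    # then two elements from the second operand.
    low = [sel[0], sel[1], 8 + sel[2], 8 + sel[3]]

    # Upper 128-bit lane: the same selection, offset by 4 so that it
    # stays within the upper lane of each operand.
    high = [index + 4 for index in low]

    return low + high


print(shuffle_ps_256_indices(203))
```

For mask=203 this prints `[3, 2, 8, 11, 7, 6, 12, 15]`, the corrected vector quoted in the commit message. The vector it replaced had 3 instead of 7 in the fifth position, which is the value the lower lane's first selector gives without the upper-lane offset.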