llvm-project

Commit Graph

Author	SHA1	Message	Date
Ekaterina Romanova	2e041c9c20	[DOXYGEN] Documentation for the newly added x86 intrinsics. Added doxygen comments for the newly added intrinsics in avxintrin.h, namely _mm256_cvtsd_f64, _mm256_cvtsi256_si32 and _mm256_cvtss_f32 Added doxygen comments for the new intrinsics in emmintrin.h, namely _mm_loadu_si64 and _mm_load_sd. Explicit parameter names were added for _mm_clflush and _mm_setcsr The rest of the changes are editorial, removing trailing spaces at the end of the lines. Differential Revision: https://reviews.llvm.org/D28503 llvm-svn: 291876	2017-01-13 01:14:08 +00:00
Ekaterina Romanova	16166a4d71	[DOXYGEN] Improved doxygen comments for tmmintrin.h intrinsics. Added \n commands to insert a line breaks where necessary to make the documentation more readable. Formatted comments to fit into 80 chars. llvm-svn: 290458	2016-12-23 23:36:26 +00:00
Ekaterina Romanova	0c1c3bbc78	[DOXYGEN] Improved doxygen comments for x86 intrinsics headers. Tagged instruction names with <c> INSTR_NAME </c> to display them in typewriter font. In the past, \c command was used, unfortunately it applied to only one word. <c> .. </c> has the same meaning, but applies to all words in between the tags. llvm-svn: 289249	2016-12-09 18:35:50 +00:00
Ekaterina Romanova	d6042197db	[DOXYGEN] Improved doxygen comments for avxintrin.h intrinsics. Tagged parameter names with \a doxygen command to display them in italics. Formatted comments to fit into 80 chars. llvm-svn: 289022	2016-12-08 04:09:17 +00:00
Ekaterina Romanova	4c77e8940e	[DOXYGEN] Updated instruction names corresponding to avxintrin.h intrinsics. Documentation for some of the avxintrin.h's intrinsics errorneously said that non VEX-prefixed instructions could be generated. This was fixed. I tried several different solutions to achieve pretty printing of unordered lists (nested and non-nested) in param sections in doxygen. llvm-svn: 287990	2016-11-26 19:38:19 +00:00
Ekaterina Romanova	0a70076121	Doxygen comments for avxintrin.h. Added doxygen comments to avxintrin.h's intrinsics. As of now, all the intrinsics in this file that were documented by Sony's intrinsics guide should have corresponding doxygen comments. Note: The doxygen comments are automatically generated based on Sony's intrinsic s document. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. Reviewed by Wolfgang Pieb. llvm-svn: 287436	2016-11-19 04:59:08 +00:00
Ekaterina Romanova	2174b6fe72	Minor changes in x86 intrinsics headers; NFC I made several changes for consistency with the rest of x86 instrinsics header files. Some of these changes help to render doxygen comments better. 1. avxintrin.h – Moved the opening bracket on a separate line for several intrinsics (for consistency with the rest of the intrinsics). 2. emmintrin.h - Moved the doxygen comment next to the body of the function; - Added braces after extern "C" even though there is only one declaration each time 3. xmmintrin.h - Moved the doxygen comment next to the body of the function; - Added intrinsic prototypes for a couple of macro definitions into the doxygen comment; - Added braces after extern "C" even though there is only one declaration each time 4. ammintrin.h – Removed extra line between the doxygen comment and the body of the functions (for consistency with the rest of the files). Desk reviewed by Paul Robinson. llvm-svn: 287278	2016-11-17 23:02:00 +00:00
Ekaterina Romanova	64adc38e51	Doxygen comments for avxintrin.h. Added doxygen comments to avxintrin.h's intrinsics. As of now, around 75% of the intrinsics in this file are documented here. The patches for the other 25% will be se nt out later. Removed extra spaces in emmitrin.h. Note: The doxygen comments are automatically generated based on Sony's intrinsics document. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 286336	2016-11-09 03:58:30 +00:00
Simon Pilgrim	e3b9ee0645	[X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead. It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match). This patch changes both scalar and packed versions back to using x86-specific builtins. It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding. Differential Revision: https://reviews.llvm.org/D22105 llvm-svn: 276102	2016-07-20 10:18:01 +00:00
Craig Topper	2a383c9273	[X86] Use undefined instead of setzero in shufflevector based intrinsics when the second source is unused. Rewrite immediate extractions in shuffle intrinsics to be in ((c >> x) & y) form instead of ((c & z) >> x). This way only x varies between each use instead of having to vary x and z. llvm-svn: 274525	2016-07-04 22:18:01 +00:00
Simon Pilgrim	beca5f295c	[Clang][X86] Convert non-temporal store builtins to generic __builtin_nontemporal_store in headers We can now use __builtin_nontemporal_store instead of target specific builtins for naturally aligned nontemporal stores which avoids the need for handling in CGBuiltin.cpp The scalar integer nontemporal (unaligned) store builtins will have to wait as __builtin_nontemporal_store currently assumes natural alignment and doesn't accept the 'packed struct' trick that we use for normal unaligned load/stores. The nontemporal loads require further backend support before we can safely convert them to __builtin_nontemporal_load Differential Revision: http://reviews.llvm.org/D21272 llvm-svn: 272540	2016-06-13 09:57:52 +00:00
Craig Topper	3a0c7260f4	[X86] Add void to the argument list of intrinsics that don't take arguments since empty argument list mean something else in C. llvm-svn: 272244	2016-06-09 05:14:28 +00:00
Craig Topper	6a77b62640	[X86] Use unsigned types for vector arithmetic in intrinsics to avoid undefined behavior for signed integer overflow. This is really only needed for addition, subtraction, and multiplication, but I did the bitwise ops too for overall consistency. Clang currently doesn't set NSW for signed vector operations so the undefined behavior shouldn't happen today. llvm-svn: 271778	2016-06-04 05:43:41 +00:00
Simon Pilgrim	00880511b1	[X86][SSE] Replace (V)CVTTPS2DQ and VCVTTPD2DQ truncating (round to zero) f32/f64 to i32 with generic IR (clang) The 'cvtt' truncation (round to zero) conversions can be safely represented as generic __builtin_convertvector (fptosi) calls instead of x86 intrinsics. We already do this (implicitly) for the scalar equivalents. Note: I looked at updating _mm_cvttpd_epi32 as well but this still requires a lot more backend work to correctly lower (both for debug and optimized builds). Differential Revision: http://reviews.llvm.org/D20859 llvm-svn: 271436	2016-06-01 21:46:51 +00:00
Michael Zuckerman	e54093fcc0	Adding front-end support to several intrinsics (bit scanning, conversion and state reading intrinsics) Adding LLVM front-end support to two intrinsics dealing with bit scan: _bit_scan_forward and _bit_scan_reverse. Their functionality is as described in Intel intrinsics guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_bit_scan_forward&expand=371,370 https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_bit_scan_reverse&expand=371,370 Furthermore, adding clang front-end support to these conversion intrinsics: _mm256_cvtsd_f64, _mm256_cvtsi256_si32 and _mm256_cvtss_f32. Finally, adding tests to all of the above, as well as to the state reading intrinsics _rdpmc and _rdtsc. Their functionality is also specified in the Intel intrinsics guide. Commit on behalf of Omer Paparo Bivas llvm-svn: 271387	2016-06-01 12:21:00 +00:00
Craig Topper	74b5948f39	[X86] Use unaligned load intrinsics to implement other intrinsics instead of manually creating the unaligned load. llvm-svn: 271250	2016-05-31 05:49:13 +00:00
Craig Topper	09175dab31	[X86] Replace unaligned store builtins in SSE/AVX intrinsic files with code that will compile to a native unaligned store. Remove the builtins since they are no longer used. Intrinsics will be removed from llvm in a future commit. llvm-svn: 271214	2016-05-30 17:10:30 +00:00
Simon Pilgrim	90770c7c76	[X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR Both the (V)CVTDQ2PD(Y) (i32 to f64) and (V)CVTPS2PD(Y) (f32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen. This patch removes the clang builtins and their use in the sse2/avx headers - a future patch will deal with removing the llvm intrinsics, but that will require a bit more work. Differential Revision: http://reviews.llvm.org/D20528 llvm-svn: 270499	2016-05-23 22:13:02 +00:00
Simon Pilgrim	28666ce778	[X86][AVX] Ensure zero-extension of _mm256_extract_epi8 and _mm256_extract_epi16 Ensure _mm256_extract_epi8 and _mm256_extract_epi16 zero extend their i8/i16 result to i32. This matches _mm_extract_epi8 and _mm_extract_epi16. Fix for PR27594 Differential Revision: http://reviews.llvm.org/D20468 llvm-svn: 270330	2016-05-21 21:14:35 +00:00
Ekaterina Romanova	1168fdc9df	Doxygen comments for avxintrin.h. Added doxygen comments to avxintrin.h's intrinsics. As of now, only around 50% of the intrinsics in this file are documented here. The patches for the other half will be sent out later. Updated bmiintrin.h to fix an incorrect section name. Updated f16cintrin.h to fix incorect parameter names. The doxygen comments are automatically generated based on Sony's intrinsics document. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 269718	2016-05-16 22:54:45 +00:00
Craig Topper	1aa231e3aa	[X86] Add typecasts to remove most assumptions about what __m128i/__m256i is defined as. Add similar typecasts for the fp types as well. llvm-svn: 269632	2016-05-16 06:38:42 +00:00
Ekaterina Romanova	13f189da86	Add doxygen comments to avxintrin.h's intrinsics. Only around 25% of the intrinsics in this file are documented here. The patches for the other half will be sent out later. The doxygen comments are automatically generated based on Sony's intrinsics document. I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. llvm-svn: 263175	2016-03-11 00:05:54 +00:00
Craig Topper	d619eaaae4	[X86] Add missing typecasts in intrinsic macros. This should make them more robust against inputs that aren't already the right type. llvm-svn: 252700	2015-11-11 03:47:10 +00:00
Craig Topper	7148166785	[X86] Remove temporary variables from macros in x86 intrinsic headers. Prevents duplicate names appearing from multiple macro expansions. NFC llvm-svn: 252586	2015-11-10 05:08:05 +00:00
Andrea Di Biagio	8bb12d0a77	[x86] Fix maskload/store intrinsic definitions in avxintrin.h According to the Intel documentation, the mask operand of a maskload and maskstore intrinsics is always a vector of packed integer/long integer values. This patch introduces the following two changes: 1. It fixes the avx maskload/store intrinsic definitions in avxintrin.h. 2. It changes BuiltinsX86.def to match the correct gcc definitions for avx maskload/store (see D13861 for more details). Differential Revision: http://reviews.llvm.org/D13861 llvm-svn: 250816	2015-10-20 11:19:54 +00:00
Chandler Carruth	cbe6411401	Fix the SSE4 byte sign extension in a cleaner way, and more thoroughly test that our intrinsics behave the same under -fsigned-char and -funsigned-char. This further testing uncovered that AVX-2 has a broken cmpgt for 8-bit elements, and has for a long time. This is fixed in the same way as SSE4 handles the case. The other ISA extensions currently work correctly because they use specific instruction intrinsics. As soon as they are rewritten in terms of generic IR, they will need to add these special casts. I've added the necessary testing to catch this however, so we shouldn't have to chase it down again. I considered changing the core typedef to be signed, but that seems like a bad idea. Notably, it would be an ABI break if anyone is reaching into the innards of the intrinsic headers and passing __v16qi on an API boundary. I can't be completely confident that this wouldn't happen due to a macro expanding in a lambda, etc., so it seems much better to leave it alone. It also matches GCC's behavior exactly. A fun side note is that for both GCC and Clang, -funsigned-char really does change the semantics of __v16qi. To observe this, consider: % cat x.cc #include <smmintrin.h> #include <iostream> int main() { __v16qi a = { 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; __v16qi b = _mm_set1_epi8(-1); std::cout << (int)(a / b)[0] << ", " << (int)(a / b)[1] << '\n'; } % clang++ -o x x.cc && ./x -1, 1 % clang++ -funsigned-char -o x x.cc && ./x 0, 1 However, while this may be surprising, both Clang and GCC agree. Differential Revision: http://reviews.llvm.org/D13324 llvm-svn: 249097	2015-10-01 23:40:12 +00:00
Sean Silva	e4c3760a9f	Clean up trailing whitespace in the builtin headers llvm-svn: 247498	2015-09-12 02:55:19 +00:00
Simon Pilgrim	5aba9925c0	[X86][SSE] Add _mm_undefined_* intrinsics Added missing SSE/AVX 'undefined' intrinsics (PR24040): _mm_undefined_pd, _mm_undefined_ps + _mm_undefined_si128 _mm256_undefined_pd, _mm256_undefined_ps + _mm256_undefined_si256 _mm512_undefined, _mm512_undefined_ps, _mm512_undefined_pd + _mm512_undefined_epi32 Added builtin intrinsicss: __builtin_ia32_undef128, __builtin_ia32_undef256 + __builtin_ia32_undef512 Differential Revision: http://reviews.llvm.org/D12052 llvm-svn: 246083	2015-08-26 21:17:12 +00:00
Michael Kuperstein	e45af54cdb	[X86] Rename DEFAULT_FN_ATTR macro to __DEFAULT_FN_ATTR llvm-svn: 241065	2015-06-30 13:36:19 +00:00
Eric Christopher	9fc7fb274e	Update the intel intrinsic headers to use the target attribute support. This involved removing the conditional inclusion and replacing them with target attributes matching the original conditional inclusion and checks. The testcase update removes the macro checks for each file and replaces them with usage of the __target__ attribute, e.g.: int __attribute__((__target__(("sse3")))) foo(int a) { _mm_mwait(0, 0); return 4; } This usage does require the enclosing function have the requisite __target__ attribute for inlining and code generation - also for any macro intrinsic uses in the enclosing function. There's no change for existing uses of the intrinsic headers. llvm-svn: 239883	2015-06-17 07:09:32 +00:00
Eric Christopher	4d185168e9	Use a define for per-file function attributes for the Intel intrinsic headers. This is a precursor to changing them to use the new target attribute code. llvm-svn: 239882	2015-06-17 07:09:20 +00:00
Michael Kuperstein	7619004211	[X86] Add _mm256_set_m128 and its 5 variants. Differential Revision: http://reviews.llvm.org/D9855 llvm-svn: 237778	2015-05-20 07:46:52 +00:00
Sanjay Patel	f204b00940	Replace second (hopefully unused) access of macro input argument with zero vector to be safer. Suggested by Craig Topper in D8275. This is a follow-on to r232052. llvm-svn: 232061	2015-03-12 17:23:46 +00:00
Sanjay Patel	0c351aba25	[X86, AVX] replace vextractf128 intrinsics with generic shuffles This is very much like D8088 (checked in at r231792). Now that we've replaced the vinsertf128 intrinsics, do the same for their extract twins. Differential Revision: http://reviews.llvm.org/D8275 llvm-svn: 232052	2015-03-12 15:50:36 +00:00
Sanjay Patel	7f6aa52e93	[X86, AVX] Replace vinsertf128 intrinsics with generic shuffles. We want to replace as much custom x86 shuffling via intrinsics as possible because pushing the code down the generic shuffle optimization path allows for better codegen and less complexity in LLVM. This is the sibling patch for the LLVM half of this change: http://reviews.llvm.org/D8086 Differential Revision: http://reviews.llvm.org/D8088 llvm-svn: 231792	2015-03-10 15:19:26 +00:00
Filipe Cabecinhas	d74002965e	Make the _mm256_insert_epi64 definition more consistent Use long long for the epi64 argument, like the other intrinsics. NFC since this is only defined in 64-bit mode, not in 32-bit. Fix suggested by H. J. Lu! llvm-svn: 229886	2015-02-19 19:00:33 +00:00
Filipe Cabecinhas	54a2ba8b76	[Headers] Add tests for _mm256_insert_epi64 and fix its definition Summary: The definition for _mm256_insert_epi64 was taking an int, which would get truncated before being inserted in the vector. Original patch by Joshua Magee! Reviewers: bruno, craig.topper Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D7179 llvm-svn: 229811	2015-02-19 03:02:33 +00:00
David Majnemer	1cf22e690d	Headers: Don't use attribute keywords which aren't reserved Instead of using 'unavailable', use '__unavailable__' llvm-svn: 228087	2015-02-04 00:26:10 +00:00
Craig Topper	9fee8ab4f9	[x86] Remove tab characters from avxintrin.h. NFC. llvm-svn: 227676	2015-01-31 06:33:59 +00:00
Craig Topper	459554f164	[X86] Make order consistent between 'const' and 'int' in one of the intrinsic header files. NFC llvm-svn: 227675	2015-01-31 06:31:30 +00:00
Adam Nemet	286ae08e7d	Implement AVX1 vbroadcast intrinsics with vector initializers These intrinsics are special because they directly take a memory operand (AVX2 adds the register counterparts). Typically, other non-memop intrinsics take registers and then it's left to isel to fold memory operands. In order to LICM intrinsics directly reading memory, we require that no stores are in the loop (LICM) or that the folded load accesses constant memory (MachineLICM). When neither is the case we fail to hoist a loop-invariant broadcast. We can work around this limitation if we expose the load as a regular load and then just implement the broadcast using the vector initializer syntax. This exposes the load to LICM and other optimizations. At the IR level this is translated into a series of insertelements. The sequence is already recognized as a broadcast so there is no impact on the quality of codegen. _mm256_broadcast_pd and _mm256_broadcast_ps are not updated by this patch because right now we lack the DAG-combiner smartness to recover the broadcast instructions. This will be tackled in a follow-on. There will be completing changes on the LLVM side to remove the LLVM intrinsics and to auto-upgrade bitcode files. Fixes <rdar://problem/16494520> llvm-svn: 209846	2014-05-29 20:47:29 +00:00
Filipe Cabecinhas	5d289b48b1	Patched clang to emit x86 blends as shufflevectors. Summary: Most of the clang header patch by Simon Pilgrim @ SCEE. Also fixed (or added) clang tests for these intrinsics. LLVM tests to make sure we get the blend instruction out of these shufflevectors are at http://reviews.llvm.org/D3600 Reviewers: eli.friedman, craig.topper, rafael Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D3601 llvm-svn: 208664	2014-05-13 02:37:02 +00:00
Manman Ren	c94122e05b	Intrinsics: fix extract & insert when index is out of bound. Now, all extract & insert intrinsics should have the correct and operation to ignore higher bits. rdar://15250497 llvm-svn: 193267	2013-10-23 20:33:14 +00:00
Craig Topper	c5244512c8	Use a shuffle with undef elements instead of inserting 0s in the 128-bit to 256-bit casting intrinsics to improve performance. Thanks to Katya Romanova for identifying this issue. llvm-svn: 187716	2013-08-05 06:17:21 +00:00
Richard Smith	49e56440f9	Add missing include guards into headers in lib/Headers. While it may appear that these headers should not be included more than once, they are in fact included twice when building our builtins module (in order for it to generate submodules for them), and without this, any modular build enabling AVX and including any builtin header fails. Testing this is tricky because including any of these headers in a modular build is liable to fail, due to unrelated builtin headers in the same module including headers which might not be available on the system running the tests. Suggestion on that front are welcome (but we're getting close to being able to run a buildbot that has modules enabled for all tests, which would nicely solve the testing problem). llvm-svn: 186275	2013-07-14 05:41:45 +00:00
Reid Kleckner	7ab75b3f68	Avoid names like __in that conflict with SAL in builtin headers Microsoft's Source Annotation Language (SAL) defines a bunch of keywords for annotating the inputs and outputs of functions. Empty definitions for the keywords are provided by <stdlib.h> -> <crtdefs.h> -> <sal.h>. This makes it basically impossible to include MSVC's stdlib.h and Clang's *mmintrin.h headers at the same time if they have variables named __in. As a workaround, I've renamed those variables. This fixes the Modules/compiler_builtins.m test which was XFAILed, presumably due to this conflict. llvm-svn: 179860	2013-04-19 17:00:14 +00:00
David Blaikie	5bb700360c	Readd an open paren that was lost while reformatting code. llvm-svn: 172669	2013-01-16 23:13:42 +00:00
David Blaikie	3302f2bd46	PR14964: intrinsic headers using non-reserved identifiers Several of the intrinsic headers were using plain non-reserved identifiers. C++11 17.6.4.3.2 [global.names] p1 reservers names containing a double begining with an underscore followed by an uppercase letter for any use. I think I got them all, but open to being corrected. For the most part I didn't bother updating function-like macro parameter names because I don't believe they're subject to any such collission - though some function-like macros already follow this convention (I didn't update them in part because the churn was more significant as several function-like macros use the double underscore prefixed version of the same name as a parameter in their implementation) llvm-svn: 172666	2013-01-16 23:08:36 +00:00
Craig Topper	26e74e50b6	Convert vperm2f128 and vperm2i128 intrinsics back to using llvm intrinsics. Unfortunately, these instructions have behavior that can't be modeled with shuffle vector. llvm-svn: 154906	2012-04-17 05:16:56 +00:00
Chad Rosier	2c5154224b	Fix the signatures for the _mm256_storeu2_* intrinsics. PR12532 llvm-svn: 154591	2012-04-12 16:29:08 +00:00

1 2

76 Commits