This is part of an ongoing effort to make 512-bit vectors illegal in the X86 backend type legalizer, due to the CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code, and to have those wide vectors legalized to 256 bits by the X86 backend even though we are targeting a CPU that supports 512-bit vectors. This is similar to what happens with an AVX2 CPU: the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but we still want to be able to use the new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.
Of course, if the user writes explicit vector code in their source we must not split those operations, especially if they have used any of the 512-bit vector intrinsics from immintrin.h. Merely using those intrinsics must continue to produce the expected code in order to stay backwards compatible.
To support this goal, this patch adds a new IR function attribute, "min-legal-vector-width", that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsic from immintrin.h that the function uses. The inliner will be responsible for merging this attribute when a function is inlined. We may also need a way to limit inlining later, but we can discuss that separately.
To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h: either as an always_inline function containing calls to builtins (which can be target specific or target independent) or vector extension code, or as a macro wrapper around a target-specific builtin. I believe I've removed all cases where a macro wrapped a target-independent builtin.
To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
To support the macro case, all x86-specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string, so that we can opt into it on a per-builtin basis and avoid any impact on target-independent builtins.
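For illustration, here is a minimal sketch of how a header wrapper can advertise the width it needs. The "sketch" names and types are hypothetical, and the exact attribute spelling may differ slightly from the shipped headers:

  /* Hedged sketch: an always_inline wrapper tagged with the vector width
     it requires.  The type and function names here are illustrative.    */
  typedef long long __sketch_m512i __attribute__((__vector_size__(64)));

  static __inline__ __sketch_m512i
      __attribute__((__always_inline__, __nodebug__, __target__("avx512f"),
                     __min_vector_width__(512)))
  sketch_add_epi64(__sketch_m512i __a, __sketch_m512i __b) {
    return __a + __b;
  }

  /* Per the description above, a function that uses this wrapper ends up
     with "min-legal-vector-width"="512" on its emitted IR function.      */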
There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.
Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.
Differential Revision: https://reviews.llvm.org/D48617
llvm-svn: 336583
Previously we only checked the sse feature, but this meant that if you passed -mno-mmx, the builtins/intrinsics wouldn't be disabled in the frontend and would instead fail backend isel.
llvm-svn: 333980
I think this is a holdover from when we used to declare variables inside the macros, and it has been copied and pasted forward for years every time a new macro intrinsic gets added.
Interestingly this caused some tests for IRGen to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load.
It also removed a bogus error message on another test.
llvm-svn: 333613
We don't need the insertion back into the original vector at the end. The builtin already understands that.
This is different than _mm_sqrt_sd which takes two arguments and we do need to insert.
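A hedged sketch of the difference described above (simplified, with the usual header attribute macros spelled out; the real headers may differ in detail):

  typedef float  __m128  __attribute__((__vector_size__(16)));
  typedef float  __v4sf  __attribute__((__vector_size__(16)));
  typedef double __m128d __attribute__((__vector_size__(16)));
  typedef double __v2df  __attribute__((__vector_size__(16)));

  /* One-argument form: the builtin itself preserves elements 1-3 of the
     source, so no insert back into the original vector is needed.        */
  static __inline__ __m128 __attribute__((__always_inline__, __target__("sse")))
  sketch_sqrt_ss(__m128 __a) {
    return (__m128)__builtin_ia32_sqrtss((__v4sf)__a);
  }

  /* Two-argument form: the upper element must come from the first operand,
     so an explicit merge is still required.                               */
  static __inline__ __m128d __attribute__((__always_inline__, __target__("sse2")))
  sketch_sqrt_sd(__m128d __a, __m128d __b) {
    __m128d __c = __builtin_ia32_sqrtsd((__v2df)__b);
    return __extension__ (__m128d){__c[0], __a[1]};
  }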
llvm-svn: 333572
(1) I added some \see cross-references to a few select intrinsics that are related (and have the same or similar semantics).
(2) pmmintrin.h, smmintrin.h, and xmmintrin.h have a few minor formatting changes that improve the rendering of our intrinsics documentation.
llvm-svn: 333065
This is similar to the LLVM change https://reviews.llvm.org/D46290.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46320
llvm-svn: 331834
- Fix inaccurate instruction listings.
- Fix small issues in _mm_getcsr and _mm_setcsr.
- Fix description of NaN handling in comparison intrinsics.
- Fix inaccurate description of _mm_movemask_pi8.
- Fix inaccurate instruction mappings.
- Fix typos.
- Clarify wording on some descriptions.
- Fix bit ranges in return value.
- Fix typo in the _mm_move_ss intrinsic instruction, since it operates on single-precision values, not double.
- This patch was made by Craig Flores
Differential Revision: https://reviews.llvm.org/D41523
llvm-svn: 322778
(2) Removed \c commands that are no longer necessary, since the same effect is achieved by the <c> ... </c> sequence.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 303228
Separated very long brief sections into two sections.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 303031
- To be consistent with the rest of the intrinsics headers, I removed the <i> .. </i> tags used to mark instruction names in italics in smmintrin.h.
- Formatting changes to fit into 80 characters.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 300578
Added doxygen comments for the newly added intrinsics in avxintrin.h, namely _mm256_cvtsd_f64, _mm256_cvtsi256_si32, and _mm256_cvtss_f32.
Added doxygen comments for the new intrinsics in emmintrin.h, namely _mm_loadu_si64 and _mm_load_sd.
Explicit parameter names were added for _mm_clflush and _mm_setcsr.
The rest of the changes are editorial, removing trailing spaces at the end of the lines.
Differential Revision: https://reviews.llvm.org/D28503
llvm-svn: 291876
Added \n commands to insert line breaks where necessary, since one long line of documentation is nearly unreadable.
Formatted comments to fit into 80 chars.
In some cases added \a command in front of the parameter names to display them in italics.
llvm-svn: 290619
Tagged instruction names with <c> INSTR_NAME </c> to display them in typewriter font.
In the past, the \c command was used; unfortunately, it applies to only one word.
<c> .. </c> has the same meaning, but applies to all words between the tags.
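A small illustration of the difference, shown as a doxygen comment fragment:

  /// \c CVTTPS2PI xmm, mm          (only "CVTTPS2PI" is set in typewriter font)
  /// <c> CVTTPS2PI xmm, mm </c>    (the whole mnemonic plus operands is)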
llvm-svn: 289249
I made several changes for consistency with the rest of the x86 intrinsics header files. Some of these changes help doxygen comments render better.
1. avxintrin.h - Moved the opening bracket to a separate line for several
   intrinsics (for consistency with the rest of the intrinsics).
2. emmintrin.h - Moved the doxygen comment next to the body of the function;
   added braces after extern "C" even though there is only one declaration
   each time.
3. xmmintrin.h - Moved the doxygen comment next to the body of the function;
   added intrinsic prototypes for a couple of macro definitions into the
   doxygen comment; added braces after extern "C" even though there is only
   one declaration each time.
4. ammintrin.h - Removed the extra line between the doxygen comment and the
   body of the functions (for consistency with the rest of the files).
Desk reviewed by Paul Robinson.
llvm-svn: 287278
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.
It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems: INF/NaN/out-of-range values are guaranteed to result in a 0x80000000 value, which plays havoc with constant folding, which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).
This patch changes both scalar and packed versions back to using x86-specific builtins.
It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
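As a hedged sketch, the packed single-precision case then looks something like this (simplified; the shipped header differs in macro and attribute details):

  typedef float     __m128  __attribute__((__vector_size__(16)));
  typedef float     __v4sf  __attribute__((__vector_size__(16)));
  typedef long long __m128i __attribute__((__vector_size__(16)));

  /* Using the x86-specific builtin keeps the documented behavior:
     INF/NaN/out-of-range inputs produce 0x80000000 rather than being
     constant folded to zero or undef as a generic fptosi would allow.   */
  static __inline__ __m128i __attribute__((__always_inline__, __target__("sse2")))
  sketch_cvttps_epi32(__m128 __a) {
    return (__m128i)__builtin_ia32_cvttps2dq((__v4sf)__a);
  }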
Differential Revision: https://reviews.llvm.org/D22105
llvm-svn: 276102
Summary:
This patch adds support for the _MM_ALIGN16 attribute on non-windows targets. This aligns Clang with ICC which supports the attribute on all targets.
Fixes PR28056
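A hedged sketch of the kind of definition this enables (the exact conditionals and spelling in xmmintrin.h may differ):

  /* With this change, non-Windows targets get a GNU-style definition
     instead of the attribute only being available via __declspec.      */
  #ifdef _MSC_VER
  #define _MM_ALIGN16 __declspec(align(16))
  #else
  #define _MM_ALIGN16 __attribute__((aligned(16)))
  #endif

  _MM_ALIGN16 float aligned_buf[4];   /* 16-byte aligned on all targets */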
Reviewers: aaboud, echristo, cfe-commits, mkuper
Subscribers: zvi, mehdi_amini
Projects: #clang-c
Differential Revision: http://reviews.llvm.org/D21173
llvm-svn: 273095
We can now use __builtin_nontemporal_store instead of target-specific builtins for naturally aligned nontemporal stores, which avoids the need for handling in CGBuiltin.cpp.
The scalar integer nontemporal (unaligned) store builtins will have to wait, as __builtin_nontemporal_store currently assumes natural alignment and doesn't accept the 'packed struct' trick that we use for normal unaligned loads/stores.
The nontemporal loads require further backend support before we can safely convert them to __builtin_nontemporal_load.
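A hedged sketch of the pattern for a naturally aligned nontemporal store (illustrative names; the real header uses its own attribute macros):

  typedef double __m128d __attribute__((__vector_size__(16)));
  typedef double __v2df  __attribute__((__vector_size__(16)));

  /* The generic builtin emits an IR store tagged !nontemporal, so no
     target-specific handling is needed in CGBuiltin.cpp.               */
  static __inline__ void __attribute__((__always_inline__, __target__("sse2")))
  sketch_stream_pd(double *__p, __m128d __a) {
    __builtin_nontemporal_store((__v2df)__a, (__v2df *)__p);
  }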
Differential Revision: http://reviews.llvm.org/D21272
llvm-svn: 272540
Only half of the intrinsics in this file are documented here. The patch for the other half will be sent out later.
The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 272121
This is really only needed for addition, subtraction, and multiplication, but I did the bitwise ops too for overall consistency. Clang currently doesn't set NSW for signed vector operations, so the undefined behavior shouldn't happen today.
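A hedged sketch of the pattern (illustrative names; the real headers use their own __v4su-style typedefs and attribute macros):

  typedef long long    __m128i __attribute__((__vector_size__(16)));
  typedef unsigned int __v4su  __attribute__((__vector_size__(16)));

  /* Do the arithmetic on unsigned element types, where wraparound is well
     defined, then cast back.  If clang ever started applying nsw to signed
     vector ops, the signed form could invoke undefined behavior on overflow. */
  static __inline__ __m128i __attribute__((__always_inline__))
  sketch_add_epi32(__m128i __a, __m128i __b) {
    return (__m128i)((__v4su)__a + (__v4su)__b);
  }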
llvm-svn: 271778
According to the gcc headers, intel intrinsics docs and msdn codegen, the _mm_store1_pd (and its _mm_store_pd1 equivalent) should use an aligned pointer - the clang headers are the only implementation I can find that assumes non-aligned stores (by storing with _mm_storeu_pd).
Additionally, according to the intel intrinsics docs and msdn codegen, the _mm_store1_ps (_mm_store_ps1) requires a similarly aligned pointer.
This patch raises the alignment requirements to match the other implementations by calling _mm_store_ps/_mm_store_pd instead.
I've also added the missing _mm_store_pd1 intrinsic (which maps to _mm_store1_pd like _mm_store_ps1 does to _mm_store1_ps).
As a followup I'll update the llvm fast-isel tests to match this codegen.
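A hedged sketch of the resulting implementations (simplified; attribute macros omitted, and function names here are illustrative):

  #include <emmintrin.h>

  /* Splat the low element and use the aligned store, so __dp must be
     16-byte aligned, matching the other implementations.              */
  static __inline__ void sketch_store1_pd(double *__dp, __m128d __a) {
    __a = __builtin_shufflevector((__v2df)__a, (__v2df)__a, 0, 0);
    _mm_store_pd(__dp, __a);
  }

  /* The newly added _mm_store_pd1 is the same operation under the other
     naming convention.                                                 */
  static __inline__ void sketch_store_pd1(double *__dp, __m128d __a) {
    sketch_store1_pd(__dp, __a);
  }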
Differential Revision: http://reviews.llvm.org/D20617
llvm-svn: 271218
xmmintrin.h a bit more directed. If for whatever reason modules are enabled but
we textually include one of these headers, don't deploy the special case for
modules. To make this work cleanly, extend __building_module to be defined
even when modules are disabled.
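A hedged sketch of the kind of guard this makes possible in a textual include (the module name shown is illustrative):

  /* Because __building_module() is now defined even when modules are
     disabled, a plain textual include can use the same guard as a
     modules build.                                                    */
  #if defined(__SSE2__) && !__building_module(_Builtin_intrinsics)
  #include <emmintrin.h>
  #endif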
llvm-svn: 266945
Only half of the intrinsics in this file are documented here. The patch for the other half will be sent out later.
The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream.
llvm-svn: 263098
This involved removing the conditional inclusions and replacing them
with target attributes matching the original conditional inclusion
and checks. The testcase update removes the macro checks for each
file and replaces them with usage of the __target__ attribute, e.g.:
#include <pmmintrin.h>  /* provides _mm_mwait */

int __attribute__((__target__("sse3"))) foo(int a) {
  _mm_mwait(0, 0);
  return 4;
}
This usage does require that the enclosing function have the requisite
__target__ attribute for inlining and code generation - this also applies
to any macro intrinsic uses in the enclosing function. There's no change
for existing uses of the intrinsic headers.
llvm-svn: 239883
xmmintrin.h includes emmintrin.h and vice versa if SSE2 is enabled. We break
this cycle for a modules build, and instead make the xmmintrin.h module
re-export the immintrin.h module. Also included is a fix for an assert in the
serialization code if a module exports another module that was declared later
in the same module map.
llvm-svn: 237321
This still lowers to the same intrinsics as before.
This is preparation for bounds-checking the immediate on the AVX version of the builtin so we don't pass illegal immediates into the backend. Since SSE uses a smaller immediate, it's not possible to bounds check when using a shared builtin. Rather than creating a clang-specific builtin for the different immediate, I decided (after consulting with Chandler) that it was better to match gcc.
llvm-svn: 224879
The last step of _mm_cvtps_pi16 should use _mm_packs_pi32, which is a function
that reads two __m64 values and packs four 32-bit values into four 16-bit
values.
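A hedged sketch of the corrected sequence (simplified; attribute macros omitted and the function name is illustrative):

  #include <xmmintrin.h>

  /* Convert the low and high pairs of floats separately, then pack the
     two __m64 results (four 32-bit values in total) down to four 16-bit
     values with _mm_packs_pi32.                                         */
  static __inline__ __m64 __attribute__((__target__("mmx,sse")))
  sketch_cvtps_pi16(__m128 __a) {
    __m64 __lo = _mm_cvtps_pi32(__a);   /* elements 0 and 1 */
    __a = _mm_movehl_ps(__a, __a);      /* move elements 2 and 3 down */
    __m64 __hi = _mm_cvtps_pi32(__a);   /* elements 2 and 3 */
    return _mm_packs_pi32(__lo, __hi);
  }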
<rdar://problem/16873717>
llvm-svn: 209489