Commit Graph

73 Commits

Author SHA1 Message Date
Chandler Carruth 4cf5743b77 Move the builtin headers to use the new license file header.
Summary:
These all had somewhat custom file headers with different text from the
ones I searched for previously, and so I missed them. Thanks to Hal and
Kristina and others who prompted me to fix this, and sorry it took so
long.

Reviewers: hfinkel

Subscribers: mcrosier, javed.absar, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D60406

llvm-svn: 357941
2019-04-08 20:51:30 +00:00
Craig Topper be4cbe8726 [X86] Add explicit alignment to __m128/__m128i/__m128d/etc. to allow matching of MSVC behavior with #pragma pack.
Summary:
With MSVC, #pragma pack is ignored when there is explicit alignment. This differs from gcc. Clang emulates this difference when compiling for Windows.

It appears that MSVC and its headers consider the __m128/__m128i/__m128d/etc. types to be explicitly aligned and ignores #pragma pack for them. Since we don't have explicit alignment on them in our headers, we don't match the MSVC behavior here.

This patch adds explicit alignment to match this behavior. I'm hoping this won't cause any problems when we're not emulating MSVC. But if someone knows of something that would be different we can swith to conditionally adding the alignment based on _MSC_VER.

I had to add explicitly unaligned types as well so we could use them in the loadu/storeu intrinsics which use __attribute__(__packed__). Using the now explicitly aligned types wouldn't produce align 1 accesses when targeting Windows.

Reviewers: rnk, erichkeane, spatel, RKSimon

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D57961

llvm-svn: 353555
2019-02-08 19:45:08 +00:00
Craig Topper eae26bf737 [X86] Add more intrinsics to match icc.
This adds
_mm_loadu_epi8, _mm256_loadu_epi8, _mm512_loadu_epi8
_mm_loadu_epi16, _mm256_loadu_epi16, _mm512_loadu_epi16
_mm_storeu_epi8, _mm256_storeu_epi8, _mm512_storeu_epi8
_mm_storeu_epi16, _mm256_storeu_epi16, _mm512_storeu_epi16

llvm-svn: 344862
2018-10-20 19:28:52 +00:00
Mikhail Dvoretckii d1bf9ef0c7 [X86] Lowering integer truncation intrinsics to native IR
This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to
native IR in cases where the result's length is less than 128 bits.

The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for
128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps)
instructions are generated instead

Differential Revision: https://reviews.llvm.org/D48712

llvm-svn: 336643
2018-07-10 08:22:44 +00:00
Craig Topper 74c10e3236 [Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it
This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.

Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.

To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.

To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.

To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.

To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.

There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.

Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.

Differential Revision: https://reviews.llvm.org/D48617

llvm-svn: 336583
2018-07-09 19:00:16 +00:00
Craig Topper 0029470dde [X86] Correct the width of mask arguments in intrinsic headers and tests.
All of these found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there.

Some of these were real bugs where we lost bits from the user input:
_mm512_mask_broadcast_f32x8
_mm512_maskz_broadcast_f32x8
_mm512_mask_broadcast_i32x8
_mm512_maskz_broadcast_i32x8
_mm256_mask_cvtusepi16_storeu_epi8

llvm-svn: 336042
2018-06-30 06:05:17 +00:00
Craig Topper 91bbe98757 [X86] Remove masking from dbpsadbw builtins, use select builtin instead.
llvm-svn: 334385
2018-06-11 06:18:29 +00:00
Craig Topper c633867944 [X86] Remove __extension__ from macro intrinsics when its not needed.
I think this is a holdover from when we used to declare variables inside the macros. And then its been copy and pasted forward for years every time a new macro intrinsic gets added.

Interestingly this caused some tests for IRGen to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load.

It also removed a bogus error message on another test.

llvm-svn: 333613
2018-05-31 00:51:20 +00:00
Craig Topper dff5b311af [X86] Reduce the number of setzero intrinsics to just the set defined by the Intel Intrinsics Guide.
We had quite a few for different element sizes of integers sometimes with strange target features attached to them.

We only need a single version for each of _m128i, _m256i, and _m512i with the target feature that first introduced those types.

llvm-svn: 333568
2018-05-30 18:02:11 +00:00
Craig Topper 68a272d501 [X86] Merge the 3 different flavors of masked vpermi2var/vpermt2var builtins to a single version without masking. Use select builtins with appropriate operand instead.
llvm-svn: 333387
2018-05-29 03:26:38 +00:00
Craig Topper 55b4067350 [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead.
Someday maybe we'll use selects for all the builtins.

llvm-svn: 332825
2018-05-20 23:34:10 +00:00
Craig Topper b809fc3d63 [X86] Fix a bad cast from mask16 to mask8 in _mm256_mask_cvtepi16_epi8 introduced in r332266.
llvm-svn: 332738
2018-05-18 17:18:46 +00:00
Craig Topper 25de41cfbc [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins.
As long as the destination type is a 256 or 128 bit vector with the same number of elements we can use __builtin_convertvector to directly generate trunc IR instruction which will be handled natively by the backend.

Differential Revision: https://reviews.llvm.org/D46742

llvm-svn: 332266
2018-05-14 17:50:40 +00:00
Uriel Korach 5b2b71d909 [X86] test/testn intrinsics lowering to IR. clang side
Change Header files of the intrinsics for lowering test and testn intrinsics to IR code.
Removed test and testn builtins from clang

Differential Revision: https://reviews.llvm.org/D38737

llvm-svn: 318035
2017-11-13 12:50:52 +00:00
Craig Topper 57f96ac6dc [X86] Replace the mask cmpeq/cmple/cmplt/cmpgt/cmpge/cmpneq intrinsics with macros that just pass the right comparison predicate value to the regular cmp intrinsic. Remove mask cmpeq/cmpgt builtins that are now unused.
This shortens the intrinsic headers a little and allows us to get rid of the cmpeq and cmpgt handling from CGBuiltin.cpp.

llvm-svn: 317506
2017-11-06 21:00:49 +00:00
Jina Nahias 3ad702a1ed Lowering Mask Set1 intrinsics to LLVM IR
This patch, together with a matching llvm patch (https://reviews.llvm.org/D37669), implements the lowering of X86 mask set1 intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D37668

llvm-svn: 313624
2017-09-19 11:00:27 +00:00
Craig Topper 37bf5c6a3f [AVX-512] Replace masked 16-bit element variable shift builtins with new unmasked versions and selects.
llvm-svn: 287313
2016-11-18 05:04:51 +00:00
Craig Topper d7e5b21914 [X86] Remove extra escaped new lines in intrinsic headers left over from an earlier conversion away from a macro. NFC
llvm-svn: 286756
2016-11-13 07:26:31 +00:00
Craig Topper 66b2fd1209 [AVX-512] Remove many of the masked 128/256-bit shift builtins and replace them with unmasked builtins and selects.
llvm-svn: 285539
2016-10-31 04:30:51 +00:00
Craig Topper 312ff9d19d [AVX-512] Remove masked 128/256-bit builtins for vpmaddwd and vpmaddubsw. Replace with unmasked builtins and select.
llvm-svn: 285516
2016-10-30 07:11:34 +00:00
Craig Topper 4caf76bee2 [AVX-512] Remove 128/256-bit masked pmulhrsw/pmulhuw/pmulhw builtins and use unmasked builtins and select instead.
llvm-svn: 285505
2016-10-29 19:02:14 +00:00
Craig Topper eee7c0520c [AVX-512] Replace masked 128/256-bit byte, word, and dword min/max builtins with selects and the older unmasked builtins.
llvm-svn: 284954
2016-10-23 23:57:30 +00:00
Craig Topper 4ef879ac2c [AVX-512] Remove masked 128/256-bit packss/packus builtins and replace with selects and the older unmasked builtins.
llvm-svn: 284935
2016-10-23 07:35:39 +00:00
Craig Topper 4d63dfc286 [AVX-512] Replace masked 128/256-bit pavg builtins and replace with select and older unmasked builtins.
llvm-svn: 284929
2016-10-22 21:24:56 +00:00
Craig Topper 622c63614d [AVX-512] Replace masked 128/256-bit saturating add/sub builtins with select and older unmasked builtins.
llvm-svn: 284928
2016-10-22 21:24:52 +00:00
Craig Topper 11dda92405 [AVX-512] Replace masked 128/256-bit vpmovzx/vpmovsx builtins with native IR.
llvm-svn: 284927
2016-10-22 21:24:48 +00:00
Craig Topper eb1c0afa90 [AVX-512] Remove masked 128/256-bit pshufb builtins. Replace with a select and the older unmaksed builtins.
llvm-svn: 284925
2016-10-22 21:24:42 +00:00
Craig Topper 78a9c40326 [AVX-512] Remove builtins for 128/256-bit pabsb/pabsw. We can use a select and the older non-masked versions instead.
llvm-svn: 284924
2016-10-22 21:24:38 +00:00
Craig Topper c2c7e42bfe [AVX-512] Add typecasts to alignr intrinsics that were modified in r284920.
llvm-svn: 284923
2016-10-22 21:24:34 +00:00
Craig Topper f6373bc6fd [AVX-512] Remove masked 128/256-bit palignr builtins. We can just use a select in the header file with the older unmasked versions instead.
llvm-svn: 284920
2016-10-22 18:32:33 +00:00
Craig Topper f43e4a1728 [AVX-512] Remove masked integer mullo builtins and replace with native IR.
llvm-svn: 280597
2016-09-03 19:19:49 +00:00
Craig Topper 0e18976b8d [AVX-512] Remove masked integer add/sub builtins and replace with native IR.
llvm-svn: 280596
2016-09-03 18:29:35 +00:00
Craig Topper 351ed42795 [X86] Block pbroadcastq instructions on 32-bit targets instead of pbroadcastb.
Thanks to Simon Pilgrim for catching the mistake.

llvm-svn: 276564
2016-07-24 14:58:06 +00:00
Craig Topper 45db56c375 [X86] Add missing __x86_64__ qualifiers on a bunch of intrinsics that assume 64-bit GPRs are available.
Usages of these intrinsics in a 32-bit build results in assertions in the backend.

llvm-svn: 276249
2016-07-21 07:38:39 +00:00
Simon Pilgrim f5a8837e1b [X86][AVX512] Converted the VBROADCAST intrinsics to generic IR
llvm-svn: 274544
2016-07-05 12:59:33 +00:00
Michael Zuckerman 7dac6fbdf8 [Clang][BuiltIn][AVX512] adding _mm{|256|512}_mask_cvt{s|us|}epi16_storeu_epi8 intrinsics
Differential Revision: http://reviews.llvm.org/D21729

llvm-svn: 274532
2016-07-05 08:08:01 +00:00
Craig Topper 79f53ca0b5 [AVX512] Replace masked unpack builtins with shufflevector and selects.
llvm-svn: 273533
2016-06-23 06:36:42 +00:00
Craig Topper 9ce3ddf2e6 [AVX512] Use a __v8hi vector inside of _mm_setzero_hi to match its name. Probably no real functional change.
llvm-svn: 273389
2016-06-22 06:36:23 +00:00
Craig Topper c89dda5938 [AVX512] Add missing typecasts to intrinsics.
llvm-svn: 273386
2016-06-22 06:36:16 +00:00
Craig Topper 26d5b87316 [X86] Add explicit typecasts to some intrinsics.
llvm-svn: 272466
2016-06-11 12:50:12 +00:00
Craig Topper 68738332b8 [AVX512] Implement 512-bit and masked shufflelo and shufflehi intrinsics directly with __builtin_shufflevector and __builtin_ia32_select. Also improve the formatting of the AVX2 version.
llvm-svn: 272452
2016-06-11 03:31:13 +00:00
Igor Breger aadb876200 [AVX512] Emit select instruction instead of using x86 specific instrinsics.
This will allow us to remove the x86 instrinics from the backend.

Differential Revision: http://reviews.llvm.org/D21060

llvm-svn: 272141
2016-06-08 13:59:20 +00:00
Craig Topper 32578b7dcf [AVX512][Builtin] Fix palignr intrinsic for avx512vlbw. The immediate should not be multiplied by 8.
The 512-bit version was fixed recently but this was missed.

llvm-svn: 270970
2016-05-27 06:59:39 +00:00
Craig Topper 1a15b6aff2 [AVX512] Add parentheses around macro arguments in AVX512VLBW intrinsics. Remove leading underscores from macro argument names. Add explicit typecasts to all macro arguments and return values. And finally reformat after all the adjustments.
This is a mostly mechanical change accomplished with a script. I tried to split out any changes to the typecasts that already existed into separate commits.

llvm-svn: 269743
2016-05-17 04:41:42 +00:00
Michael Zuckerman e871785eb6 [Clang][avx512][Builtin] Adding intrinsics for cvtw2mask{128|256|512} instruction set
Differential Revision: http://reviews.llvm.org/D19766

llvm-svn: 268385
2016-05-03 14:12:23 +00:00
Michael Zuckerman de8d3753d3 [clang][AVX512][Builtin] Adding intrinsics for the SAD instruction set.
Differential Revision: http://reviews.llvm.org/D19591

llvm-svn: 267942
2016-04-28 21:21:08 +00:00
Michael Zuckerman 533e065bdc [Clang][BuiltIn][AVX512] Adding intrinsics fot align{d|q} and palignr instruction set
Differential Revision: http://reviews.llvm.org/D19588

llvm-svn: 267876
2016-04-28 12:47:30 +00:00
Michael Zuckerman 8938e836c4 [Clang][AVX512][BuiltIn] Adding support to intrinsics of VPERMD and VPERMW instruction set
Differential Revision: http://reviews.llvm.org/D19195

llvm-svn: 267380
2016-04-25 05:32:35 +00:00
Michael Zuckerman c2b6128a8f [Clang][AVX512][Builtin] Adding support for VBROADCAST and VPBROADCASTB/W/D/Q instruction set
Differential Revision: http://reviews.llvm.org/D19012

llvm-svn: 266195
2016-04-13 12:58:01 +00:00
Michael Zuckerman 074edd7c1e [Clang][AVX512][Builtin] Adding supporting to intrinsics of cvt{b|d|q}2mask{128|256|512} and cvtmask2{b|d|q}{128|256|512} instruction set.
Differential Revision: http://reviews.llvm.org/D19009

llvm-svn: 266188
2016-04-13 10:49:37 +00:00