Commit Graph

65 Commits

Author SHA1 Message Date
Craig Topper 8e364c680f [X86] Restore the pavg intrinsics.
The pattern we replaced these with may be too hard to match as demonstrated by
PR41496 and PR41316.

This patch restores the intrinsics and then we can start focusing
on optimizing them.

I've mostly reverted the original patch that removed them, though I modified
the avx512 intrinsics to not have masking built in.

Differential Revision: https://reviews.llvm.org/D60674

llvm-svn: 358427
2019-04-15 17:17:35 +00:00
Chandler Carruth 4cf5743b77 Move the builtin headers to use the new license file header.
Summary:
These all had somewhat custom file headers with different text from the
ones I searched for previously, and so I missed them. Thanks to Hal and
Kristina and others who prompted me to fix this, and sorry it took so
long.

Reviewers: hfinkel

Subscribers: mcrosier, javed.absar, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D60406

llvm-svn: 357941
2019-04-08 20:51:30 +00:00
Craig Topper 74c10e3236 [Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it
This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targeting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU: the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but we still want to be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.

Of course, if the user writes explicit vector code, we need to not split those operations, especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code, in order to be backwards compatible.

To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be responsible for merging this attribute when a function is inlined. We may also need a way to limit inlining, but we can discuss that later.

To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h: either as an always_inline function containing calls to builtins (which can be target specific or target independent) or vector extension code, or as a macro wrapper around a target specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.

To support the always_inline function case this patch adds __attribute__((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
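
For illustration, a tagged always_inline wrapper has roughly this shape (a sketch with local demo typedefs and a hypothetical demo_ name; the shipped headers spell the attributes via a __DEFAULT_FN_ATTRS macro):

  typedef long long __m128i_demo __attribute__((__vector_size__(16)));
  typedef unsigned int __v4su_demo __attribute__((__vector_size__(16)));

  static __inline__ __m128i_demo
      __attribute__((__always_inline__, __nodebug__,
                     __target__("sse2"), __min_vector_width__(128)))
  demo_mm_add_epi32(__m128i_demo __a, __m128i_demo __b)
  {
    /* the attribute records that this function needs 128-bit vectors
       to be legal in the backend */
    return (__m128i_demo)((__v4su_demo)__a + (__v4su_demo)__b);
  }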

To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per-builtin basis and avoid any impact on target independent builtins.

There will be future work to support vectors passed as function arguments and inline assembly, and whatever else we find that isn't covered by this patch.

Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.

Differential Revision: https://reviews.llvm.org/D48617

llvm-svn: 336583
2018-07-09 19:00:16 +00:00
Craig Topper 31730ae761 [X86] Rename __builtin_ia32_pslldqi128 to __builtin_ia32_pslldqi128_byteshift and similar for other sizes. Remove the multiply by 8 from the header files.
The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This creates a misleading error message when we check the range of the immediate passed to the builtin, since the allowed range also got multiplied by 8.

This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic.
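
Schematically, the header change looks like this (a simplified sketch with demo_ macro names; types are the headers' usual vector typedefs, and the "before" builtin is shown with its historical bit-count signature):

  /* before: the builtin took the shift in bits, so the header scaled by 8 */
  #define demo_slli_si128_bits(a, imm) \
    ((__m128i)__builtin_ia32_pslldqi128((__v2di)(__m128i)(a), (int)(imm) * 8))

  /* after: the builtin takes the byte count directly, matching the
     instruction and the Intel intrinsic */
  #define demo_slli_si128_bytes(a, imm) \
    ((__m128i)__builtin_ia32_pslldqi128_byteshift((__v2di)(__m128i)(a), (int)(imm)))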

Fixes the remaining issue from PR37795.

llvm-svn: 334773
2018-06-14 22:02:35 +00:00
Craig Topper 03f4f04b91 [X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking.
llvm-svn: 334311
2018-06-08 18:00:25 +00:00
Craig Topper 9d3962f4f1 [X86] Change immediate type for some builtins from char to int.
These builtins are all handled by CGBuiltin.cpp so it doesn't much matter what the immediate type is, but int matches the intrinsic spec.

llvm-svn: 334310
2018-06-08 18:00:22 +00:00
Craig Topper 03de166ccd [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking.
llvm-svn: 334265
2018-06-08 06:13:16 +00:00
Craig Topper 3428beeb2f [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking.
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.

llvm-svn: 334261
2018-06-08 03:24:47 +00:00
Craig Topper 7d17d7278b [X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range.
llvm-svn: 334249
2018-06-08 00:00:21 +00:00
Craig Topper d3623155a2 [X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics.
We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.

This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.

llvm-svn: 334208
2018-06-07 17:28:03 +00:00
Craig Topper c633867944 [X86] Remove __extension__ from macro intrinsics when it's not needed.
I think this is a holdover from when we used to declare variables inside the macros, and it's been copied and pasted forward for years every time a new macro intrinsic gets added.
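
The cleanup amounts to dropping the wrapper where the macro body is a plain expression, e.g. (hypothetical demo_ macros, not the actual diff):

  typedef int __v4si_demo __attribute__((__vector_size__(16)));

  /* before: __extension__ left over from when the macro declared variables */
  #define demo_setzero_old() __extension__ ((__v4si_demo){0, 0, 0, 0})

  /* after: a plain expression needs no __extension__ */
  #define demo_setzero_new() ((__v4si_demo){0, 0, 0, 0})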

Interestingly, this caused some IRGen tests to become slightly more optimized: we now return a zeroinitializer directly instead of going through a store+load.

It also removed a bogus error message on another test.

llvm-svn: 333613
2018-05-31 00:51:20 +00:00
Yael Tsafrir 23e7733230 [X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR
Differential Revision: https://reviews.llvm.org/D37562

llvm-svn: 313011
2017-09-12 07:46:32 +00:00
Simon Pilgrim c14865c0c5 [X86][AVX] Ensure vector non-temporal load/store intrinsics force pointer alignment (PR33830)
Clang specifies a max type alignment of 16 bytes on darwin targets (annoyingly in the driver not via cc1), meaning that the builtin nontemporal stores don't correctly align the loads/stores to 32 or 64 bytes when required, resulting in lowering to temporal unaligned loads/stores.

This patch casts the vectors to explicitly aligned types prior to the load/store to ensure that the required alignment is respected.
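
The fix follows this pattern (a sketch with demo_ names; the real headers apply it to each affected stream intrinsic):

  typedef long long __v4di_demo __attribute__((__vector_size__(32)));

  static __inline__ void
      __attribute__((__always_inline__, __nodebug__, __target__("avx")))
  demo_stream_si256(__v4di_demo *__a, __v4di_demo __b)
  {
    /* an explicitly aligned variant of the type forces the required
       32-byte alignment onto the non-temporal store */
    typedef __v4di_demo __v4di_aligned __attribute__((aligned(32)));
    __builtin_nontemporal_store((__v4di_aligned)__b, (__v4di_aligned *)__a);
  }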

Differential Revision: https://reviews.llvm.org/D35996

llvm-svn: 309488
2017-07-29 15:33:34 +00:00
Simon Pilgrim 9f6e79c5e4 [X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (clang)
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.
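
In generic form the load becomes (a sketch with demo_ names standing in for the header definitions):

  typedef long long __v4di_demo __attribute__((__vector_size__(32)));

  static __inline__ __v4di_demo
      __attribute__((__always_inline__, __nodebug__, __target__("avx2")))
  demo_stream_load_si256(const __v4di_demo *__V)
  {
    /* generic non-temporal load instead of __builtin_ia32_movntdqa256 */
    return __builtin_nontemporal_load(__V);
  }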

LLVM companion patch: D31767.

Differential Revision: https://reviews.llvm.org/D31766

llvm-svn: 300326
2017-04-14 15:05:57 +00:00
Craig Topper 2a383c9273 [X86] Use undefined instead of setzero in shufflevector based intrinsics when the second source is unused. Rewrite immediate extractions in shuffle intrinsics to be in ((c >> x) & y) form instead of ((c & z) >> x). This way only x varies between each use instead of having to vary x and z.
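
For a 2-bit-per-lane selector the two forms look like this (demo_ macros mirroring the shape of _mm_shuffle_epi32; not the exact diff):

  typedef int __v4si_demo __attribute__((__vector_size__(16)));

  /* old: a different mask per lane, shifted down afterwards */
  #define demo_shuffle_old(a, imm) \
    __builtin_shufflevector((__v4si_demo)(a), (__v4si_demo)(a), \
                            ((imm) & 0x03) >> 0, ((imm) & 0x0c) >> 2, \
                            ((imm) & 0x30) >> 4, ((imm) & 0xc0) >> 6)

  /* new: only the shift amount varies; the mask is always 0x3 */
  #define demo_shuffle_new(a, imm) \
    __builtin_shufflevector((__v4si_demo)(a), (__v4si_demo)(a), \
                            ((imm) >> 0) & 0x3, ((imm) >> 2) & 0x3, \
                            ((imm) >> 4) & 0x3, ((imm) >> 6) & 0x3)
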
llvm-svn: 274525
2016-07-04 22:18:01 +00:00
Craig Topper 50e3dfe9d0 [X86] Fix pslldq/psrldq intrinsics to not fail compilation with immediates larger than 16. This was accidentally broken in r272246.
llvm-svn: 273775
2016-06-25 07:31:14 +00:00
Craig Topper 68738332b8 [AVX512] Implement 512-bit and masked shufflelo and shufflehi intrinsics directly with __builtin_shufflevector and __builtin_ia32_select. Also improve the formatting of the AVX2 version.
llvm-svn: 272452
2016-06-11 03:31:13 +00:00
Craig Topper 2769bb5753 [X86] Handle AVX2 pslldqi and psrldqi intrinsics shufflevector creation directly in the header file instead of in CGBuiltin.cpp. Simplify the sse2 equivalents as well.
llvm-svn: 272246
2016-06-09 05:15:12 +00:00
Craig Topper 6a77b62640 [X86] Use unsigned types for vector arithmetic in intrinsics to avoid undefined behavior for signed integer overflow.
This is really only needed for addition, subtraction, and multiplication, but I did the bitwise ops too for overall consistency. Clang currently doesn't set NSW for signed vector operations so the undefined behavior shouldn't happen today.
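
For example, a 32-bit add in this style (a sketch with demo_ typedefs; the headers use __v8su and friends):

  typedef long long __m256i_demo __attribute__((__vector_size__(32)));
  typedef unsigned int __v8su_demo __attribute__((__vector_size__(32)));

  static __inline__ __m256i_demo
      __attribute__((__always_inline__, __nodebug__, __target__("avx2")))
  demo_add_epi32(__m256i_demo __a, __m256i_demo __b)
  {
    /* unsigned lane type: wraparound is well-defined, so no signed-overflow UB */
    return (__m256i_demo)((__v8su_demo)__a + (__v8su_demo)__b);
  }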

llvm-svn: 271778
2016-06-04 05:43:41 +00:00
Simon Pilgrim 6d1a0c4c75 [X86][SSE] Make unsigned integer vector types generally available
As discussed on http://reviews.llvm.org/D20684, move the unsigned integer vector types used for zero extension to make them available for general use.

llvm-svn: 271187
2016-05-29 18:49:08 +00:00
Simon Pilgrim 91b77ceaed [X86][SSE] Replace VPMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (clang)
The VPMOVSX and (V)PMOVZX sign/zero extension intrinsics can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics.
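
For example, the u8-to-i16 zero extension reduces to a single generic conversion (a sketch with demo_ typedefs):

  typedef unsigned char __v16qu_demo __attribute__((__vector_size__(16)));
  typedef short __v16hi_demo __attribute__((__vector_size__(32)));

  static __inline__ __v16hi_demo
      __attribute__((__always_inline__, __nodebug__, __target__("avx2")))
  demo_cvtepu8_epi16(__v16qu_demo __V)
  {
    /* unsigned source lanes make this a zero extension; lowers to vpmovzxbw */
    return __builtin_convertvector(__V, __v16hi_demo);
  }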

This patch removes the clang builtins and their use in the sse2/avx headers - a companion patch will remove/auto-upgrade the llvm intrinsics.

Note: We already did this for SSE41 PMOVSX some time ago.

Differential Revision: http://reviews.llvm.org/D20684

llvm-svn: 271106
2016-05-28 08:12:45 +00:00
Craig Topper cd45b1a7c7 [X86] Add a few missing typecasts to intrinsics. Found by playing with -fno-lax-vector-conversions on the builtin tests.
llvm-svn: 269734
2016-05-17 03:42:31 +00:00
Craig Topper 1aa231e3aa [X86] Add typecasts to remove most assumptions about what __m128i/__m256i is defined as. Add similar typecasts for the fp types as well.
llvm-svn: 269632
2016-05-16 06:38:42 +00:00
Craig Topper 5ec97a7b9b [X86] Improve codegen for AVX2 gather with an all 1s mask.
Use undefined instead of setzero as the pass-through input since it's going to be fully overwritten. Use cmpeq of two zero vectors to produce the all-1s vector. Casting -1 to a double and vectorizing causes a constant load of a -1.0 floating point value.
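
The trick in intrinsic form (illustrative; the demo_ name is hypothetical):

  #include <immintrin.h>

  static inline __m256i __attribute__((__always_inline__, __target__("avx2")))
  demo_allones_mask(void)
  {
    /* 0 == 0 is true in every lane, so the compare yields all 1s without
       loading a -1.0 constant from memory */
    return _mm256_cmpeq_epi32(_mm256_setzero_si256(), _mm256_setzero_si256());
  }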

llvm-svn: 254389
2015-12-01 07:12:59 +00:00
Craig Topper e20b8c68ed [X86] _mm256_permutevar8x32_ps should take an integer vector for its shuffle index input.
llvm-svn: 254270
2015-11-29 22:53:32 +00:00
Craig Topper d619eaaae4 [X86] Add missing typecasts in intrinsic macros. This should make them more robust against inputs that aren't already the right type.
llvm-svn: 252700
2015-11-11 03:47:10 +00:00
Craig Topper 19744ee6ad [X86] Change pointer type in AVX2 gather builtins to be the scalar type instead of the vector type. This matches gcc and removes extra casts.
llvm-svn: 252697
2015-11-11 02:51:18 +00:00
Craig Topper fd778eebac [X86] Use setzero instead of set1(0) in a few places in intrinsic headers.
llvm-svn: 252587
2015-11-10 05:08:08 +00:00
Craig Topper 7148166785 [X86] Remove temporary variables from macros in x86 intrinsic headers. Prevents duplicate names appearing from multiple macro expansions. NFC
llvm-svn: 252586
2015-11-10 05:08:05 +00:00
Ahmed Bougacha 7dfaaf3891 [Headers][X86] Fix stream_load (movntdqa) to accept const*.
Per Intel intrinsics guide:
- _mm256_stream_load_si256 takes `__m256i const *'
- _mm_stream_load_si128 takes `__m128i *', for no good reason.

Let's accept const* for both.

llvm-svn: 249213
2015-10-02 23:29:26 +00:00
Chandler Carruth cbe6411401 Fix the SSE4 byte sign extension in a cleaner way, and more thoroughly
test that our intrinsics behave the same under -fsigned-char and
-funsigned-char.

This further testing uncovered that AVX-2 has a broken cmpgt for 8-bit
elements, and has for a long time. This is fixed in the same way as
SSE4 handles the case.
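
The fix casts through an explicitly signed lane type, along these lines (a sketch with demo_ names; the headers define __v16qs at file scope):

  typedef long long __m128i_demo __attribute__((__vector_size__(16)));
  typedef signed char __v16qs_demo __attribute__((__vector_size__(16)));

  static __inline__ __m128i_demo
      __attribute__((__always_inline__, __nodebug__, __target__("sse2")))
  demo_cmpgt_epi8(__m128i_demo __a, __m128i_demo __b)
  {
    /* explicitly signed lanes keep the compare signed under -funsigned-char,
       where plain char (and thus __v16qi) becomes unsigned */
    return (__m128i_demo)((__v16qs_demo)__a > (__v16qs_demo)__b);
  }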

The other ISA extensions currently work correctly because they use
specific instruction intrinsics. As soon as they are rewritten in terms
of generic IR, they will need to add these special casts. I've added the
necessary testing to catch this however, so we shouldn't have to chase
it down again.

I considered changing the core typedef to be signed, but that seems like
a bad idea. Notably, it would be an ABI break if anyone is reaching into
the innards of the intrinsic headers and passing __v16qi on an API
boundary. I can't be completely confident that this wouldn't happen due
to a macro expanding in a lambda, etc., so it seems much better to leave
it alone. It also matches GCC's behavior exactly.

A fun side note is that for both GCC and Clang, -funsigned-char really
does change the semantics of __v16qi. To observe this, consider:

  % cat x.cc
  #include <smmintrin.h>
  #include <iostream>

  int main() {
    __v16qi a = { 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    __v16qi b = _mm_set1_epi8(-1);
    std::cout << (int)(a / b)[0] << ", " << (int)(a / b)[1] << '\n';
  }
  % clang++ -o x x.cc && ./x
  -1, 1
  % clang++ -funsigned-char -o x x.cc && ./x
  0, 1

However, while this may be surprising, both Clang and GCC agree.

Differential Revision: http://reviews.llvm.org/D13324

llvm-svn: 249097
2015-10-01 23:40:12 +00:00
Ahmed Bougacha 5e354cb547 [Headers][X86] Use __builtin_shufflevector in AVX2 broadcasts.
This lets us optimize them better. We agreed to remove the intrinsics
instead of combining them later, as we generate the expected
instructions even at -O0. Plus, it's a nice cleanup.
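
For instance, the 32-bit broadcast becomes a plain shuffle (a sketch with demo_ typedefs):

  typedef int __v4si_demo __attribute__((__vector_size__(16)));
  typedef int __v8si_demo __attribute__((__vector_size__(32)));

  static __inline__ __v8si_demo
      __attribute__((__always_inline__, __nodebug__, __target__("avx2")))
  demo_broadcastd_epi32(__v4si_demo __X)
  {
    /* replicate lane 0 into all eight result lanes; lowers to vpbroadcastd */
    return __builtin_shufflevector(__X, __X, 0, 0, 0, 0, 0, 0, 0, 0);
  }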

Differential Revision: http://reviews.llvm.org/D10556

llvm-svn: 245605
2015-08-20 20:27:21 +00:00
Michael Kuperstein e45af54cdb [X86] Rename DEFAULT_FN_ATTR macro to __DEFAULT_FN_ATTR
llvm-svn: 241065
2015-06-30 13:36:19 +00:00
Eric Christopher 9fc7fb274e Update the intel intrinsic headers to use the target attribute support.
This involved removing the conditional inclusion and replacing them
with target attributes matching the original conditional inclusion
and checks. The testcase update removes the macro checks for each
file and replaces them with usage of the __target__ attribute, e.g.:

int __attribute__((__target__(("sse3")))) foo(int a) {
  _mm_mwait(0, 0);
  return 4;
}

This usage does require the enclosing function have the requisite
__target__ attribute for inlining and code generation - also for
any macro intrinsic uses in the enclosing function. There's no change
for existing uses of the intrinsic headers.

llvm-svn: 239883
2015-06-17 07:09:32 +00:00
Eric Christopher 4d185168e9 Use a define for per-file function attributes for the Intel intrinsic headers.
This is a precursor to changing them to use the new target attribute
code.

llvm-svn: 239882
2015-06-17 07:09:20 +00:00
Michael Kuperstein 877f3cbe84 [X86] Add _mm_broadcastsd_pd intrinsic
_mm_broadcastsd_pd is basically an alias for _mm_movedup_pd; however, the alias is only available from AVX2 onward.

llvm-svn: 237698
2015-05-19 14:49:14 +00:00
Michael Kuperstein 6168183e04 [X86] Added _mm256_bslli_epi128 and _mm256_bsrli_epi128.
These two intrinsics are alternative names for _mm256_slli_si256 and _mm256_srli_si256, respectively.

llvm-svn: 237693
2015-05-19 13:05:46 +00:00
Ekaterina Romanova b929ad7b17 _mm256_blend_epi16 is being cast to __m256d instead of __m256i. Fixing this.
llvm-svn: 234560
2015-04-10 02:39:45 +00:00
Sanjay Patel 0a6da5de55 [X86, AVX2] Replace inserti128 and extracti128 intrinsics with generic shuffles
This is nearly identical to the v*f128_si256 parts of r231792 and r232052.

AVX2 introduced proper integer variants of the hacked integer insert/extract
C intrinsics that were created for this same functionality with AVX1.

This should complete the front end fixes for insert/extract128 intrinsics. 
Corresponding LLVM patch to follow.

llvm-svn: 232109
2015-03-12 21:54:24 +00:00
Juergen Ributzka 9baa03fc07 Lower _mm256_broadcastsi128_si256 directly to a vector shuffle.
Originally we were using the corresponding GCC builtin to lower this AVX2 vector
intrinsic. Instead, we will now lower it directly to a vector shuffle.
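
The shuffle-based lowering has this shape (a sketch with demo_ typedefs standing in for the header types):

  typedef long long __v2di_demo __attribute__((__vector_size__(16)));
  typedef long long __v4di_demo __attribute__((__vector_size__(32)));

  static __inline__ __v4di_demo
      __attribute__((__always_inline__, __nodebug__, __target__("avx2")))
  demo_broadcastsi128_si256(__v2di_demo __X)
  {
    /* duplicate the 128-bit source into both halves of the 256-bit result */
    return __builtin_shufflevector(__X, __X, 0, 1, 0, 1);
  }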

This will not only allow LLVM to generate better code, but it will also allow us
to remove the GCC intrinsics.

Reviewed by Andrea

This is related to rdar://problem/18742778.

llvm-svn: 231081
2015-03-03 17:22:53 +00:00
Filipe Cabecinhas 5d289b48b1 Patched clang to emit x86 blends as shufflevectors.
Summary:
Most of the clang header patch by Simon Pilgrim @ SCEE.
Also fixed (or added) clang tests for these intrinsics.

LLVM tests to make sure we get the blend instruction out of these
shufflevectors are at http://reviews.llvm.org/D3600

Reviewers: eli.friedman, craig.topper, rafael

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D3601

llvm-svn: 208664
2014-05-13 02:37:02 +00:00
Eli Friedman 3cd55f49ab Fix argument types of some AVX2 intrinsics.
This fix makes our headers consistent with gcc.

PR17312.

llvm-svn: 191248
2013-09-23 23:52:04 +00:00
Juergen Ributzka 2c2dbf4542 Fix the name and the type of the argument for intrinsic
_mm256_broadcastsi128_si256 to align with the Intel documentation.

This fixes PR16581 and rdar:14747994.

llvm-svn: 188609
2013-08-17 16:40:09 +00:00
Richard Smith 49e56440f9 Add missing include guards into headers in lib/Headers. While it may appear
that these headers should not be included more than once, they are in fact
included twice when building our builtins module (in order for it to generate
submodules for them), and without this, any modular build enabling AVX and
including any builtin header fails.

Testing this is tricky because including any of these headers in a modular
build is liable to fail, due to unrelated builtin headers in the same module
including headers which might not be available on the system running the tests.
Suggestions on that front are welcome (but we're getting close to being able to
run a buildbot that has modules enabled for all tests, which would nicely solve
the testing problem).

llvm-svn: 186275
2013-07-14 05:41:45 +00:00
David Blaikie 3302f2bd46 PR14964: intrinsic headers using non-reserved identifiers
Several of the intrinsic headers were using plain non-reserved identifiers.
C++11 17.6.4.3.2 [global.names] p1 reserves names containing a double
underscore, or beginning with an underscore followed by an uppercase letter,
for any use.

I think I got them all, but open to being corrected. For the most part I
didn't bother updating function-like macro parameter names because I don't
believe they're subject to any such collision - though some function-like
macros already follow this convention (I didn't update them in part because
the churn was more significant as several function-like macros use the double
underscore prefixed version of the same name as a parameter in their
implementation)

llvm-svn: 172666
2013-01-16 23:08:36 +00:00
Manman Ren f865ba0c0e X86: add more GATHER intrinsics in Clang
Support the following intrinsics:
  _mm_i32gather_pd, _mm256_i32gather_pd,
  _mm_i64gather_pd, _mm256_i64gather_pd,
  _mm_i32gather_ps, _mm256_i32gather_ps,
  _mm_i64gather_ps, _mm256_i64gather_ps,
  _mm_i32gather_epi64, _mm256_i32gather_epi64,
  _mm_i64gather_epi64, _mm256_i64gather_epi64,
  _mm_i32gather_epi32, _mm256_i32gather_epi32,
  _mm_i64gather_epi32, _mm256_i64gather_epi32

llvm-svn: 159410
2012-06-29 05:19:13 +00:00
Manman Ren 86c3250b82 X86: add more GATHER intrinsics in Clang
Corrected type for index of _mm256_mask_i32gather_pd
  from 256-bit to 128-bit
Corrected types for src|dst|mask of _mm256_mask_i64gather_ps
  from 256-bit to 128-bit

Support the following intrinsics:
  _mm_mask_i32gather_epi64, _mm256_mask_i32gather_epi64,
  _mm_mask_i64gather_epi64, _mm256_mask_i64gather_epi64,
  _mm_mask_i32gather_epi32, _mm256_mask_i32gather_epi32,
  _mm_mask_i64gather_epi32, _mm256_mask_i64gather_epi32

llvm-svn: 159403
2012-06-29 00:54:35 +00:00
Manman Ren add5e9e289 X86: add GATHER intrinsics (AVX2) in Clang
Support the following intrinsics:
  _mm_mask_i32gather_pd, _mm256_mask_i32gather_pd, _mm_mask_i64gather_pd
  _mm256_mask_i64gather_pd, _mm_mask_i32gather_ps, _mm256_mask_i32gather_ps
  _mm_mask_i64gather_ps, _mm256_mask_i64gather_ps

llvm-svn: 159222
2012-06-26 19:55:09 +00:00
Craig Topper 26e74e50b6 Convert vperm2f128 and vperm2i128 intrinsics back to using llvm intrinsics. Unfortunately, these instructions have behavior that can't be modeled with shufflevector.
llvm-svn: 154906
2012-04-17 05:16:56 +00:00
Craig Topper 8e57855ea0 Change _mm256_permute4x64_epi64 and _mm256_permute4x64_pd to use __builtin_shufflevector instead of specific builtins. Old builtins will be removed from llvm now that vpermq/vpermpd are supported by shuffle lowering code.
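
Schematically, the new form is (a simplified sketch with a demo_ macro name):

  typedef double __v4df_demo __attribute__((__vector_size__(32)));

  /* each 2-bit field of M picks one of the four source lanes */
  #define demo_permute4x64_pd(V, M) \
    __builtin_shufflevector((__v4df_demo)(V), (__v4df_demo)(V), \
                            ((M) >> 0) & 0x3, ((M) >> 2) & 0x3, \
                            ((M) >> 4) & 0x3, ((M) >> 6) & 0x3)
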
llvm-svn: 154777
2012-04-15 22:18:10 +00:00