116 Commits

Author SHA1 Message Date
Simon Pilgrim 645e1ad33a [X86][SSE] _mm_store1_ps/_mm_store1_pd should require an aligned pointer
According to the gcc headers, intel intrinsics docs and msdn codegen the _mm_store1_pd (and its _mm_store_pd1 equivalent) should use an aligned pointer - the clang headers are the only implementation I can find that assume non-aligned stores (by storing with _mm_storeu_pd).

Additionally, according to the intel intrinsics docs and msdn codegen the _mm_store1_ps (_mm_store_ps1) requires a similarly aligned pointer.

This patch raises the alignment requirements to match the other implementations by calling _mm_store_ps/_mm_store_pd instead.

I've also added the missing _mm_store_pd1 intrinsic (which maps to _mm_store1_pd like _mm_store_ps1 does to _mm_store1_ps).

As a followup I'll update the llvm fast-isel tests to match this codegen.

Differential Revision: http://reviews.llvm.org/D20617

llvm-svn: 271218
2016-05-30 17:55:25 +00:00
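The alignment requirement described in the commit above can be illustrated with a minimal sketch. It assumes an x86 target with SSE2 and a GCC- or Clang-compatible compiler; the helper names `broadcast_low` and `demo_store1_pd` are hypothetical and do not come from the headers.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: write the low double of v to both slots of out.
   out must be 16-byte aligned, because _mm_store1_pd is an aligned store. */
static void broadcast_low(double *out, __m128d v) {
  _mm_store1_pd(out, v);
}

/* Hypothetical demo: returns the sum of the two stored values. */
static double demo_store1_pd(void) {
  /* Request 16-byte alignment explicitly; a plain double array only
     guarantees 8-byte alignment. */
  double buf[2] __attribute__((aligned(16))) = {0.0, 0.0};
  broadcast_low(buf, _mm_set_sd(3.5));
  assert(buf[0] == 3.5 && buf[1] == 3.5);
  return buf[0] + buf[1];
}
```

Callers that cannot guarantee 16-byte alignment would need to broadcast in a register and use an unaligned store such as `_mm_storeu_pd` instead.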
Craig Topper 09175dab31 [X86] Replace unaligned store builtins in SSE/AVX intrinsic files with code that will compile to a native unaligned store. Remove the builtins since they are no longer used.
Intrinsics will be removed from llvm in a future commit.

llvm-svn: 271214
2016-05-30 17:10:30 +00:00
Simon Pilgrim 6d1a0c4c75 [X86][SSE] Make unsigned integer vector types generally available
As discussed on http://reviews.llvm.org/D20684, move the unsigned integer vector types used for zero extension to make them available for general use.

llvm-svn: 271187
2016-05-29 18:49:08 +00:00
Simon Pilgrim 90770c7c76 [X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR
Both the (V)CVTDQ2PD(Y) (i32 to f64) and (V)CVTPS2PD(Y) (f32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.

This patch removes the clang builtins and their use in the sse2/avx headers - a future patch will deal with removing the llvm intrinsics, but that will require a bit more work.

Differential Revision: http://reviews.llvm.org/D20528

llvm-svn: 270499
2016-05-23 22:13:02 +00:00
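As a rough illustration of why the commit above calls these conversions lossless, the following sketch checks that the extreme 32-bit integers and an inexact float survive the widening to double unchanged. It assumes an x86 target with SSE2; the helper names are hypothetical, and the sketch uses the public intrinsics rather than `__builtin_convertvector` directly.

```c
#include <assert.h>
#include <limits.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: widen a 32-bit int to double via CVTDQ2PD. */
static double i32_as_f64(int x) {
  return _mm_cvtsd_f64(_mm_cvtepi32_pd(_mm_set1_epi32(x)));
}

/* Hypothetical helper: widen a float to double via CVTPS2PD. */
static double f32_as_f64(float x) {
  return _mm_cvtsd_f64(_mm_cvtps_pd(_mm_set1_ps(x)));
}

/* Every int and every float is exactly representable as a double,
   so the vector conversions agree with plain C casts. */
static int demo_lossless(void) {
  assert(i32_as_f64(INT_MAX) == (double)INT_MAX);
  assert(i32_as_f64(INT_MIN) == (double)INT_MIN);
  assert(f32_as_f64(0.1f) == (double)0.1f);
  return 1;
}
```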
Craig Topper 1aa231e3aa [X86] Add typecasts to remove most assumptions about what __m128i/__m256i is defined as. Add similar typecasts for the fp types as well.
llvm-svn: 269632
2016-05-16 06:38:42 +00:00
Ekaterina Romanova f2ed62027d Add doxygen comments to emmintrin.h's intrinsics. Only around 25% of the intrinsics in this file are documented now. The patches for the rest of the intrinsics in this file will be sent out later.
The doxygen comments are automatically generated based on Sony's intrinsics document.

I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream. This patch was internally reviewed by Paul Robinson.

llvm-svn: 265844
2016-04-08 20:45:48 +00:00
Craig Topper fb79b5f273 [X86] Add 'pause' builtin that's already in llvm and use it instead of inline assembly to implement _mm_pause.
llvm-svn: 252712
2015-11-11 08:13:33 +00:00
Craig Topper a5455524c2 [X86] Use __builtin_ia32_paddq and __builtin_ia32_psubq to implement a couple intrinsics that were supposed to operate on MMX registers. Otherwise we end up operating on GPRs. Throw in a test for _mm_mul_su32 while I was there.
llvm-svn: 252711
2015-11-11 08:00:41 +00:00
Craig Topper d619eaaae4 [X86] Add missing typecasts in intrinsic macros. This should make them more robust against inputs that aren't already the right type.
llvm-svn: 252700
2015-11-11 03:47:10 +00:00
Craig Topper fd778eebac [X86] Use setzero instead of set1(0) in a few places in intrinsic headers.
llvm-svn: 252587
2015-11-10 05:08:08 +00:00
Chandler Carruth cbe6411401 Fix the SSE4 byte sign extension in a cleaner way, and more thoroughly
test that our intrinsics behave the same under -fsigned-char and
-funsigned-char.

This further testing uncovered that AVX-2 has a broken cmpgt for 8-bit
elements, and has for a long time. This is fixed in the same way as
SSE4 handles the case.

The other ISA extensions currently work correctly because they use
specific instruction intrinsics. As soon as they are rewritten in terms
of generic IR, they will need to add these special casts. I've added the
necessary testing to catch this however, so we shouldn't have to chase
it down again.

I considered changing the core typedef to be signed, but that seems like
a bad idea. Notably, it would be an ABI break if anyone is reaching into
the innards of the intrinsic headers and passing __v16qi on an API
boundary. I can't be completely confident that this wouldn't happen due
to a macro expanding in a lambda, etc., so it seems much better to leave
it alone. It also matches GCC's behavior exactly.

A fun side note is that for both GCC and Clang, -funsigned-char really
does change the semantics of __v16qi. To observe this, consider:

  % cat x.cc
  #include <smmintrin.h>
  #include <iostream>

  int main() {
    __v16qi a = { 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    __v16qi b = _mm_set1_epi8(-1);
    std::cout << (int)(a / b)[0] << ", " << (int)(a / b)[1] << '\n';
  }
  % clang++ -o x x.cc && ./x
  -1, 1
  % clang++ -funsigned-char -o x x.cc && ./x
  0, 1

However, while this may be surprising, both Clang and GCC agree.

Differential Revision: http://reviews.llvm.org/D13324

llvm-svn: 249097
2015-10-01 23:40:12 +00:00
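The behavior the commit above is protecting can be sketched as follows: `_mm_cmpgt_epi8` is a signed byte comparison, so the result should be the same whether or not the translation unit is built with `-funsigned-char`. This is a minimal sketch assuming an x86 target with SSE2; `all_lanes_greater` is a hypothetical helper name.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: returns 1 if a > b in every byte lane when the
   lanes are compared as signed 8-bit integers. */
static int all_lanes_greater(signed char a, signed char b) {
  __m128i m = _mm_cmpgt_epi8(_mm_set1_epi8(a), _mm_set1_epi8(b));
  /* _mm_movemask_epi8 collects the top bit of each of the 16 bytes. */
  return _mm_movemask_epi8(m) == 0xFFFF;
}

static int demo_signed_compare(void) {
  assert(all_lanes_greater(1, -1));  /* 1 > -1 as signed bytes */
  assert(!all_lanes_greater(-1, 1)); /* 0xFF is -1, not 255 */
  return 1;
}
```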
Michael Kuperstein a10dff946e [X86] Make f16c intrinsics accessible through emmintrin.h, per Intel docs
Differential Revision: http://reviews.llvm.org/D13015

llvm-svn: 248156
2015-09-21 13:34:47 +00:00
Michael Kuperstein 5c2cb0eee2 [X86] Fix some non-reserved parameter names in intrinsic headers
Differential Revision: http://reviews.llvm.org/D13009

llvm-svn: 248150
2015-09-21 11:45:27 +00:00
Simon Pilgrim 5aba9925c0 [X86][SSE] Add _mm_undefined_* intrinsics
Added missing SSE/AVX 'undefined' intrinsics (PR24040):

_mm_undefined_pd, _mm_undefined_ps + _mm_undefined_si128
_mm256_undefined_pd, _mm256_undefined_ps + _mm256_undefined_si256
_mm512_undefined, _mm512_undefined_ps, _mm512_undefined_pd + _mm512_undefined_epi32

Added builtin intrinsics:

__builtin_ia32_undef128, __builtin_ia32_undef256 + __builtin_ia32_undef512

Differential Revision: http://reviews.llvm.org/D12052

llvm-svn: 246083
2015-08-26 21:17:12 +00:00
Michael Kuperstein e45af54cdb [X86] Rename DEFAULT_FN_ATTR macro to __DEFAULT_FN_ATTR
llvm-svn: 241065
2015-06-30 13:36:19 +00:00
Eric Christopher 9fc7fb274e Update the intel intrinsic headers to use the target attribute support.
This involved removing the conditional inclusion and replacing them
with target attributes matching the original conditional inclusion
and checks. The testcase update removes the macro checks for each
file and replaces them with usage of the __target__ attribute, e.g.:

int __attribute__((__target__(("sse3")))) foo(int a) {
  _mm_mwait(0, 0);
  return 4;
}

This usage does require the enclosing function have the requisite
__target__ attribute for inlining and code generation - also for
any macro intrinsic uses in the enclosing function. There's no change
for existing uses of the intrinsic headers.

llvm-svn: 239883
2015-06-17 07:09:32 +00:00
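A runnable variation on the `__target__` usage described in the commit above might look like the sketch below. It assumes a GCC- or Clang-compatible compiler on x86 and uses `__builtin_cpu_supports` (a compiler builtin, not part of the intrinsic headers) to guard the call at run time; the function names are hypothetical.

```c
#include <assert.h>
#include <smmintrin.h> /* SSE4.1 intrinsics */

/* Hypothetical helper: only this function is compiled with SSE4.1
   enabled, so the rest of the file can target a baseline CPU. */
__attribute__((__target__("sse4.1")))
static int max_sse41(int a, int b) {
  __m128i m = _mm_max_epi32(_mm_set1_epi32(a), _mm_set1_epi32(b));
  return _mm_cvtsi128_si32(m);
}

/* Hypothetical wrapper: take the SSE4.1 path only when the running CPU
   reports support, otherwise fall back to scalar code. */
static int max_with_fallback(int a, int b) {
  if (__builtin_cpu_supports("sse4.1"))
    return max_sse41(a, b);
  return a > b ? a : b;
}
```

The intrinsic is used inside a function that itself carries the matching `__target__` attribute, which is the requirement the commit message spells out for inlining and code generation.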
Eric Christopher 4d185168e9 Use a define for per-file function attributes for the Intel intrinsic headers.
This is a precursor to changing them to use the new target attribute
code.

llvm-svn: 239882
2015-06-17 07:09:20 +00:00
Craig Topper a462482d98 [X86] Add _mm_bslli_si128 and _mm_bsrli_si128 as aliases of _mm_slli_si128 and _mm_srli_si128. This matches Intel documentation and gcc.
llvm-svn: 229066
2015-02-13 06:04:45 +00:00
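A short sketch of the aliasing described in the commit above, assuming an x86 target with SSE2 and headers recent enough to provide the `_mm_bslli_si128` and `_mm_bsrli_si128` names; `byte_shift_aliases_agree` is a hypothetical helper.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: returns 1 if the bslli/bsrli spellings produce
   the same result as slli/srli for a 4-byte shift of v. */
static int byte_shift_aliases_agree(__m128i v) {
  __m128i l1 = _mm_slli_si128(v, 4);
  __m128i l2 = _mm_bslli_si128(v, 4);
  __m128i r1 = _mm_srli_si128(v, 4);
  __m128i r2 = _mm_bsrli_si128(v, 4);
  return _mm_movemask_epi8(_mm_cmpeq_epi8(l1, l2)) == 0xFFFF &&
         _mm_movemask_epi8(_mm_cmpeq_epi8(r1, r2)) == 0xFFFF;
}

/* Hypothetical helper: low 32-bit lane after shifting right by 4 bytes.
   With lanes 0..3 holding 1, 2, 3, 4 the old lane 1 moves into lane 0. */
static int low_lane_after_bsrli(void) {
  __m128i v = _mm_set_epi32(4, 3, 2, 1);
  return _mm_cvtsi128_si32(_mm_bsrli_si128(v, 4));
}
```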
Craig Topper 51e47418d4 [X86] Simplify some code and remove some -Wshadow disables from intrinsic header.
llvm-svn: 229065
2015-02-13 06:04:43 +00:00
Filipe Cabecinhas 2177fc1732 Make the byte-shift SSE intrinsics emit vector shuffles which we know the backend can handle.
Also removed unused builtins.

Original patch by Andrea Di Biagio!

Reviewers: craig.topper, nadav

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D7199

llvm-svn: 228481
2015-02-07 01:37:09 +00:00
David Majnemer 1cf22e690d Headers: Don't use attribute keywords which aren't reserved
Instead of using 'unavailable', use '__unavailable__'

llvm-svn: 228087
2015-02-04 00:26:10 +00:00
Craig Topper 2094d8fe88 [x86] Add the (v)cmpps/pd/ss/sd builtins to match gcc. Use them in the sse intrinsic files.
This still lowers to the same intrinsics as before.

This is preparation for bounds checking the immediate on the avx version of the builtin so we don't pass illegal immediates into the backend. Since SSE uses a smaller size immediate it's not possible to bounds check when using a shared builtin. Rather than creating a clang-specific builtin for the different immediate, I decided (after consulting with Chandler) that it was better to match gcc.

llvm-svn: 224879
2014-12-27 06:59:57 +00:00
Alp Toker d480b1bf34 Fix a SSE2 intrinsics typo
Full discourse at:

  http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20131104/092514.html
  http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-November/068124.html

Patch by Dimitry Andric and Alexey Dokuchaev!

llvm-svn: 195558
2013-11-23 22:11:57 +00:00
Manman Ren be38b9e15f _mm_extract_epi16: use "& 7" when index is out of bound.
This is in line with implementation of _mm_extract_pi16.
rdar://15250497

llvm-svn: 193187
2013-10-22 19:24:42 +00:00
Ted Kremenek 854cc293a7 Suppress useless -Wshadow warning when using _mm* macros from emmintrin.h
Fixes <rdar://problem/10679282>.

I'm not completely satisfied with this patch.  Sprinkling "diagnostic ignored"
_Pragmas throughout this file is gross, but I couldn't suppress
it for the entire file.

llvm-svn: 192143
2013-10-07 23:51:11 +00:00
Eli Friedman f9d8c6cebb Add _mm_stream_si64 intrinsic.
While I'm here, also fix the alignment computation for the whole family of
intrinsics.

PR17298.

llvm-svn: 191243
2013-09-23 23:38:39 +00:00
Manman Ren 9bb34d66b3 X86 intrinsics: cmpge|gt|nge|ngt_ss|_sd
These intrinsics should return the comparison result in the low bits and keep
the high bits of the first source operand.

When calling to builtin functions, the source operands are swapped and the high
bits of the second source operand are kept. To fix the issue, an extra
shufflevector is used.

rdar://14153896

llvm-svn: 184110
2013-06-17 19:42:49 +00:00
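The expected result layout from the commit above can be checked with a small sketch. It assumes an x86 target with SSE; `cmpgt_ss_layout_ok` is a hypothetical helper name.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE intrinsics */

/* Hypothetical helper: returns 1 if _mm_cmpgt_ss puts the comparison
   mask in lane 0 and copies lanes 1..3 from the FIRST operand. */
static int cmpgt_ss_layout_ok(void) {
  __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f); /* lanes 0..3: 1 2 3 4 */
  __m128 b = _mm_set_ps(8.0f, 7.0f, 6.0f, 0.0f); /* lanes 0..3: 0 6 7 8 */
  __m128 r = _mm_cmpgt_ss(a, b);                 /* lane 0: 1 > 0 is true */
  float out[4];
  _mm_storeu_ps(out, r);
  /* Lane 0 is an all-ones mask, so only its sign bit shows up in the
     movemask; lanes 1..3 must still hold a's values, not b's. */
  return _mm_movemask_ps(r) == 1 &&
         out[1] == 2.0f && out[2] == 3.0f && out[3] == 4.0f;
}
```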
Reid Kleckner 7ab75b3f68 Avoid names like __in that conflict with SAL in builtin headers
Microsoft's Source Annotation Language (SAL) defines a bunch of keywords
for annotating the inputs and outputs of functions.  Empty definitions
for the keywords are provided by <stdlib.h> -> <crtdefs.h> -> <sal.h>.
This makes it basically impossible to include MSVC's stdlib.h and
Clang's *mmintrin.h headers at the same time if they have variables
named __in.  As a workaround, I've renamed those variables.

This fixes the Modules/compiler_builtins.m test which was XFAILed,
presumably due to this conflict.

llvm-svn: 179860
2013-04-19 17:00:14 +00:00
David Blaikie 3302f2bd46 PR14964: intrinsic headers using non-reserved identifiers
Several of the intrinsic headers were using plain non-reserved identifiers.
C++11 17.6.4.3.2 [global.names] p1 reserves names containing a double
underscore or beginning with an underscore followed by an uppercase letter
for any use.

I think I got them all, but open to being corrected. For the most part I
didn't bother updating function-like macro parameter names because I don't
believe they're subject to any such collision - though some function-like
macros already follow this convention (I didn't update them in part because
the churn was more significant as several function-like macros use the double
underscore prefixed version of the same name as a parameter in their
implementation)

llvm-svn: 172666
2013-01-16 23:08:36 +00:00
Chad Rosier 87622b8b84 Get rid of storelv4si builtin as it can be expressed directly. This is general
goodness because it provides opportunities to clean things up.  For example,

uint64_t t1(__m128i vA)
{
  uint64_t Alo;
  _mm_storel_epi64((__m128i*)&Alo, vA);
  return Alo;
}

was generating 

	movq	%xmm0, -8(%rbp)
	movq	-8(%rbp), %rax

and now generates

	movd	%xmm0, %rax

rdar://11282581

llvm-svn: 155924
2012-05-01 18:11:51 +00:00
Nick Lewycky d0ba3793aa Comment mystery code.
llvm-svn: 149742
2012-02-04 02:16:48 +00:00
Nick Lewycky 51a009092c Make _mm_cmpgt_epi8 immune to -funsigned-char.
llvm-svn: 149725
2012-02-03 23:57:48 +00:00
Bob Wilson c9b97cc1da Fix vector macros to correctly check argument types. <rdar://problem/10261670>
llvm-svn: 143792
2011-11-05 06:08:06 +00:00
Eli Friedman 89c11337ba Add _mm_comige_sd to emmintrin.h, since I apparently forgot to do this in r138769.
<rdar://problem/10230751>

llvm-svn: 141310
2011-10-06 20:31:50 +00:00
Eli Friedman 9bb51adcce Tweak *mmintrin.h so that they don't make any bad assumptions about alignment (which probably has little effect in practice, but better to get it right). Make the load in _mm_loadh_pi and _mm_loadl_pi a single LLVM IR instruction to make optimizing easier for CodeGen.
rdar://10054986

llvm-svn: 139874
2011-09-15 23:15:27 +00:00
Eli Friedman f8cb480528 Add missing function _mm_ucomige_sd to emmintrin.h. PR10803.
llvm-svn: 138769
2011-08-29 21:26:24 +00:00
Bill Wendling 03e7e430c3 Add 'may_alias' attribute. Noticed by Eli.
llvm-svn: 131278
2011-05-13 01:24:00 +00:00
Bill Wendling 502931fad9 Represent the unaligned loads natively. These are converted into a call to the
correct unaligned load.

llvm-svn: 131268
2011-05-13 00:11:39 +00:00
Bill Wendling e106c34817 LLVM doesn't always optimize away the four loads from this:
(__m128){ p[0], p[1], p[2], p[3] }

which produces really bad code. This could be done in instcombine, but it's
probably better to do it in the front-end instead.
<rdar://problem/9424836>

llvm-svn: 131237
2011-05-12 19:02:15 +00:00
Eli Friedman 8ba29d8e7f PR9866: Fix the implementation of _mm_loadl_pd and _mm_loadh_pd to not make
bad assumptions about the alignment of the double* argument.

llvm-svn: 131052
2011-05-07 18:59:31 +00:00
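To illustrate the alignment point in the commit above: `_mm_loadl_pd` and `_mm_loadh_pd` take a plain `double*`, so the pointer only needs the natural 8-byte alignment of a double, not 16 bytes. The sketch below assumes an x86 target with SSE2; the helper name is hypothetical.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical demo: load single doubles from an address that is
   8-byte aligned but deliberately not 16-byte aligned. */
static int demo_loadl_loadh(void) {
  double arr[3] __attribute__((aligned(16))) = {10.0, 20.0, 30.0};
  const double *p = &arr[1]; /* 16-byte base + 8: not 16-byte aligned */

  __m128d v = _mm_set_pd(2.0, 1.0); /* low = 1.0, high = 2.0 */

  __m128d h = _mm_loadh_pd(v, p); /* replaces only the high half */
  assert(_mm_cvtsd_f64(h) == 1.0);
  assert(_mm_cvtsd_f64(_mm_unpackhi_pd(h, h)) == 20.0);

  __m128d l = _mm_loadl_pd(v, p); /* replaces only the low half */
  assert(_mm_cvtsd_f64(l) == 20.0);
  assert(_mm_cvtsd_f64(_mm_unpackhi_pd(l, l)) == 2.0);
  return 1;
}
```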
Chris Lattner f03406f103 don't use compound literals in MM macros, since they will be instantiated
into user code which may warn about them with -pedantic.  Patch by Jonathan Sauer!

llvm-svn: 130149
2011-04-25 20:42:40 +00:00
Bill Wendling b9c9e34cb3 Just use a native "load" instead of translating the builtin later. Clang can
take it!

I wasn't able to get __builtin_ia32_loaddqu to transform into an unaligned
load...I'll have to look into it further.

llvm-svn: 129427
2011-04-13 05:58:17 +00:00
Chris Lattner 1750cb037d __builtin_ia32_psrldqi128 too
llvm-svn: 115301
2010-10-01 06:58:49 +00:00
Chris Lattner 81f347fe6d the second argument to __builtin_ia32_pslldqi128 must be an immediate,
so it needs to be called from a macro, not a function.  This is a necessary
but insufficient step towards fixing PR8221

llvm-svn: 115299
2010-10-01 06:52:23 +00:00
Eric Christopher 2a9898f0a2 Move some type defines from smmintrin.h to emmintrin.h to match where
gcc defines them.

llvm-svn: 112146
2010-08-26 02:09:25 +00:00
Benjamin Kramer ae8ea1f715 Fix header comments.
llvm-svn: 111645
2010-08-20 16:47:17 +00:00
Chris Lattner 9052c35479 fix some vector extractions to return properly zero extended values
(instead of sign extending) to match ICC.  GCC is changing this in 
a series of their own PRs (e.g. 41323).

llvm-svn: 111637
2010-08-20 16:08:33 +00:00
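The zero-extension behavior described in the commit above can be observed with `_mm_extract_epi16`, which is one of the vector extractions that returns an `int`. This is a minimal sketch assuming an x86 target with SSE2 and current GCC or Clang headers; `extract_lane0` is a hypothetical helper, and the commit itself does not list which extractions it changed.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: broadcast x to all 16-bit lanes and read lane 0
   back as an int. The lane index must be a compile-time constant. */
static int extract_lane0(short x) {
  return _mm_extract_epi16(_mm_set1_epi16(x), 0);
}

static int demo_zero_extend(void) {
  /* 0xFFFF comes back as 65535, not as a sign-extended -1. */
  assert(extract_lane0((short)-1) == 65535);
  assert(extract_lane0(1234) == 1234);
  return 1;
}
```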
Eli Friedman 07c89c6b3e PR7588: Fix the _mm_shufflehi_epi16 macro. (The issue was an oversight
involving operator precedence.)

llvm-svn: 107902
2010-07-08 20:09:45 +00:00
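The commit above does not say exactly which expression was mis-parsed, but one way a precedence bug in a macro can show up is when the immediate is passed as an unparenthesized expression. The sketch below, assuming an x86 target with SSE2, passes `0x10 | 0x0B` (which equals 0x1B) and checks that the high four words come out reversed; `shufflehi_reverses_high_words` is a hypothetical helper.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: returns 1 if shuffling with selector 0x1B
   reverses words 4..7 and leaves words 0..3 untouched. */
static int shufflehi_reverses_high_words(void) {
  __m128i v = _mm_set_epi16(7, 6, 5, 4, 3, 2, 1, 0); /* word i holds i */
  /* The selector is deliberately written as an expression; a macro that
     forgot to parenthesize its argument could evaluate it wrongly. */
  __m128i r = _mm_shufflehi_epi16(v, 0x10 | 0x0B);
  return _mm_extract_epi16(r, 0) == 0 && _mm_extract_epi16(r, 3) == 3 &&
         _mm_extract_epi16(r, 4) == 7 && _mm_extract_epi16(r, 5) == 6 &&
         _mm_extract_epi16(r, 6) == 5 && _mm_extract_epi16(r, 7) == 4;
}
```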
Chris Lattner 8b3b145342 fix _mm_shuffle_pd too, thanks to Joel Falcou for pointing this out.
llvm-svn: 103873
2010-05-15 16:54:46 +00:00
Chris Lattner 7eac805bb0 fix PR6658: inline isn't a keyword in C89 mode, use __inline__ instead.
llvm-svn: 99190
2010-03-22 18:14:12 +00:00
Eric Christopher 33124e20c7 Migrate typedefs to the top level of xmmintrin.h and remove the same
one from emmintrin.h.

llvm-svn: 99020
2010-03-20 01:08:47 +00:00
Anders Carlsson 327c8df90c Make our char vector types not be explicitly signed to match GCC and to fix compilation with C++ and -fno-lax-vector-conversions
llvm-svn: 82254
2009-09-18 19:18:19 +00:00
Anders Carlsson dfa3117085 Fix PR4923.
Fix error in _mm_set_pd/_mm_setr_pd and add _mm_set_epi64x/_mm_set1_epi64x. Patch by Laurent Morichetti!

llvm-svn: 82228
2009-09-18 17:03:55 +00:00
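For reference, the argument order of the intrinsics named in the commit above can be sketched as follows, assuming an x86 target with SSE2. `_mm_set_pd` takes the high element first, `_mm_setr_pd` takes the low element first, and the `epi64x` variants take `long long` values. The helper names are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helpers: read the low and high double of a vector. */
static double low_of(__m128d v) { return _mm_cvtsd_f64(v); }
static double high_of(__m128d v) {
  return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v));
}

static int demo_set_order(void) {
  long long q[2];

  __m128d a = _mm_set_pd(2.0, 1.0);  /* arguments are (high, low) */
  assert(low_of(a) == 1.0 && high_of(a) == 2.0);

  __m128d b = _mm_setr_pd(2.0, 1.0); /* reversed: (low, high) */
  assert(low_of(b) == 2.0 && high_of(b) == 1.0);

  _mm_storeu_si128((__m128i *)q, _mm_set_epi64x(2, 1)); /* (high, low) */
  assert(q[0] == 1 && q[1] == 2);

  _mm_storeu_si128((__m128i *)q, _mm_set1_epi64x(5)); /* broadcast */
  assert(q[0] == 5 && q[1] == 5);
  return 1;
}
```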
Eli Friedman 5173349565 Switch some functions from using x86 builtins to using vector
operations.

llvm-svn: 76753
2009-07-22 17:08:01 +00:00
Eli Friedman e9ff191459 Remove a few more vector builtins.
llvm-svn: 73022
2009-06-07 09:32:56 +00:00
Eli Friedman 4d8d7d3263 Replace more calls to builtins with generic code.
llvm-svn: 72995
2009-06-06 08:08:06 +00:00
Eli Friedman d00fd2885e Fix some casts to work without -flax-vector-conversions.
llvm-svn: 72981
2009-06-06 03:45:06 +00:00
Eli Friedman ebd9314f32 Misc fixes to MMX/SSE intrinsics: a few small bug fixes, and getting rid
of calls to builtins for constructs which can be expressed directly.

llvm-svn: 72979
2009-06-06 02:13:04 +00:00
Eli Friedman f83c258eae Add aliases for a couple of SSE intrinsics. Patch by Ed Schouten.
llvm-svn: 72717
2009-06-02 05:55:48 +00:00
Anders Carlsson 2081200b8c Add 'cmp' SSE builtins and get rid of a bunch of other builtins.
llvm-svn: 72032
2009-05-18 19:16:46 +00:00
Anders Carlsson 57640939c2 Fix typo.
llvm-svn: 68466
2009-04-06 21:55:22 +00:00
Anders Carlsson 823c02eaab Add the nodebug attribute to intrinsics
llvm-svn: 64519
2009-02-14 01:00:11 +00:00
Mike Stump 5b31ed3ff0 80col.
llvm-svn: 64450
2009-02-13 14:24:50 +00:00
Anders Carlsson 43c2bab6d3 Fix more bugs I discovered
llvm-svn: 62656
2009-01-21 01:49:39 +00:00
Anders Carlsson 88b53663fb Fix implementation of _mm_pause.
llvm-svn: 61441
2008-12-26 02:22:10 +00:00
Anders Carlsson 19ef5d49d4 OK, all tests pass. Let's start using the SSE2 header
llvm-svn: 61440
2008-12-26 00:57:11 +00:00