Commit Graph

36 Commits

Author SHA1 Message Date
Adam Nemet 286ae08e7d Implement AVX1 vbroadcast intrinsics with vector initializers
These intrinsics are special because they directly take a memory operand (AVX2
adds the register counterparts).  Typically, other non-memop intrinsics take
registers and then it's left to isel to fold memory operands.

In order to LICM intrinsics directly reading memory, we require that no stores
are in the loop (LICM) or that the folded load accesses constant memory
(MachineLICM).  When neither is the case we fail to hoist a loop-invariant
broadcast.

We can work around this limitation if we expose the load as a regular load and
then just implement the broadcast using the vector initializer syntax.  This
exposes the load to LICM and other optimizations.

At the IR level this is translated into a series of insertelements.  The
sequence is already recognized as a broadcast so there is no impact on the
quality of codegen.

_mm256_broadcast_pd and _mm256_broadcast_ps are not updated by this patch
because right now we lack the DAG-combiner smartness to recover the broadcast
instructions.  This will be tackled in a follow-on.

There will be completing changes on the LLVM side to remove the LLVM
intrinsics and to auto-upgrade bitcode files.

Fixes <rdar://problem/16494520>

llvm-svn: 209846
2014-05-29 20:47:29 +00:00
Filipe Cabecinhas 5d289b48b1 Patched clang to emit x86 blends as shufflevectors.
Summary:
Most of the clang header patch by Simon Pilgrim @ SCEE.
Also fixed (or added) clang tests for these intrinsics.

LLVM tests to make sure we get the blend instruction out of these
shufflevectors are at http://reviews.llvm.org/D3600

Reviewers: eli.friedman, craig.topper, rafael

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D3601

llvm-svn: 208664
2014-05-13 02:37:02 +00:00
Manman Ren c94122e05b Intrinsics: fix extract & insert when index is out of bound.
Now, all extract & insert intrinsics should have the correct and operation
to ignore higher bits.

rdar://15250497

llvm-svn: 193267
2013-10-23 20:33:14 +00:00
Craig Topper c5244512c8 Use a shuffle with undef elements instead of inserting 0s in the 128-bit to 256-bit casting intrinsics to improve performance. Thanks to Katya Romanova for identifying this issue.
llvm-svn: 187716
2013-08-05 06:17:21 +00:00
Richard Smith 49e56440f9 Add missing include guards into headers in lib/Headers. While it may appear
that these headers should not be included more than once, they are in fact
included twice when building our builtins module (in order for it to generate
submodules for them), and without this, any modular build enabling AVX and
including any builtin header fails.

Testing this is tricky because including any of these headers in a modular
build is liable to fail, due to unrelated builtin headers in the same module
including headers which might not be available on the system running the tests.
Suggestion on that front are welcome (but we're getting close to being able to
run a buildbot that has modules enabled for all tests, which would nicely solve
the testing problem).

llvm-svn: 186275
2013-07-14 05:41:45 +00:00
Reid Kleckner 7ab75b3f68 Avoid names like __in that conflict with SAL in builtin headers
Microsoft's Source Annotation Language (SAL) defines a bunch of keywords
for annotating the inputs and outputs of functions.  Empty definitions
for the keywords are provided by <stdlib.h> -> <crtdefs.h> -> <sal.h>.
This makes it basically impossible to include MSVC's stdlib.h and
Clang's *mmintrin.h headers at the same time if they have variables
named __in.  As a workaround, I've renamed those variables.

This fixes the Modules/compiler_builtins.m test which was XFAILed,
presumably due to this conflict.

llvm-svn: 179860
2013-04-19 17:00:14 +00:00
David Blaikie 5bb700360c Readd an open paren that was lost while reformatting code.
llvm-svn: 172669
2013-01-16 23:13:42 +00:00
David Blaikie 3302f2bd46 PR14964: intrinsic headers using non-reserved identifiers
Several of the intrinsic headers were using plain non-reserved identifiers.
C++11 17.6.4.3.2 [global.names] p1 reservers names containing a double
begining with an underscore followed by an uppercase letter for any use.

I think I got them all, but open to being corrected. For the most part I
didn't bother updating function-like macro parameter names because I don't
believe they're subject to any such collission - though some function-like
macros already follow this convention (I didn't update them in part because
the churn was more significant as several function-like macros use the double
underscore prefixed version of the same name as a parameter in their
implementation)

llvm-svn: 172666
2013-01-16 23:08:36 +00:00
Craig Topper 26e74e50b6 Convert vperm2f128 and vperm2i128 intrinsics back to using llvm intrinsics. Unfortunately, these instructions have behavior that can't be modeled with shuffle vector.
llvm-svn: 154906
2012-04-17 05:16:56 +00:00
Chad Rosier 2c5154224b Fix the signatures for the _mm256_storeu2_* intrinsics.
PR12532

llvm-svn: 154591
2012-04-12 16:29:08 +00:00
Craig Topper 678a53c350 Fix shuffle vector calculation for mm_permute_ps. Fixes PR 12401.
llvm-svn: 153724
2012-03-30 05:09:18 +00:00
Chad Rosier f8df4f4e3b [avx] Define the _mm256_loadu2_xxx and _mm256_storeu2_xxx intrinsics.
From the Intel Optimization Reference Manual, Section 11.6.2.  When data cannot
be aligned or alignment is not known, 16-byte memory accesses may provide better
performance.
rdar://11076953

llvm-svn: 153091
2012-03-20 16:40:00 +00:00
Craig Topper e5ea3b0239 Remove vperm2f* and vperm2i builtins. Same effect can be achieved with builtin_shufflevector.
llvm-svn: 150064
2012-02-08 07:33:36 +00:00
Craig Topper fec9f8edb7 Remove vpermilp* builtins. Same effect can be achieved with builtin_shufflevector.
llvm-svn: 150056
2012-02-08 05:16:54 +00:00
Craig Topper 9e9301a83a Represent 256-bit unaligned loads natively and remove the builtins. Similar change was made for 128-bit versions a while back.
llvm-svn: 148919
2012-01-25 04:26:17 +00:00
Craig Topper 9f00948a82 Add AVX2 permute intrinsics. Also add parentheses on some macro arguments in other intrinsic headers.
llvm-svn: 147241
2011-12-24 07:55:14 +00:00
Chad Rosier 7caca84ce4 Fix _mm_permute_ps and _mm256_permute_ps AVX intrinsics to use "I" (ICE)
markings.  Fix avxintrin.h to take them into account.
Part of rdar://10595450

llvm-svn: 146810
2011-12-17 01:51:05 +00:00
Chad Rosier 93375d5fa5 Revert r146797, which was a partial revert of r146791; It was correct in the
first place.  The permutevar_* (note the *var*) intrinsics use ymm/mem.

llvm-svn: 146807
2011-12-17 01:39:56 +00:00
Chad Rosier 0adfe7aa2f Fix _mm256_extractf128_* AVX intrinsics to use "I" (ICE) markings. Fix
avxintrin.h to take them into account.
Part of rdar://10595450

llvm-svn: 146804
2011-12-17 01:22:27 +00:00
Chad Rosier 3648646b2b Partial revert of r146791; vpermilps/vpermilpd instructions accepts ymm/mem/imm8.
llvm-svn: 146797
2011-12-17 00:50:42 +00:00
Chad Rosier 060d03be1c Fix _mm256_round_pd, _mm256_round_ps, _mm_permute_pd and _mm256_permute_pd AVX
intrinsics to use "I" (ICE) markings.  Fix avxintrin.h to take them into 
account.
Part of rdar://10595450

llvm-svn: 146791
2011-12-17 00:15:26 +00:00
Chad Rosier 33d22d8def Fix vinsertf128_* AVX intrinsics to use "I" (ICE) markings. Fix avxintrin.h to
take them into account.
rdar://10590282

llvm-svn: 146758
2011-12-16 21:40:31 +00:00
Chad Rosier 9138fea25e Fix vperm2f128_* AVX intrinsics to use "I" (ICE) markings. Fix avxintrin.h to
take them into account.
rdar://10576962

llvm-svn: 146757
2011-12-16 21:07:34 +00:00
Eli Friedman f16beb3942 Fix some additional x86 intrinsics to use "I" (ICE) markings. Fix *mmintrin.h to take them into account.
<rdar://problem/10341145>

llvm-svn: 144246
2011-11-10 00:11:13 +00:00
Bob Wilson c9b97cc1da Fix vector macros to correctly check argument types. <rdar://problem/10261670>
llvm-svn: 143792
2011-11-05 06:08:06 +00:00
Bruno Cardoso Lopes 7a98a7e681 Fix _mm256_shuffle_ps mask! Example, for mask=203, Instead of:
<i32 3, i32 2, i32 8, i32 11, i32 3, i32 6, i32 12, i32 15>
generate:
  <i32 3, i32 2, i32 8, i32 11, i32 7, i32 6, i32 12, i32 15>

llvm-svn: 138411
2011-08-23 23:29:45 +00:00
John McCall 91a528841b Implement the AVX cmp builtins as macros instead of static inlines.
Patch by Syoyo Fujita!  Reviewed by Chris Lattner!  Checked in by me!

llvm-svn: 128984
2011-04-06 03:37:51 +00:00
Benjamin Kramer 6f35f3cd80 Disallow direct inclusion of avxintrin.h. Users should include immintrin.h instead. This matches GCC's behavior.
llvm-svn: 111692
2010-08-20 23:00:03 +00:00
Bruno Cardoso Lopes 8c333153e0 Fix define inserting a comma :)
llvm-svn: 110839
2010-08-11 18:45:43 +00:00
Bruno Cardoso Lopes 65954ffc69 Remove 256-bit cast built-ins and make the AVX intrinsic call llvm __builtin_shufflevector with the appropriate arguments
llvm-svn: 110771
2010-08-11 02:14:38 +00:00
Bruno Cardoso Lopes a4f1930b75 Remove 256-bit unpack built-ins and make the AVX intrinsic call llvm __builtin_shufflevector with the appropriate arguments
llvm-svn: 110768
2010-08-11 01:43:24 +00:00
Bruno Cardoso Lopes e712a135b7 Remove 256-bit shuffle built-ins and make the AVX intrinsic call llvm __builtin_shufflevector with the appropriate arguments
llvm-svn: 110766
2010-08-11 01:17:34 +00:00
Bruno Cardoso Lopes 3d3fc1d075 Make replicate intrinsics use shufflevector instead of dup builtins, also remove the dup builtins
llvm-svn: 110646
2010-08-10 02:23:54 +00:00
Bruno Cardoso Lopes 3d19889ca8 Fix AVX 256-bit intrinsics headers by using the right cast type while dealing with logical ops
llvm-svn: 110389
2010-08-05 23:04:58 +00:00
Bruno Cardoso Lopes fc2320fd73 Logical AVX instrinsics can be matched directly, no need to use builtins here.
llvm-svn: 110271
2010-08-04 22:56:42 +00:00
Bruno Cardoso Lopes 7c4b513a3f Add AVX intrinsics header
llvm-svn: 110253
2010-08-04 22:03:36 +00:00