According to the Intel documentation, the mask operand of the maskload and
maskstore intrinsics is always a vector of packed integer/long integer values.
This patch introduces the following two changes:
1. It fixes the avx maskload/store intrinsic definitions in avxintrin.h.
2. It changes BuiltinsX86.def to match the correct gcc definitions for avx
maskload/store (see D13861 for more details).
Differential Revision: http://reviews.llvm.org/D13861
llvm-svn: 250816
test that our intrinsics behave the same under -fsigned-char and
-funsigned-char.
This further testing uncovered that AVX-2 has a broken cmpgt for 8-bit
elements, and has for a long time. This is fixed in the same way as
SSE4 handles the case.
The other ISA extensions currently work correctly because they use
specific instruction intrinsics. As soon as they are rewritten in terms
of generic IR, they will need to add these special casts. I've added the
necessary testing to catch this, however, so we shouldn't have to chase
it down again.
I considered changing the core typedef to be signed, but that seems like
a bad idea. Notably, it would be an ABI break if anyone is reaching into
the innards of the intrinsic headers and passing __v16qi on an API
boundary. I can't be completely confident that this wouldn't happen due
to a macro expanding in a lambda, etc., so it seems much better to leave
it alone. It also matches GCC's behavior exactly.
A fun side note is that for both GCC and Clang, -funsigned-char really
does change the semantics of __v16qi. To observe this, consider:
% cat x.cc
#include <smmintrin.h>
#include <iostream>
int main() {
  __v16qi a = { 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
  __v16qi b = _mm_set1_epi8(-1);
  std::cout << (int)(a / b)[0] << ", " << (int)(a / b)[1] << '\n';
}
% clang++ -o x x.cc && ./x
-1, 1
% clang++ -funsigned-char -o x x.cc && ./x
0, 1
However, while this may be surprising, both Clang and GCC agree.
Differential Revision: http://reviews.llvm.org/D13324
llvm-svn: 249097
This involved removing the conditional inclusions and replacing them
with target attributes matching the original conditional inclusion
and checks. The testcase update removes the macro checks for each
file and replaces them with usage of the __target__ attribute, e.g.:
int __attribute__((__target__(("sse3")))) foo(int a) {
  _mm_mwait(0, 0);
  return 4;
}
This usage does require that the enclosing function have the requisite
__target__ attribute for inlining and code generation. The same applies to
any macro intrinsics used in the enclosing function. There's no change
for existing uses of the intrinsic headers.
llvm-svn: 239883
This is very much like D8088 (checked in at r231792).
Now that we've replaced the vinsertf128 intrinsics,
do the same for their extract twins.
Differential Revision: http://reviews.llvm.org/D8275
llvm-svn: 232052
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.
This is the sibling patch for the LLVM half of this change:
http://reviews.llvm.org/D8086
Differential Revision: http://reviews.llvm.org/D8088
llvm-svn: 231792
Use long long for the epi64 argument, like the other intrinsics.
NFC since this is only defined in 64-bit mode, not in 32-bit.
Fix suggested by H. J. Lu!
llvm-svn: 229886
Summary:
The definition for _mm256_insert_epi64 was taking an int, which would get
truncated before being inserted in the vector.
Original patch by Joshua Magee!
Reviewers: bruno, craig.topper
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D7179
llvm-svn: 229811
These intrinsics are special because they directly take a memory operand (AVX2
adds the register counterparts). Typically, other non-memop intrinsics take
registers and then it's left to isel to fold memory operands.
In order to LICM intrinsics directly reading memory, we require that no stores
are in the loop (LICM) or that the folded load accesses constant memory
(MachineLICM). When neither is the case we fail to hoist a loop-invariant
broadcast.
We can work around this limitation if we expose the load as a regular load and
then just implement the broadcast using the vector initializer syntax. This
exposes the load to LICM and other optimizations.
At the IR level this is translated into a series of insertelements. The
sequence is already recognized as a broadcast so there is no impact on the
quality of codegen.
_mm256_broadcast_pd and _mm256_broadcast_ps are not updated by this patch
because right now we lack the DAG-combiner smartness to recover the broadcast
instructions. This will be tackled in a follow-on.
There will be completing changes on the LLVM side to remove the LLVM
intrinsics and to auto-upgrade bitcode files.
Fixes <rdar://problem/16494520>
llvm-svn: 209846
Summary:
Most of the clang header patch by Simon Pilgrim @ SCEE.
Also fixed (or added) clang tests for these intrinsics.
LLVM tests to make sure we get the blend instruction out of these
shufflevectors are at http://reviews.llvm.org/D3600
Reviewers: eli.friedman, craig.topper, rafael
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D3601
llvm-svn: 208664
that these headers should not be included more than once, they are in fact
included twice when building our builtins module (in order for it to generate
submodules for them), and without this, any modular build enabling AVX and
including any builtin header fails.
Testing this is tricky because including any of these headers in a modular
build is liable to fail, due to unrelated builtin headers in the same module
including headers which might not be available on the system running the tests.
Suggestions on that front are welcome (but we're getting close to being able to
run a buildbot that has modules enabled for all tests, which would nicely solve
the testing problem).
llvm-svn: 186275
Microsoft's Source Annotation Language (SAL) defines a bunch of keywords
for annotating the inputs and outputs of functions. Empty definitions
for the keywords are provided by <stdlib.h> -> <crtdefs.h> -> <sal.h>.
This makes it basically impossible to include MSVC's stdlib.h and
Clang's *mmintrin.h headers at the same time if they have variables
named __in. As a workaround, I've renamed those variables.
This fixes the Modules/compiler_builtins.m test which was XFAILed,
presumably due to this conflict.
llvm-svn: 179860
Several of the intrinsic headers were using plain non-reserved identifiers.
C++11 17.6.4.3.2 [global.names] p1 reserves names containing a double
underscore or beginning with an underscore followed by an uppercase letter
for any use.
I think I got them all, but I'm open to being corrected. For the most part I
didn't bother updating function-like macro parameter names because I don't
believe they're subject to any such collision, though some function-like
macros already follow this convention. (I didn't update them in part because
the churn was more significant, as several function-like macros use the
double-underscore-prefixed version of the same name as a parameter in their
implementation.)
llvm-svn: 172666
From the Intel Optimization Reference Manual, Section 11.6.2. When data cannot
be aligned or alignment is not known, 16-byte memory accesses may provide better
performance.
rdar://11076953
llvm-svn: 153091