Tagged instruction names with <c> INSTR_NAME </c> to display them in typewriter font.
In the past, \c command was used, unfortunately it applied to only one word.
<c> .. </c> has the same meaning, but applies to all words in between the tags.
llvm-svn: 289249
I made several changes for consistency with the rest of x86 instrinsics header files. Some of these changes help to render doxygen comments better.
1. avxintrin.h – Moved the opening bracket on a separate line for several
intrinsics (for consistency with the rest of the intrinsics).
2. emmintrin.h - Moved the doxygen comment next to the body of the function;
- Added braces after extern "C" even though there is only
one declaration each time
3. xmmintrin.h - Moved the doxygen comment next to the body of the function;
- Added intrinsic prototypes for a couple of macro definitions
into the doxygen comment;
- Added braces after extern "C" even though there is only one
declaration each time
4. ammintrin.h – Removed extra line between the doxygen comment and the body
of the functions (for consistency with the rest of the files).
Desk reviewed by Paul Robinson.
llvm-svn: 287278
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.
It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).
This patch changes both scalar and packed versions back to using x86-specific builtins.
It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
Differential Revision: https://reviews.llvm.org/D22105
llvm-svn: 276102
Summary:
This patch adds support for the _MM_ALIGN16 attribute on non-windows targets. This aligns Clang with ICC which supports the attribute on all targets.
Fixes PR28056
Reviewers: aaboud, echristo, cfe-commits, mkuper
Subscribers: zvi, mehdi_amini
Projects: #clang-c
Differential Revision: http://reviews.llvm.org/D21173
llvm-svn: 273095
We can now use __builtin_nontemporal_store instead of target specific builtins for naturally aligned nontemporal stores which avoids the need for handling in CGBuiltin.cpp
The scalar integer nontemporal (unaligned) store builtins will have to wait as __builtin_nontemporal_store currently assumes natural alignment and doesn't accept the 'packed struct' trick that we use for normal unaligned load/stores.
The nontemporal loads require further backend support before we can safely convert them to __builtin_nontemporal_load
Differential Revision: http://reviews.llvm.org/D21272
llvm-svn: 272540
Only half of the intrinsics in this file is documented here. The patch for the o
ther half will be sent out later.
The doxygen comments are automatically generated based on Sony's intrinsics docu
ment.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 272121
This is really only needed for addition, subtraction, and multiplication, but I did the bitwise ops too for overall consistency. Clang currently doesn't set NSW for signed vector operations so the undefined behavior shouldn't happen today.
llvm-svn: 271778
According to the gcc headers, intel intrinsics docs and msdn codegen the _mm_store1_pd (and its _mm_store_pd1 equivalent) should use an aligned pointer - the clang headers are the only implementation I can find that assume non-aligned stores (by storing with _mm_storeu_pd).
Additionally, according to the intel intrinsics docs and msdn codegen the _mm_store1_ps (_mm_store_ps1) requires a similarly aligned pointer.
This patch raises the alignment requirements to match the other implementations by calling _mm_store_ps/_mm_store_pd instead.
I've also added the missing _mm_store_pd1 intrinsic (which maps to _mm_store1_pd like _mm_store_ps1 does to _mm_store1_ps).
As a followup I'll update the llvm fast-isel tests to match this codegen.
Differential Revision: http://reviews.llvm.org/D20617
llvm-svn: 271218
xmmintrin.h a bit more directed. If for whatever reason modules are enabled but
we textually include one of these headers, don't deploy the special case for
modules. To make this work cleanly, extend __building_module to be defined
even when modules is disabled.
llvm-svn: 266945
Only half of the intrinsics in this file is documented here. The patch for the other half will be sent out later.
The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code review upstream.
llvm-svn: 263098
This involved removing the conditional inclusion and replacing them
with target attributes matching the original conditional inclusion
and checks. The testcase update removes the macro checks for each
file and replaces them with usage of the __target__ attribute, e.g.:
int __attribute__((__target__(("sse3")))) foo(int a) {
_mm_mwait(0, 0);
return 4;
}
This usage does require the enclosing function have the requisite
__target__ attribute for inlining and code generation - also for
any macro intrinsic uses in the enclosing function. There's no change
for existing uses of the intrinsic headers.
llvm-svn: 239883
xmmintrin.h includes emmintrin.h and vice versa if SSE2 is enabled. We break
this cycle for a modules build, and instead make the xmmintrin.h module
re-export the immintrin.h module. Also included is a fix for an assert in the
serialization code if a module exports another module that was declared later
in the same module map.
llvm-svn: 237321
This still lower to the same intrinsics as before.
This is preparation for bounds checking the immediate on the avx version of the builtin so we don't pass illegal immediates into the backend. Since SSE uses a smaller size immediate its not possible to bounds check when using a shared builtin. Rather than creating a clang specific builtin for the different immediate, I decided (after consulting with Chandler) that it was better to match gcc.
llvm-svn: 224879
The last step of _mm_cvtps_pi16 should use _mm_packs_pi32, which is a function
that reads two __m64 values and packs four 32-bit values into four 16-bit
values.
<rdar://problem/16873717>
llvm-svn: 209489
Because GCC incorrectly defines _mm_prefetch to take anything that casts
to void*, people have started using that behavior. The previous patch
that made _mm_prefetch actually take a const char * broke compatibility
with existing code. This update to the patch leaves the macro that
defines _mm_prefetch with the (void*) cast when _MSC_VER is not defined.
llvm-svn: 201901
This breaks backwards compatibility with existing code. Previously, this
was defined as
#define _mm_prefetch(a, sel) (__builtin_prefetch((void *)(a), 0, (sel)))
Which basically accepts any pointer. Changing this to char* simply
breaks a lot of existing code. I have tried changing char* to
"const void*", which seems to be the right thing as per Intel
specification this should work on basically any pointer. However,
apparently this breaks windows compatibility (because of a conflicting
declaration in windows.h).
So, we probably need to #ifdef this based on whether clang is compiling
for windows. According to Chandler, this might be done by introducing an
additional symbol to a fake type in BuiltinsX86.def and then condition
the type expansion on the platform.
llvm-svn: 201775
This patch adds several built-ins that are required for ms
compatibility. _mm_prefetch must be a built-in because it takes a
compile-time constant argument and our prior approach of using a #define
to the current built-in doesn't work in the presence of re-declaration
of _mm_prefetch. The others can be obtained by including the windows
system headers. If a user includes the windows system headers but not
intrin.h they still need to work and therefore must be built-in because
we don't get a chance to implement them in intrin.h in this case.
llvm-svn: 201734
These intrinsics should return the comparision result in the low bits and keep
the high bits of the first source operand.
When calling to builtin functions, the source operands are swapped and the high
bits of the second source operand are kept. To fix the issue, an extra
shufflevector is used.
rdar://14153896
llvm-svn: 184110
Module "sse" implicitly exports module "sse2".
This is bad because we also have module "sse2" export module "sse" (as intended) so we end up with a cycle
in the module import graph:
1. sse2 -> (also imports) sse
2. sse -> (also imports) sse2
To eliminate the cycle remove 2.; importing module "sse2" will also import module "sse", but just importing
module "sse" will not also import module "sse2".
rdar://13240552
llvm-svn: 178117
Several of the intrinsic headers were using plain non-reserved identifiers.
C++11 17.6.4.3.2 [global.names] p1 reservers names containing a double
begining with an underscore followed by an uppercase letter for any use.
I think I got them all, but open to being corrected. For the most part I
didn't bother updating function-like macro parameter names because I don't
believe they're subject to any such collission - though some function-like
macros already follow this convention (I didn't update them in part because
the churn was more significant as several function-like macros use the double
underscore prefixed version of the same name as a parameter in their
implementation)
llvm-svn: 172666
(__m128){ p[0], p[1], p[2], p[3] }
which produces really bad code. This could be done in instcombine, but it's
probably better to do it in the front-end instead.
<rdar://problem/9424836>
llvm-svn: 131237