Also provide _setjmpex(). r200243 put in _setjmp() and _setjmpex() behind a
comment since jmp_buf wasn't available. r200344 added jmp_buf and put in
_setjmp(), but missed _setjmpex().
llvm-svn: 212557
Protect MMX specific declarations under a __MMX__ guard. This header can be
included on non-x86 architectures (e.g. ARM) which do not support the MMX ISA.
Use the preprocessor to prevent these declarations from being processed.
llvm-svn: 212512
Although the functions are marked as always_inline, the compiler with which they
are used may not honour the extended attributes and emit them as functions. In
such a case, indicate that they should have extern "C" linkage and should not be
mangled in C++ style if used within C++.
llvm-svn: 212511
defined or defined identically before there will not be any
change in functionality.
MinGW-w64 defines __GNUC_VA_LIST as
#define __GNUC_VA_LIST
which is different than the definition here, causing
a warning without the guard.
llvm-svn: 212183
This patch adds intrinsic __rdpmc to header file 'ia32intrin.h'.
Intrinsic __rdmpc can be used to read performance monitoring counters. It is
implemented as a direct call to __builtin_ia32_rdpmc.
It takes as input a value representing the index of the performance counter to
read. The value of the performance counter is then returned as a unsigned
64-bit quantity.
llvm-svn: 212053
Summary: This patch introduces ACLE header file, implementing extensions that can be directly mapped to existing Clang intrinsics. It implements for both AArch32 and AArch64.
Reviewers: t.p.northover, compnerd, rengolin
Reviewed By: compnerd, rengolin
Subscribers: rnk, echristo, compnerd, aemerson, mroth, cfe-commits
Differential Revision: http://reviews.llvm.org/D4296
llvm-svn: 211962
Conditionally include x86intrin.h if we are building for x86 or x86_64.
Conditionalise definition of inline assembly routines which use x86 or x86_64
inline assembly. This is needed as clang can target Windows on ARM where these
definitions may be included into user code.
llvm-svn: 211716
Add support for _InterlockedCompareExchangePointer, _InterlockExchangePointer,
_InterlockExchange. These are available as a compiler intrinsic on ARM and x86.
These are used directly by the Windows SDK headers without use of the intrin
header.
llvm-svn: 211216
There are several Altivec tests that formerly ran only on big-endian
targets (and in some cases only on 32-bit targets). It is useful to
verify these on little-endian targets as well.
While testing these, I discovered a typo in <altivec.h>. This is also
fixed by this patch.
llvm-svn: 210928
The vec_sld and vec_vsldoi interfaces perform a left-shift on vector
arguments for both big and little endian. However, because they rely
on the vec_perm interface which is endian-dependent, the permutation
vector needs to be reversed for LE to get the proper shift direction.
I've added some extra testing for these interfaces for LE in the
builtins-ppc-altivec.c.
llvm-svn: 210657
The PowerPC vsumsws instruction, accessed via vec_sums, is defined
architecturally with a big-endian bias, in that the second input vector
and the result always reference big-endian element 3 (little-endian
element 0). For ease of porting, the programmer wants elements 3 in
both cases.
To provide this semantics, for little endian we generate a permute for
the second input vector prior to the vsumsws instruction, and generate
a permute for the result vector following the vsumsws instruction.
The correctness of this code is tested by the new sums.c test added in
a previous patch, as well as the modifications to
builtins-ppc-altivec.c in the present patch.
llvm-svn: 210449
The PowerPC vector-unpack-high and vector-unpack-low instructions
are defined architecturally with a big-endian bias, in that the vector
element numbering is assumed to be "left to right" regardless of
whether the processor is in big-endian or little-endian mode. This
effectively reverses the meaning of "high" and "low." Such a
definition is unnatural for little-endian code generation.
To facilitate ease of porting, the vec_unpackh and vec_unpackl
interfaces are designed to use natural element ordering, so that
elements are numbered according to little-endian design principles
when code is generated for a little-endian target. The desired
semantics can be achieved by using the opposite instruction for
little-endian mode. That is, when a call to vec_unpackh appears in
the code, a vector-unpack-low is generated, and when a call to
vec_unpackl appears in the code, a vector-unpack-high is generated.
The correctness of this code is tested by the new unpack.c test
added in a previous patch, as well as the modifications to
builtins-ppc-altivec.c in the present patch.
Note that these interfaces were originally incorrectly implemented
when they take a vector pixel argument. This patch corrects this
implementation for both big- and little-endian code generation.
llvm-svn: 210391
The Altivec builtin test case test/CodeGen/builtins-ppc-altivec.c has
always been executed only for 32-bit PowerPC. These tests are equally
valid for 64-bit PowerPC. This patch updates the test to be run for
three targets: powerpc-unknown-unknown, powerpc64-unknown-unknown,
and powerpc64le-unknown-unknown. The expected code generation changes
for some of the Altivec builtins for little endian, so this patch adds
new CHECK-LE variants to the test for the powerpc64le target.
These tests satisfy the testing requirements for some previous patches
committed over the last couple of days for lib/Headers/altivec.h:
r210279 for vec_perm, r210337 for vec_mul[eo], and r210340 for
vec_pack.
llvm-svn: 210384
The PowerPC vector-pack instructions are defined architecturally with
a big-endian bias, in that the vector element numbering is assumed to
be "left to right" regardless of whether the processor is in
big-endian or little-endian mode. This definition is unnatural for
little-endian code generation.
To facilitate ease of porting, the vec_pack and related interfaces are
designed to use natural element ordering, so that elements are
numbered according to little-endian design principles when code is
generated for a little-endian target. The vec_pack calls are
implemented as calls to vec_perm, specifying selection of the
odd-numbered vector elements. For little endian, this means the
odd-numbered elements counting from the right end of the register.
Since the underlying instructions count from the left end, we must
instead select the even-numbered vector elements for little endian to
achieve the desired semantics.
The correctness of this code is tested by the new pack.c test added in
a previous patch. I plan to later make the existing ppc32 Altivec
compile-time tests work for ppc64 and ppc64le as well.
llvm-svn: 210340
The PowerPC vector-multiply-even and vector-multiply-odd instructions
are defined architecturally with a big-endian bias, in that the vector
element numbering is assumed to be "left to right" regardless of
whether the processor is in big-endian or little-endian mode. This
definition is unnatural for little-endian code generation.
To facilitate ease of porting, the vec_mule and vec_mulo interfacs are
designed to use natural element ordering, so that elements are
numbered according to little-endian design principles when code is
generated for a little-endian target. The desired semantics can be
achieved by using the opposite instruction for little-endian mode.
That is, when a call to vec_mule appears in the code, a
vector-multiply-odd is generated, and when a call to vec_mulo appears
in the code, a vector-multiply-even is generated.
The correctness of this code is tested by the new mult-even-odd.c test
added in a previous patch. I plan to later make the existing ppc32
Altivec compile-time tests work for ppc64 and ppc64le as well.
llvm-svn: 210337
The PowerPC vperm (vector permute) instruction is defined
architecturally with a big-endian bias, in that the two input vectors
are assumed to be concatenated "left to right" and the elements of the
combined input vector are assumed to be numbered from "left to right"
(i.e., with element 0 referencing the high-order element). This
definition is unnatural for little-endian code generation.
To facilitate ease of porting, the vec_perm interface is designed to
use natural element ordering, so that elements are numbered according
to little-endian design principles when code is generated for a
little-endian target. The desired semantics can be achieved with the
vperm instruction provided that the two input vector registers are
reversed, and the permute control vector is complemented. The
complementing is performed using an xor with a vector containing all
one bits.
Only the rightmost 5 bits of each element of the permute control
vector are relevant, so it would be possible to complement the vector
with respect to a <16xi8> vector containing all 31s. However, when
the permute control vector is not a constant, using 255 instead has
the advantage that the vec_xor can be recognized during code
generation as a vnor instruction. (Power8 introduces a vnand
instruction which could alternatively be generated.)
The correctness of this code is tested by the new perm.c test added in
a previous patch. I plan to later make the existing ppc32 Altivec
compile-time tests work for ppc64 and ppc64le as well.
llvm-svn: 210279
These intrinsics are special because they directly take a memory operand (AVX2
adds the register counterparts). Typically, other non-memop intrinsics take
registers and then it's left to isel to fold memory operands.
In order to LICM intrinsics directly reading memory, we require that no stores
are in the loop (LICM) or that the folded load accesses constant memory
(MachineLICM). When neither is the case we fail to hoist a loop-invariant
broadcast.
We can work around this limitation if we expose the load as a regular load and
then just implement the broadcast using the vector initializer syntax. This
exposes the load to LICM and other optimizations.
At the IR level this is translated into a series of insertelements. The
sequence is already recognized as a broadcast so there is no impact on the
quality of codegen.
_mm256_broadcast_pd and _mm256_broadcast_ps are not updated by this patch
because right now we lack the DAG-combiner smartness to recover the broadcast
instructions. This will be tackled in a follow-on.
There will be completing changes on the LLVM side to remove the LLVM
intrinsics and to auto-upgrade bitcode files.
Fixes <rdar://problem/16494520>
llvm-svn: 209846
The last step of _mm_cvtps_pi16 should use _mm_packs_pi32, which is a function
that reads two __m64 values and packs four 32-bit values into four 16-bit
values.
<rdar://problem/16873717>
llvm-svn: 209489
Summary:
Most of the clang header patch by Simon Pilgrim @ SCEE.
Also fixed (or added) clang tests for these intrinsics.
LLVM tests to make sure we get the blend instruction out of these
shufflevectors are at http://reviews.llvm.org/D3600
Reviewers: eli.friedman, craig.topper, rafael
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D3601
llvm-svn: 208664
glibc expects that stddef.h only defines a single thing if either of these
defines is set. For example, before this change, a C file containing
#include <stdlib.h>
int ptrdiff_t = 0;
would compile with gcc but not with clang. Now it compiles with clang too.
This also fixes PR12997, where older versions of the Linux headers would define
NULL incorrectly, and glibc would define __need_NULL and expect stddef.h to
redefine NULL with the correct definition.
llvm-svn: 207606
See the bug and the cfe-commits thread "[patch] Let stddef.h redefine NULL if
__need_NULL is set" for discussion.
Fixes PR12997 and is similar to the __need_wint_t bits already in this file.
llvm-svn: 207482
This patch:
1. Adds a definition for two new GCCBuiltins in BuiltinsX86.def:
__builtin_ia32_rdtsc;
__builtin_ia32_rdtscp;
2. Replaces the already existing definition of intrinsic __rdtsc in
ia32intrin.h with a simple call to the new GCC builtin __builtin_ia32_rdtsc.
3. Adds a definition for the new intrinsic __rdtscp in ia32intrin.h
llvm-svn: 207132
Don't include input and output regs in clobbers. Prefix some
identifiers with __. Add a memory constraint to __readcr3 to prevent
reordering. This constraint is heavy handed, but conservatively
correct.
Thanks to PaX Team for the suggestions.
llvm-svn: 205778
This adds Clang support for the ARM64 backend. There are definitely
still some rough edges, so please bring up any issues you see with
this patch.
As with the LLVM commit though, we think it'll be more useful for
merging with AArch64 from within the tree.
llvm-svn: 205100
They're already defined in ia32intrin.h, and this would cause including Intrin.h
in 64-bit mode to fail because of conflicting types. Update ms-intrin.cpp to
also run in 64-bit mode to catch things like this.
llvm-svn: 203714
Summary:
Our usual definition of max_align_t wouldn't match up with MSVC if it
was used in a template argument.
Reviewers: chandlerc, rsmith, rnk
Reviewed By: chandlerc
CC: cfe-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2924
llvm-svn: 202911
and one for PCLMUL support. The current immintrin.h header only includes
wmmintrin.h if AES support is enabled. It should include it if either AES or
PCLMUL is enabled (GCC's version of immintrin.h does this).
Patch by John Baldwin!
llvm-svn: 202871
Because GCC incorrectly defines _mm_prefetch to take anything that casts
to void*, people have started using that behavior. The previous patch
that made _mm_prefetch actually take a const char * broke compatibility
with existing code. This update to the patch leaves the macro that
defines _mm_prefetch with the (void*) cast when _MSC_VER is not defined.
llvm-svn: 201901
This breaks backwards compatibility with existing code. Previously, this
was defined as
#define _mm_prefetch(a, sel) (__builtin_prefetch((void *)(a), 0, (sel)))
Which basically accepts any pointer. Changing this to char* simply
breaks a lot of existing code. I have tried changing char* to
"const void*", which seems to be the right thing as per Intel
specification this should work on basically any pointer. However,
apparently this breaks windows compatibility (because of a conflicting
declaration in windows.h).
So, we probably need to #ifdef this based on whether clang is compiling
for windows. According to Chandler, this might be done by introducing an
additional symbol to a fake type in BuiltinsX86.def and then condition
the type expansion on the platform.
llvm-svn: 201775