Originally we were using the same GCC builtins to lower this AVX2 vector
intrinsic. Instead we will now lower it directly to a vector shuffle.
This will not only allow LLVM to generate better code, but it will also allow us
to remove the GCC intrinsics.
Reviewed by Andrea
This is related to rdar://problem/18742778.
llvm-svn: 231081
Clang has introduced ::max_align_t in stddef.h in r201729, but libc++ was
already defining std::max_align_t on Darwin because there was none in the
global namespace. After that Clang commit though, libc++ started defining
std::max_align_t to be a typedef for ::max_align_t, which has a different
definition. This changed the ABI. This commit restores the previous
definition.
rdar://19919394 rdar://18557982
llvm-svn: 230292
Use long long for the epi64 argument, like the other intrinsics.
NFC since this is only defined in 64-bit mode, not in 32-bit.
Fix suggested by H. J. Lu!
llvm-svn: 229886
Summary:
The definition for _mm256_insert_epi64 was taking an int, which would get
truncated before being inserted in the vector.
Original patch by Joshua Magee!
Reviewers: bruno, craig.topper
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D7179
llvm-svn: 229811
Also removed unused builtins.
Original patch by Andrea Di Biagio!
Reviewers: craig.topper, nadav
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D7199
llvm-svn: 228481
Analogous to AVX2, these need to be implemented as macros to properly
propagate the immediate index operand.
Part of <rdar://problem/17688758>
llvm-svn: 226496
These are implemented with __builtin_shufflevector just like AVX.
We have some tests on the LLVM side to assert that these shufflevectors do
indeed generate the corresponding unpck instruction.
Part of <rdar://problem/17688758>
llvm-svn: 225922
libunwind in all cases when installed.
At the time, Clang's unwind.h didn't provide huge chunks of the
LSB-specified unwind interface, and was generally too aenemic to use for
real software. However, it has since then become a strict superset of
the APIs provided by libunwind on Linux. Notably, you cannot compile
llgo's libgo library against libunwind, but you can against Clang's
unwind.h. So let's just use our header. =] I've checked pretty
thoroughly for any incompatibilities, and I am not aware of any.
An open question is whether or not we should continue to munge
GNU_SOURCE here. I didn't touch that as it potentially has compatibility
implications on systems I cannot easily test -- Darwin. If a Darwin
maintainer can verify that this is in fact unnecessary and remove it,
cool. Until then, leaving it in makes this change a no-op there, and
only really relevant on Linux systems where it is pretty clearly the
right way to go.
llvm-svn: 224934
necessary to be fully compatible with existing software that calls into
the linux unwind code. You can find documentation of this API and why it
exists in the discussion abot NPTL here:
https://gcc.gnu.org/ml/gcc-patches/2003-09/msg00154.html
llvm-svn: 224933
This still lower to the same intrinsics as before.
This is preparation for bounds checking the immediate on the avx version of the builtin so we don't pass illegal immediates into the backend. Since SSE uses a smaller size immediate its not possible to bounds check when using a shared builtin. Rather than creating a clang specific builtin for the different immediate, I decided (after consulting with Chandler) that it was better to match gcc.
llvm-svn: 224879
Implement _umul128; it provides the high and low halves of a 128-bit
multiply. We can simply use our __int128 arithmetic to implement this,
we generate great code for it:
movq %rdx, %rax
mulq %rcx
movq %rdx, (%r8)
retq
Differential Revision: http://reviews.llvm.org/D6486
llvm-svn: 223175
VSX makes the "vector long long" and "vector double" types available.
This patch enables the vec_perm interface for these types. The same
builtin is generated regardless of the specified type, so no
additional work or testing is needed in the back end. Tests are added
to ensure this builtin is generated by the front end.
llvm-svn: 221988
This patch adds builtin support for xvdivdp and xvdivsp, along with a
new test case. The builtins are accessed using vec_div in altivec.h.
Builtins are listed (mostly) alphabetically there, so inserting these
changed the line numbers for deprecation warnings tested in
test/Headers/altivec-intrin.c.
There is a companion patch for LLVM.
llvm-svn: 221984
This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for
PowerPC, which provide programmer access to the lxvd2x, lxvw4x,
stxvd2x, and stxvw4x instructions.
New code in altivec.h defines these in terms of new builtins, which
are themselves defined in BuiltinsPPC.def. The builtins are converted
to LLVM intrinsics in CGBuiltin.cpp. Additional code is added to
builtins-ppc-vsx.c to verify the correct generation of the intrinsics.
Note that I moved the other VSX builtins so all VSX builtins will be
alphabetical in their own section in BuiltinsPPC.def.
There is a companion patch for LLVM.
llvm-svn: 221768
Now that we have initial support for VSX, we can begin adding
intrinsics for programmer access to VSX instructions. This patch
performs the necessary enablement in the front end, and tests it by
implementing intrinsics for minimum and maximum using the vector
double data type.
The main change in the front end is to no longer disallow "vector" and
"double" in the same declaration (lib/Sema/DeclSpec.cpp), but "vector"
and "long double" must still be disallowed. The new intrinsics are
accessed via vec_max and vec_min with changes in
lib/Headers/altivec.h. Note that for v4f32, we already access
corresponding VMX builtins, but with VSX enabled we should use the
forms that allow all 64 vector registers.
The new built-ins are defined in include/clang/Basic/BuiltinsPPC.def.
I've added a new test in test/CodeGen/builtins-ppc-vsx.c that is
similar to, but much smaller than, builtins-ppc-altivec.c. This
allows us to test VSX IR generation without duplicating CHECK lines
for the existing bazillion Altivec tests.
Since vector double is now legal when VSX is available, I've modified
the error message, and changed where we test for it and for vector
long double, since the target machine isn't visible in the old place.
This serendipitously removed a not-pertinent warning about 'long'
being deprecated when used with 'vector', when "vector long double" is
encountered and we just want to issue an error. The existing tests
test/Parser/altivec.c and test/Parser/cxx-altivec.cpp have been
updated accordingly, and I've added test/Parser/vsx.c to verify that
"vector double" is now legitimate with VSX enabled.
There is a companion patch for LLVM.
llvm-svn: 220989
The Windows NT SDK uses __readfsdword and declares it as a compiler provided
builtin (#pragma intrinsic(__readfsword). Because intrin.h is not referenced
by winnt.h, it is not possible to provide an out-of-line definition for the
intrinsic. Provide a proper compiler builtin definition.
llvm-svn: 220859
The use of the vec_lvsl and vec_lvsr interfaces are discouraged for
little endian targets since Power8 hardware is a minimum requirement,
and Power8 provides reasonable performance for unaligned vector loads
and stores. Up till now we have not provided "correct" (i.e., big-
endian-compatible) code generation for these interfaces, as to do so
produces poorly performing code. However, this has become the source
of too many questions.
With this patch, LLVM will now produce compatible code for these
interfaces, but will also produce a deprecation warning message for
PPC64LE when one of them is used. This should make the porting direction
clearer to programmers. A similar patch has recently been committed to
GCC.
This patch includes a test for the warning message. There is a companion
patch that adds two unit tests to projects/test-suite.
llvm-svn: 219137
Adds a Clang-specific implementation of C11's stdatomic.h header. On systems,
such as FreeBSD, where a stdatomic.h header is already provided, we defer to
that header instead (using our __has_include_next technology). Otherwise, we
provide an implementation in terms of our __c11_atomic_* intrinsics (that were
created for this purpose).
C11 7.1.4p1 requires function declarations for atomic_thread_fence,
atomic_signal_fence, atomic_flag_test_and_set,
atomic_flag_test_and_set_explicit, and atomic_flag_clear, and requires that
they have external linkage. Accordingly, we provide these declarations, but if
a user elides the shadowing macros and uses them, then they must have a libc
(or similar) that actually provides definitions.
atomic_flag is implemented using _Bool as the underlying type. This is
consistent with the implementation provided by FreeBSD and also GCC 4.9 (at
least when __GCC_ATOMIC_TEST_AND_SET_TRUEVAL == 1).
Patch by Richard Smith (rebased and slightly edited by me -- Richard said I
should drive at this point).
llvm-svn: 218957
When building with modules enabled, we were defining max_align_t as a typedef
for a different anonymous struct type each time it was included, resulting in
an error if <stddef.h> is not covered by a module map and is included more than
once in the same modules-enabled compilation of C11 or C++11 code.
llvm-svn: 218931
This commit makes two changes:
- Remove the push and pop instructions that were saving and restoring %ebx
before and after cpuid in 32-bit pic mode. We were doing this to ensure we
don't lose the GOT address in pic register %ebx, but this isn't necessary
because the GOT address is kept in a virtual register.
- In 64-bit mode, preserve base register %rbx around cpuid.
This fixes PR20311 and rdar://problem/17686779.
llvm-svn: 218173
They are added to adxintrin.h but outside __ADX__ block.
These intrinics generates adc and sbb correspondingly that were available before ADX
llvm-svn: 218118
Summary:
ACLE 2.0 section 9.2 defines the following "miscellaneous data processing intrinsics": `__clz`, `__cls`, `__ror`, `__rev`, `__rev16`, `__revsh` and `__rbit`.
`__clz` has already been implemented in the arm_acle.h header file. The rest are not supported yet. This patch completes ACLE data processing intrinsics.
Reviewers: t.p.northover, rengolin
Reviewed By: rengolin
Subscribers: aemerson, mroth, llvm-commits
Differential Revision: http://reviews.llvm.org/D4983
llvm-svn: 216658
Similar approach to the set1 intrinsics is used: implement in terms of vector
initializers and then ensure with an LLVM test that a broadcast is generated
at the end.
Part of <rdar://problem/17688758>
llvm-svn: 215486
Note that similar to palingr, we could further optimize these to emit
shufflevector when the shift count is <=64. This however does not
change the overall design that unlike palignr we would still need the LLVM
intrinsic corresponding to this intruction to handle the >64 cases. (palignr
uses the psrldq intrinsic in this case.)
llvm-svn: 214891
My original LE implementation of the vsldoi instruction, with its
altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect
shufflevector operations in the LLVM IR. Correct code is generated
because the back end handles the incorrect shufflevector in a
consistent manner.
This patch and a companion patch for LLVM correct this problem by
removing the fixup from altivec.h and the corresponding fixup from the
PowerPC back end. Several test cases are also modified to reflect the
now-correct LLVM IR.
The vec_sums and vec_vsumsws interfaces in altivec.h are also fixed,
because they used vec_perm calls intended to be recognized as vsldoi
instructions. These vec_perm calls are now replaced with code that
more clearly shows the intent of the transformation.
llvm-svn: 214801
(Dropped the byte and word variants from the patch. Turns out these are not
part of AVX512F but only AVX512BW/VL.)
Part of <rdar://problem/17688758>
llvm-svn: 214314
char-based types from "char" to "signed char". Adjust stdint.h to use
__INTx_TYPE__ directly without prefixing it with signed and to use
__UINTx_TYPE__ for unsigned ones.
The value of __INTx_TYPE__ now matches GCC.
llvm-svn: 214119
This adds the ARM ACLE hint intrinsic wrappers to arm_acle.h. These need to be
protected with a !defined(_MSC_VER) since MSVC (and thus clang in compatibility
mode) provide these wrappers as proper builtin intrinsics.
llvm-svn: 212891
Also provide _setjmpex(). r200243 put in _setjmp() and _setjmpex() behind a
comment since jmp_buf wasn't available. r200344 added jmp_buf and put in
_setjmp(), but missed _setjmpex().
llvm-svn: 212557
Protect MMX specific declarations under a __MMX__ guard. This header can be
included on non-x86 architectures (e.g. ARM) which do not support the MMX ISA.
Use the preprocessor to prevent these declarations from being processed.
llvm-svn: 212512
Although the functions are marked as always_inline, the compiler with which they
are used may not honour the extended attributes and emit them as functions. In
such a case, indicate that they should have extern "C" linkage and should not be
mangled in C++ style if used within C++.
llvm-svn: 212511
defined or defined identically before there will not be any
change in functionality.
MinGW-w64 defines __GNUC_VA_LIST as
#define __GNUC_VA_LIST
which is different than the definition here, causing
a warning without the guard.
llvm-svn: 212183
This patch adds intrinsic __rdpmc to header file 'ia32intrin.h'.
Intrinsic __rdmpc can be used to read performance monitoring counters. It is
implemented as a direct call to __builtin_ia32_rdpmc.
It takes as input a value representing the index of the performance counter to
read. The value of the performance counter is then returned as a unsigned
64-bit quantity.
llvm-svn: 212053