added vpclmulqdq feature recognition
added intrinsics support for vpclmulqdq instructions
_mm256_clmulepi64_epi128
_mm512_clmulepi64_epi128
matching a similar work on the backend (D40101)
Differential Revision: https://reviews.llvm.org/D41573
llvm-svn: 321480
added vaes feature recognition
added intrinsics support for vaes instructions, matching a similar work on the backend (D40078)
_mm256_aesenc_epi128
_mm512_aesenc_epi128
_mm256_aesenclast_epi128
_mm512_aesenclast_epi128
_mm256_aesdec_epi128
_mm512_aesdec_epi128
_mm256_aesdeclast_epi128
_mm512_aesdeclast_epi128
llvm-svn: 321474
* __shfl_{up,down}* uses unsigned int for the third parameter.
* added [unsigned] long overloads for non-sync shuffles.
Differential Revision: https://reviews.llvm.org/D41521
llvm-svn: 321326
This patch, together with a matching llvm patch (https://reviews.llvm.org/D39720), implements the lowering of X86 kunpack intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D39719
Change-Id: Id5d3cb394ad33b98be79a6783d1d15569e2b798d
llvm-svn: 319777
Use this function to create the install targets rather than doing so
manually, which gains us the `-stripped` install targets to perform
stripped installations.
Differential Revision: https://reviews.llvm.org/D40675
llvm-svn: 319489
CUDA-9 headers check for specific libc++ version and ifdef out
some of the definitions we need if LIBCPP_VERSION >= 3800.
Differential Revision: https://reviews.llvm.org/D40198
llvm-svn: 319485
Shadow stack solution introduces a new stack for return addresses only.
The stack has a Shadow Stack Pointer (SSP) that points to the last address to which we expect to return.
If we return to a different address an exception is triggered.
This patch includes shadow stack intrinsics as well as the corresponding CET header.
It includes CET clang flags for shadow stack and Indirect Branch Tracking.
For more information, please see the following:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
Differential Revision: https://reviews.llvm.org/D40224
Change-Id: I79ad0925a028bbc94c8ecad75f6daa2f214171f1
llvm-svn: 318995
fma4 instructions zero the upper bits of the xmm register. fma3 instructions leave the bits unmodified. This requires separate builtins for the different semantics.
While we're cleaning up the scalar builtins this also removes the fma3 fmsub/fnmadd/fnmsub builtins by using negates in the header file.
llvm-svn: 318985
Summary:
__builtin_nexttoward lowers to a libcall, e.g. nexttowardf(), that CUDA
does not have.
Rather than try to implement it, we simply remove these functions --
nvcc doesn't support them either, and nextafter, which does work, does
essentially the same thing on GPUs, because GPUs don't have long double.
Reviewers: tra
Subscribers: cfe-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D40152
llvm-svn: 318494
Change Header files of the intrinsics for lowering test and testn intrinsics to IR code.
Removed test and testn builtins from clang
Differential Revision: https://reviews.llvm.org/D38737
llvm-svn: 318035
This patch, together with a matching llvm patch (https://reviews.llvm.org/D38671), implements the lowering of X86 shuffle i/f intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D38672
Change-Id: I9b3c2f2b34323bd9ccb21d0c1832f848b88ec047
llvm-svn: 318025
Summary:
How embarrassing.
This is tested in the test-suite -- fix to come there in a separate
patch.
Reviewers: tra
Subscribers: sanjoy, cfe-commits
Differential Revision: https://reviews.llvm.org/D39817
llvm-svn: 317961
The backend should be able to combine the negates to create fmsub, fnmadd, and fnmsub. faddsub converting to fsubadd still needs work I think, but should be very doable.
This matches what we already do for the masked builtins.
This only covers the packed builtins. Scalar builtins will be done after FMA4 is fixed.
llvm-svn: 317873
I think we need to use different builtins for the FMA4 instructions since those instructions zero the upper bits and FMA3 instructions pass the bits through.
So this moves the existing builtins to be the FMA3 versions. New versions will be added for FMA4.
llvm-svn: 317766
Summary: According to Intel docs this should take void const *. We had char*. The lack of const is the main issue.
Reviewers: RKSimon, zvi, igorb
Reviewed By: igorb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38782
llvm-svn: 315470
The __builtin_ia32_pbroadcastq512_mem_mask we were previously trying to use in 32-bit mode is not implemented in the x86 backend and causes isel to fail in release builds. In debug builds it fails even earlier during legalization with an llvm_unreachable.
While there add the missing test case for this intrinsic for this for 64-bit mode.
This fixes PR34631. D37668 should be able to recover this for 32-bit mode soon. But I wanted to fix the crash ahead of that.
llvm-svn: 313392
In CUDA-9 some of device-side math functions that we need are conditionally
defined within '#if _GLIBCXX_MATH_H'. We need to temporarily undo the guard
around inclusion of math_functions.hpp.
Differential Revision: https://reviews.llvm.org/D37906
llvm-svn: 313369
For now CUDA-9 is not included in the list of CUDA versions clang
searches for, so the path to CUDA-9 must be explicitly passed
via --cuda-path=.
On LLVM side NVPTX added sm_70 GPU type which bumps required
PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment.
Differential Revision: https://reviews.llvm.org/D37576
llvm-svn: 312734
Summary:
Tests have to live in the test-suite, and so will come in a separate
patch.
Fixes PR34360.
Reviewers: tra
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D37539
llvm-svn: 312681
Based off the Intel Intrinsics guide, we should expect a void const* argument.
Prevents 'passing 'const void *' to parameter of type 'void *' discards qualifiers' warnings.
Differential Revision: https://reviews.llvm.org/D37449
llvm-svn: 312523
This patch implements the broadcastf32x2/broadcasti32x2 intrinsics using __builtin_shufflevector.
Differential Revision: https://reviews.llvm.org/D37287
llvm-svn: 312135
GCC will interpret `__attribute__((__aligned__))` as 8-byte alignment on
ARM, but clang will not. Explicitly specify the alignment. This
mirrors the declaration in libunwind.
llvm-svn: 311576
The C++ ABI requires that the exception object (which under AEABI is the
`_Unwind_Control_Block`) is double-word aligned. The attribute was
applied to the `_Unwind_Exception` type, but not the
`_Unwind_Control_Block`. This should fix the libunwind test for the
alignment of the exception type.
llvm-svn: 311563
OpenCL spec v2.0 s6.13.6:
gentype select (gentype a,
gentype b,
igentype c)
gentype select (gentype a,
gentype b,
ugentype c)
igentype and ugentype must have the same number
of elements and bits as gentype.
Differential Revision: https://reviews.llvm.org/D36259
llvm-svn: 310160
OpenCL 2.0 atomic builtin functions have a scope argument which is ideally
represented as synchronization scope argument in LLVM atomic instructions.
Clang supports translating Clang atomic builtin functions to LLVM atomic
instructions. However it currently does not support synchronization scope
of LLVM atomic instructions. Without this, users have to use LLVM assembly
code to implement OpenCL atomic builtin functions.
This patch adds OpenCL 2.0 atomic builtin functions as Clang builtin
functions, which supports generating LLVM atomic instructions with
synchronization scope operand.
Currently only constant memory scope argument is supported. Support of
non-constant memory scope argument will be added later.
Differential Revision: https://reviews.llvm.org/D28691
llvm-svn: 310082
Clang specifies a max type alignment of 16 bytes on darwin targets (annoyingly in the driver not via cc1), meaning that the builtin nontemporal stores don't correctly align the loads/stores to 32 or 64 bytes when required, resulting in lowering to temporal unaligned loads/stores.
This patch casts the vectors to explicitly aligned types prior to the load/store to ensure that the require alignment is respected.
Differential Revision: https://reviews.llvm.org/D35996
llvm-svn: 309488
The EHABI definition was being inlined into the users even when EHABI
was not in use. Adjust the condition to ensure that the right version
is defined.
llvm-svn: 309327
Ensure that we define the `_Unwind_Control_Block` structure used on ARM
EHABI targets. This is needed for building libc++abi with the unwind.h
from the resource dir. A minor fallout of this is that we needed to
create a typedef for _Unwind_Exception to work across ARM EHABI and
non-EHABI targets. The structure definitions here are based originally
on the documentation from ARM under the "Exception Handling ABI for the
ARM® Architecture" Section 7.2. They are then adjusted to more closely
reflect the definition in libunwind from LLVM. Those changes are
compatible in layout but permit easier use in libc++abi and help
maintain compatibility between libunwind and the compiler provided
definition.
llvm-svn: 309226
This patch updates the vecintrin.h header file to provide the new
set of high-level vector built-in functions. This matches the
updated definition implemented by other compilers for the platform,
indicated by the pre-defined macro __VEC__ == 10302.
Note that some of the new functions (notably those involving the
vector float data type) are only available with -march=z14
(indicated by __ARCH__ == 12).
llvm-svn: 308199
Corrected several typos and incorrect parameters description that Sony
's techinical writer found during review.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 307838
Summary:
The _bit_scan_forward and _bit_scan_reverse intrinsics were accidentally
masked under the preprocessor checks that prune intrinsics definitions for the
benefit of faster compile-time on Windows. This patch moves the
definitons out of that region.
Fixes pr33722
Reviewers: craig.topper, aaboud, thakis
Reviewed By: craig.topper
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D35184
llvm-svn: 307524
The second argument must be a constant, otherwise instruction selection
will fail. always_inline is not enough for isel to always fold
everything away at -O0.
Sadly the overloading turned this into a big macro mess. Fixes PR33212.
llvm-svn: 304205
AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the Clang side of the addition of six intrinsics for two new machine instructions (vpopcntd and vpopcntq).
It also includes the addition of the new feature set.
Differential Revision: https://reviews.llvm.org/D33170
llvm-svn: 303857
(2) Removed uncessary anymore \c commands, since the same effect will be achived by <c> ... </c> sequence.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 303228
Separated very long brief sections into two sections.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 303031
This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).
Differential Revision: https://reviews.llvm.org/D32770
llvm-svn: 302418
Implemented the remaining integer data processing intrinsics from
the ARM ACLE v2.1 spec, such as parallel arithemtic and DSP style
multiplications.
Differential Revision: https://reviews.llvm.org/D32282
llvm-svn: 302131
- I removed doxygen comments for the intrinsics that "alias" the other existing documented intrinsics and that only sligtly differ in spelling (single underscores vs. double underscores).
#define _tzcnt_u16(a) (__tzcnt_u16((a)))
It will be very hard to keep the documentation for these "aliases" in sync with the documentation for the intrinsics they alias to. Out of sync documentation will be more confusing than no documentation.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 301652
size_t is usually defined as unsigned long, but on 64-bit platforms,
stdint.h currently defines SIZE_MAX using "ull" (unsigned long long).
Although this is the same width, it doesn't necessarily have the same
alignment or calling convention. It also triggers printf warnings when
using the format flag "%zu" to print SIZE_MAX.
This changes SIZE_MAX to reuse the compiler-provided __SIZE_MAX__, and
provides similar fixes for the other integers:
- INTPTR_MIN
- INTPTR_MAX
- UINTPTR_MAX
- PTRDIFF_MIN
- PTRDIFF_MAX
- INTMAX_MIN
- INTMAX_MAX
- UINTMAX_MAX
- INTMAX_C()
- UINTMAX_C()
... and fixes the typedefs for intptr_t and uintptr_t to use
__INTPTR_TYPE__ and __UINTPTR_TYPE__ instead of int32_t, effectively
reverting r89224, r89226, and r89237 (r89221 already having been
effectively reverted).
We can probably also kill __INTPTR_WIDTH__, __INTMAX_WIDTH__, and
__UINTMAX_WIDTH__ in a follow-up, but I was hesitant to delete all the
per-target CHECK lines in this commit since those might serve their own
purpose.
rdar://problem/11811377
llvm-svn: 301593
Summary: This patch makes the header `stdatomic.h` work when `-fms-compatibility` is specified.
Reviewers: rsmith
Reviewed By: rsmith
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D32322
llvm-svn: 300919
- To be consistent with the rest of the intrinsics headers, I removed the tags <i> .. </i> for marking instruction names in italics in in smmintrin.h.
- Formatting changes to fit into 80 characters.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 300578
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.
LLVM companion patch: D31767.
Differential Revision: https://reviews.llvm.org/D31766
llvm-svn: 300326
It's used by MS headers in VS 2017 without including intrin.h, so we
can't implement it in the header anymore.
Differential Revision: https://reviews.llvm.org/D31736
llvm-svn: 299782
It seems MS headers have started using __readgsqword, and since it's
used in a header that doesn't include intrin.h, we can't implement it as
an inline function anymore.
That was already the case for __readfsdword, which Saleem added support
for in r220859. This patch reuses that codegen to implement all of
__read[fg]s{byte,word,dword,qword}.
Differential Revision: https://reviews.llvm.org/D31248
llvm-svn: 298538
I made some small changes in smmintrin.h and emmintrin.h intrinsics.
- changed some regular comments '//' into doxygen-style comments '///' where necessary
- removed some trailing spaces in doxygen comments.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 298371
- Fix a variable naming mismatch
- Fix gcc extension pointer arithmetic on void to cast to char *.
- Test that the header (and htmintrin.h) parse.
llvm-svn: 298318
Reapply r289181 but rename the include guard to avoid
conflict with the one from Darwin.
Allow darwin to provide additional definitions and implementation
specifc values for tgmath.h on Apple platforms.
rdar://problem/19019845
llvm-svn: 298013
The DAZ feature introduces the denormal zero support for x86.
Currently the definitions are located under SSE3 header, however there are some SSE2 targets that support the feature as well.
Differential Revision: https://reviews.llvm.org/D30194
llvm-svn: 296296
Note: The doxygen comments are automatically generated based on Sony's intrinsic
s document.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 295404
Removed ndrange_t as Clang builtin type and added
as a struct type in the OpenCL header.
Use type name to do the Sema checking in enqueue_kernel
and modify IR generation accordingly.
Review: D28058
Patch by Dmitry Borisenkov!
llvm-svn: 295311
__fastfail terminates the process immediately with a special system
call. It does not run any process shutdown code or exception recovery
logic.
Fixes PR31854
llvm-svn: 294606
1. Adds the command line flag for clzero.
2. Includes the clzero flag under znver1.
3. Defines the macro for clzero.
4. Adds a new file which has the intrinsic definition for clzero instruction.
Patch by Ganesh Gopalasubramanian with some additional tests from me.
Differential revision: https://reviews.llvm.org/D29386
llvm-svn: 294559
Added doxygen comments to prfchwintrin.h's intrinsics.
Note: The doxygen comments are automatically generated based on Sony's intrinsic
s document.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
llvm-svn: 293745
Prior to OpenCL 2.0, image3d_t can only be used with the write_only
access qualifier when the cl_khr_3d_image_writes extension is enabled,
see e.g. OpenCL 1.1 s6.8b.
Require the extension for write_only image3d_t types and guard uses of
write_only image3d_t in the OpenCL header.
Patch by Sven van Haastregt!
Review: https://reviews.llvm.org/D28860
llvm-svn: 293050
For a << b (as original vec_sl does), if b >= sizeof(a) * 8, the
behavior is undefined. However, Power instructions do define the
behavior, which is equivalent to a << (b % (sizeof(a) * 8)).
This patch changes altivec.h to use a << (b % (sizeof(a) * 8)), to
ensure the consistent semantic of the instructions. Then it combines
the generated multiple instructions back to a single shift.
This patch handles left shift only. Right shift, on the other hand, is
more complicated, considering arithematic/logical right shift.
Differential Revision: https://reviews.llvm.org/D28037
llvm-svn: 292659
Added doxygen comments for the newly added intrinsics in avxintrin.h, namely _mm256_cvtsd_f64, _mm256_cvtsi256_si32 and _mm256_cvtss_f32
Added doxygen comments for the new intrinsics in emmintrin.h, namely _mm_loadu_si64 and _mm_load_sd.
Explicit parameter names were added for _mm_clflush and _mm_setcsr
The rest of the changes are editorial, removing trailing spaces at the end of the lines.
Differential Revision: https://reviews.llvm.org/D28503
llvm-svn: 291876
Add builtins for the functions and custom codegen mapping the builtins to their
corresponding intrinsics and handling the endian related swapping.
https://reviews.llvm.org/D26546
llvm-svn: 291179
Summary:
MSVC seems to use "__in" and "__out" for its own purposes, so we have to
pick different names in this macro.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D28325
llvm-svn: 291138
Summary:
These duplicate declarations cause a problem for CUDA compiles on
Windows. All implicitly-defined functions are host+device, and this
applies to the declarations in Builtin.def. But then when we see the
declarations in intrin.h, they have no attributes, so are host-only
functions. This is an error.
(A better fix might be to make these builtins host-only, but that is a
much bigger change.)
Reviewers: rnk
Subscribers: cfe-commits, echristo
Differential Revision: https://reviews.llvm.org/D28317
llvm-svn: 291128
CUDA-8.0 comes with new headers which nvcc pre-includes via cuda_runtime.h
Clang now makes them available as well.
Differential Revision: https://reviews.llvm.org/D28301
llvm-svn: 290982
Added \n commands to insert a line breaks where necessary, since one long line of documentation is nearly unreadable.
Formatted comments to fit into 80 chars.
In some cases added \a command in front of the parameter names to display them in italics.
llvm-svn: 290619
Improved doxygen comments for the following intrinsics headers: __wmmintrin_pclmul.h, bmiintrin.h, emmintrin.h, f16cintrin.h, immintrin.h, mmintrin.h, pmmintrin.h, tmmintrin.h
Added \n commands to insert a line breaks where necessary, since one long line of documentation is nearly unreadable.
Formatted comments to fit into 80 chars.
In some cases added \a command in front of the parameter names to display them in italics.
llvm-svn: 290561
According to extended asm syntax, a case where the clobber list includes a variable from the inputs or outputs should be an error - conflict.
for example:
const long double a = 0.0;
int main()
{
char b;
double t1 = a;
__asm__ ("fucompp": "=a" (b) : "u" (t1), "t" (t1) : "cc", "st", "st(1)");
return 0;
}
This should conflict with the output - t1 which is st, and st which is st aswell.
The patch fixes it.
Commit on behald of Ziv Izhar.
Differential Revision: https://reviews.llvm.org/D15075
llvm-svn: 290539
Added \n commands to insert a line breaks where necessary to make the documentation more readable.
Formatted comments to fit into 80 chars.
llvm-svn: 290458
Tagged parameter names with \a doxygen command to display parameters in italics.
Added \n commands to insert a line break to make the documentation more readable.
Formatted comments to fit into 80 chars.
llvm-svn: 290455
Added a map to associate types and declarations with extensions.
Refactored existing diagnostic for disabled types associated with extensions and extended it to declarations for generic situation.
Fixed some bugs for types associated with extensions.
Allow users to use pragma to declare types and functions for supported extensions, e.g.
#pragma OPENCL EXTENSION the_new_extension_name : begin
// declare types and functions associated with the extension here
#pragma OPENCL EXTENSION the_new_extension_name : end
Differential Revision: https://reviews.llvm.org/D21698
llvm-svn: 289979
Reverts r289181: it's currently breaking modules using simd.h in
10.12 SDK.
This reverts commit 6e73e3464e96a4e00492c24aa790d36e1adb5702.
llvm-svn: 289487
This will allow the backend to constant fold these to generic shuffle vectors like 128-bit and 256-bit without having to working about handling masking.
llvm-svn: 289351
This will allow the backend to constant fold these to generic shuffle vectors like 128-bit and 256-bit without having to working about handling masking.
llvm-svn: 289345
Tagged instruction names with <c> INSTR_NAME </c> to display them in typewriter font.
In the past, \c command was used, unfortunately it applied to only one word.
<c> .. </c> has the same meaning, but applies to all words in between the tags.
llvm-svn: 289249
Allow darwin to provide additional definitions and implementation
specifc values for tgmath.h on Apple platforms.
rdar://problem/19019845
llvm-svn: 289181
Improved doxygen comments for fxsrintrin.h and mmintrin.h intrinsics by taagging parameter names with \a doxygen command to display parameters in italics.
Formatted comments to fit into 80 chars.
llvm-svn: 289154
Improved doxygen comments for __wmmintrin_pclmul.h and ammintrin.h intrinsics by taagging parameter names with \a doxygen command to display parameters in italics.
Formatted comments to fit into 80 chars.
llvm-svn: 289083
Documentation for some of the avxintrin.h's intrinsics errorneously said that
non VEX-prefixed instructions could be generated. This was fixed.
I tried several different solutions to achieve pretty printing of unordered lists (nested and non-nested) in param sections in doxygen.
llvm-svn: 287990
(commit again after fixing the buildbot failures)
This adds various overloads of the following builtins to altivec.h:
vec_neg
vec_nabs
vec_adde
vec_addec
vec_sube
vec_subec
vec_subc
Note that for vec_sub builtins on 32 bit integers, the semantics is similar to
what ISA describes for instructions like vsubecuq that work on quadwords: the
first operand is added to the one's complement of the second operand. (As
opposed to two's complement which I expected).
llvm-svn: 287872