This patch implements the __rndr and __rndrrs intrinsics to provide access to the random
number instructions introduced in Armv8.5-A. They are only defined for the AArch64
execution state and are available when __ARM_FEATURE_RNG is defined.
These intrinsics store the random number in their pointer argument and return a status
code if the generation succeeded. The difference between __rndr __rndrrs, is that the latter
intrinsic reseeds the random number generator.
The instructions write the NZCV flags indicating the success of the operation that we can
then read with a CSET.
[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] https://bugs.llvm.org/show_bug.cgi?id=47838
Differential Revision: https://reviews.llvm.org/D98264
Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc
Demonstrate how to generate vadd/vfadd intrinsic functions
1. add -gen-riscv-vector-builtins for clang builtins.
2. add -gen-riscv-vector-builtin-codegen for clang codegen.
3. add -gen-riscv-vector-header for riscv_vector.h. It also generates
ifdef directives with extension checking, base on D94403.
4. add -gen-riscv-vector-generic-header for riscv_vector_generic.h.
Generate overloading version Header for generic api.
https://github.com/riscv/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#c11-generic-interface
5. update tblgen doc for riscv related options.
riscv_vector.td also defines some unused type transformers for vadd,
because I think it could demonstrate how tranfer type work and we need
them for the whole intrinsic functions implementation in the future.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
Reviewed By: jrtc27, craig.topper, HsiangKai, Jim, Paul-C-Anagnostopoulos
Differential Revision: https://reviews.llvm.org/D95016
The VSX-only overloads (for 8-byte element vectors) are missing.
Add the missing overloads and convert element numbering to
modulo arithmetic to match GCC and XLC.
These interfaces are not covered in the ELFv2 ABI but are rather
implemented to emulate those available in GCC/XLC. However, the
ones in the other compilers are documented to perform modulo
arithmetic on the element number. This patch just brings clang
inline with the other compilers at -O0 (with optimization, clang
already does the right thing).
Adding support for intrinsics of AMX-BF16.
This patch alse fix a bug that AMX-INT8 instructions will be selected with wrong
predicate.
Differential Revision: https://reviews.llvm.org/D97358
Move any remaining preprocessor defines from `opencl-c.h` to
`opencl-c-base.h`, such that they are shared with
`-fdeclare-opencl-builtins` too.
In particular, move:
- the `as_type` and `as_typen` definitions, and
- the `kernel_exec` and `__kernel_exec` definitions.
Also clang-format the changes.
Differential Revision: https://reviews.llvm.org/D96948
Supporting `printf` with `-fdeclare-opencl-builtins` would require
special handling (for e.g. varargs and format attributes) for just
this one function. Instead, move the `printf` declaration to the
shared base header.
Differential Revision: https://reviews.llvm.org/D96789
This is a follow up of D92940.
We have successfully converted fadd/fmul _mm_reduce_* intrinsics to
llvm.reduction + reassoc flag. We can do the same approach for fmin/fmax
too, i.e. llvm.reduction + nnan flag.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93179
vec_xl() and vec_xst() should not emit alignment hints since they take a
scalar pointer and also add a byte offset if passed.
This patch uses memcpy to achieve the desired result.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D96471
Intrinsics *reduce_add/mul_ps/pd have assumption that the elements in
the vector are reassociable. So we need to always assign the reassoc
flag when we call _mm_reduce_* intrinsics.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D96231
This patch adds possibility to define OpenCL C 3.0 feature macros
via command line option or target setting.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D95776
This introduces the ARMv8.7-A LS64 extension's intrinsics for 64 bytes
atomic loads and stores: `__arm_ld64b`, `__arm_st64b`, `__arm_st64bv`,
and `__arm_st64bv0`. These are selected into the LS64 instructions
LD64B, ST64B, ST64BV and ST64BV0, respectively.
Based on patches written by Simon Tatham.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D93232
- MSVC has different `<complex>` implementation which calls into functions
declared in `<ymath.h>`. Provide their device-side implementation to enable
`<complex>` compilation on HIP Windows.
Differential Revision: https://reviews.llvm.org/D93638
Followup to D87604, having confirmed on PR47506 that we can use the llvm codegen expansion for fadd/fmul as well.
Differential Revision: https://reviews.llvm.org/D92940
Extended subgroups are library style extensions and therefore
they require no changes in the frontend. This commit:
1. Moves extension macro definitions to the internal headers.
2. Removes extension pragmas because they are not needed.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D92231
This patch implements amx programming model that discussed in llvm-dev
(http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html).
Thank Hal for the good suggestion in the RA. The fast RA is not in the patch yet.
This patch implemeted 7 components.
1. The c interface to end user.
2. The AMX intrinsics in LLVM IR.
3. Transform load/store <256 x i32> to AMX intrinsics or split the
type into two <128 x i32>.
4. The Lowering from AMX intrinsics to AMX pseudo instruction.
5. Insert psuedo ldtilecfg and build the def-use between ldtilecfg to amx
intruction.
6. The register allocation for tile register.
7. Morph AMX pseudo instruction to AMX real instruction.
Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0
Differential Revision: https://reviews.llvm.org/D87981
Previous patch (9a465057a6) did not fix the problem.
https://bugs.llvm.org/show_bug.cgi?id=48228
If the <new> is included too early, before CUDA-specific defines are available,
just include-next the standard <new> and undo the include guard. CUDA-specific
variants of operator new/delete will be declared if/when <new> is used from the
CUDA source itself, when all CUDA-related macros are available.
Differential Revision: https://reviews.llvm.org/D91807
Since there is no ROCm Device Library support for
long double, demote them to double, and use the fp64
math functions.
Differential Revision: https://reviews.llvm.org/D92130
Standard libc++ headers in stdc++ mode include <new> which picks up
cuda_wrappers/new before any of the CUDA macros have been defined.
We can not include CUDA headers that early, so the work-around is to define
__device__ in the wrapper header itself.
Differential Revision: https://reviews.llvm.org/D91807
opencl-c.h disables all extensions at its end, but opencl-c-base.h
does not, and that causes any inclusion of only opencl-c-base.h to
leave some extensions (such as cl_khr_fp16) enabled. This affects the
-fdeclare-opencl-builtins option for example.
This violates the OpenCL Extension Specification which specifies that
"The initial state of the compiler is as if the directive #pragma
OPENCL EXTENSION all : disable was issued".
Fix by disabling all extensions at the end of opencl-c-base.h and
enable extensions inside opencl.h which relied on opencl-c-base.h
enabling the cl_khr_fp16/64 extensions.
Differential Revision: https://reviews.llvm.org/D91429
This header has long lacked a standard multiple inclusion guard
like other headers have, for no apparent reason. The GCC header
of the same name likewise lacks one up through release 10.1, but
trunk GCC (release 11, and perhaps future 10.x) has fixed it
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96238).
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D91226
This patch adds three intrinsics compatible to x86's SSE 4.1 on PowerPC
target, with tests:
- _mm_insert_epi8
- _mm_insert_epi32
- _mm_insert_epi64
The intrinsics implementation is contributed by Paul Clarke.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D89242
Tremont microarchitecture only has GFNI(SSE) version, not AVX and
AVX512 version. This patch is to avoid compiling fail on Windows when
using -march=tremont to invoke one of GFNI(SSE) intrinsic.
Differential Revision: https://reviews.llvm.org/D90822
Similar to libcxx implementation of cmath function
overloads, use type promotion templates to determine
return types of multi-argument math functions.
Fixes: SWDEV-256825
Reviewed By: tra, yaxunl
Differential Revision: https://reviews.llvm.org/D90409
This patch mainly made the following changes:
1. Support AVX-VNNI instructions;
2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpbusds/vpdpbusds instructions only use vex-encoding when user explicity add {vex} prefix.
Differential Revision: https://reviews.llvm.org/D89105
This is very similar to 7f1e6fcff9, just fixing a left-over.
With this, it should be possible to use both, -x cuda and -fopenmp in the same invocation,
enabling to use both OpenMP, targeting CPU, and CUDA, targeting the GPU.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90415
CUDA buildbots complained about a redefinition when I landed D89971.
This is odd and I fail to understand where in the CUDA headers the other
definition is supposed to be. For now, given that CUDA doesn't need the
overload (AFAIKT), we simply restrict it to the OpenMP mode.
Reported by Colleen Bertoni <bertoni@anl.gov> after running the OvO test
suite: https://github.com/TApplencourt/OvO/
The template overload is still hidden behind an ifdef for OpenMP. In the
future we probably want to remove the ifdef but that requires further
testing.
Reviewed By: JonChesterfield, tra
Differential Revision: https://reviews.llvm.org/D89971
The __ocml_*_rte_f32 and __ocml_*_rte_f64 functions are not
available if OCML_BASIC_ROUNDED_OPERATIONS is not defined.
Reviewed By: b-sumner, yaxunl
Fixes: SWDEV-257235
Differential Revision: https://reviews.llvm.org/D89966
Emit the equivalent integer reduction intrinsics in IR instead of expanding to shuffle+arithmetic sequences.
The fadd/fmul reductions might be trickier as they assume a similar bisection reduction while the generic intrinsics assume a sequential reduction (intel docs are ambiguous on the correct approach) - I'm not sure if we want to always tag them with reassoc? Anyway, that issue can wait until a separate fp patch along with the fmin/fmax reductions.
Differential Revision: https://reviews.llvm.org/D87604
Separate __clang_hip_math.h header into __clang_hip_cmath.h
and __clang_hip_math.h. Improve the math function definition,
and add missing definitions or declarations. Add missing
overloads.
Reviewed By: tra, JonChesterfield
Differential Review: https://reviews.llvm.org/D88837
We were taking multiple pointer arguments in the builtin.
gcc accepts a single void*.
The cast from void* to _m128i* caused the IR generation to assume
the pointer was aligned.
Instead make the builtin take a single void*, emit i8* GEPs to
adjust then cast to <2 x i64>* and perform a store with align of 1.
Summary: This patch implements the builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp.
The instructions correspond to the following builtins:
int vec_test_swdiv(vector double v1, vector double v2);
int vec_test_swdivs(vector float v1, vector float v2);
int vec_test_swsqrt(vector double v1);
int vec_test_swsqrts(vector float v1);
This patch depends on D88274, which fixes the bug in copying from CRRC to GPRC/G8RC.
Reviewed By: steven.zhang, amyk
Differential Revision: https://reviews.llvm.org/D88278
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D88398
This is needed to compile some headers in CUDA-11 that assume that threadIdx is
implicitly convertible to dim3. With NVCC, threadIdx is uint3 and there's
dim3(uint3) constructor. Clang uses a special type for the builtin variables, so
that path does not work. Instead, this patch adds conversion function to the
builtin variable classes. that will allow them to be converted to dim3 and uint3.
Differential Revision: https://reviews.llvm.org/D88250
This patch implements custom codegen for the vec_replace_elt and
vec_replace_unaligned builtins.
These builtins map to the @llvm.ppc.altivec.vinsw and @llvm.ppc.altivec.vinsd
intrinsics depending on the arguments. The main motivation for doing custom
codegen for these intrinsics is because there are float and double versions of
the builtin. Normally, the converting the float to an integer would be done via
fptoui in the IR. This is incorrect as fptoui truncates the value and we must
ensure the value is not truncated. Therefore, we provide custom codegen to utilize
bitcast instead as bitcasts do not truncate.
Differential Revision: https://reviews.llvm.org/D83500
This patch implements the vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins for vector signed/unsigned __int128.
Differential Revision: https://reviews.llvm.org/D87910
This patch implements the vector string isolate (predicate and non-predicate
versions) builtins. The predicate builtins are custom selected within PPCISelDAGToDAG.
Differential Revision: https://reviews.llvm.org/D87671
This patch implements the 128-bit vector divide extended builtins in Clang/LLVM.
These builtins map to the vdivesq and vdiveuq instructions respectively.
Differential Revision: https://reviews.llvm.org/D87729
This patch implements the vec_gen[b|h|w|d|q]m function prototypes in altivec.h
in order to utilize the move to VSR with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82725
This patch implements the vec_cntm function prototypes in altivec.h in order to
utilize the vector count mask bits instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82726
Instead of relying on whether a certain identifier is a builtin, introduce BuiltinAttr to specify a declaration as having builtin semantics.
This fixes incompatible redeclarations of builtins, as reverting the identifier as being builtin due to one incompatible redeclaration would have broken rest of the builtin calls.
Mostly-compatible redeclarations of builtins also no longer have builtin semantics. They don't call the builtin nor inherit their attributes.
A long-standing FIXME regarding builtins inside a namespace enclosed in extern "C" not being recognized is also addressed.
Due to the more correct handling attributes for builtin functions are added in more places, resulting in more useful warnings.
Tests are updated to reflect that.
Intrinsics without an inline definition in intrin.h had `inline` and `static` removed as they had no effect and caused them to no longer be recognized as builtins otherwise.
A pthread_create() related test is XFAIL-ed, as it relied on it being recognized as a builtin based on its name.
The builtin declaration syntax is too restrictive and doesn't allow custom structs, function pointers, etc.
It seems to be the only case and fixing this would require reworking the current builtin syntax, so this seems acceptable.
Fixes PR45410.
Reviewed By: rsmith, yutsumi
Differential Revision: https://reviews.llvm.org/D77491
The last (big) missing piece to get "math" working in OpenMP target
regions (that I know of) was complex math functions, e.g.,
`std::sin(std::complex<double>)`. With this patch we overload the system
template functions for these operations with versions that have been
distilled from `libcxx/include/complex`. We use the same
`omp begin/end declare variant`
mechanism we use for other math functions before, except that we this
time overload templates (via D85735).
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D85777
With this extension the effects of `omp begin declare variant` will be
applied to template function declarations. The behavior is opt-in and
controlled by the `extension(allow_templates)` trait. While generally
useful, this will enable us to implement complex math function calls by
overloading the templates of the standard library with the ones in
libc++.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D85735
`std::isnan` and friends can be found in two variants in the wild, one
returns `bool`, as the standard defines it, one returns `int`, as the C
macros do. So far we kinda hoped the system versions of these functions
will work for people, e.g. they are definitions that can be compiled for
the target. We know that is not the case always so we leverage the
`disable_implicit_base` OpenMP context extension to specialize both
versions of these functions without causing an invalid redeclaration.
Reviewed By: JonChesterfield, tra
Differential Revision: https://reviews.llvm.org/D85879
This patch implements the vec_expandm function prototypes in altivec.h in order
to utilize the vector expand with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82727
These overloads are listed in appendix A of the ELFv2 ABI specification
without a requirement for ISA 3.0. So these need to be available on
all Altivec-capable architectures. The implementation in altivec.h
erroneously had them guarded for Power9 due to the availability of
the VCMPNE[BHW] instructions. However these need to be implemented
in terms of the VCMPEQ[BHW] instructions on older architectures.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=47423
The load builtins in altivec.h do not have const in the signature
for the pointer parameter. This prevents using them for loading
from constant pointers. A notable case for such a use is Eigen.
This patch simply adds the missing const.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=47408
This patch implements the builtins for Vector Multiply Builtins (vmulxxd family of instructions), and adds the appropriate test cases for these builtins. The builtins utilize the vector multiply instructions itnroduced with ISA 3.1.
Differential Revision: https://reviews.llvm.org/D83955
For example:
#define FOO(x) (x)
FOO({});
... forms a statement-expression after macro expansion. This warning
applies to '({' and '})' delimiting statement-expressions, '[[' and ']]'
delimiting attributes, and '::*' introducing a pointer-to-member.
The warning for forming these compound tokens across macro expansions
(or across files!) is enabled by default; the warning for whitespace
within the tokens is not, but is included in -Wall.
Differential Revision: https://reviews.llvm.org/D86751
This patch implements the builtins for Vector Load with Zero and Signed Extend Builtins (lxvr_x for b, h, w, d), and adds the appropriate test cases for these builtins. The builtins utilize the vector load instructions itnroduced with ISA 3.1.
Differential Revision: https://reviews.llvm.org/D82502#inline-797941
This patch implements the function prototypes vec_mulh and vec_dive in order to
utilize the vector multiply high (vmulh[s|u][w|d]) and vector divide extended
(vdive[s|u][w|d]) instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82609
As suggested by @rsmith on PR47267, by replacing the builtin_memcpy bitcast pattern with builtin_bit_cast we can use _castf32_u32, _castu32_f32, _castf64_u64 and _castu64_f64 inside constant expresssions (constexpr). Although __builtin_bit_cast was added for c++20 it works on all clang c/c++ modes.
Differential Revision: https://reviews.llvm.org/D86398
This is a first step patch to enable constexpr support and testing to a large number of x86 intrinsics.
All I've done here is provide a DEFAULT_FN_ATTRS_CONSTEXPR variant to our existing DEFAULT_FN_ATTRS tag approach that adds constexpr on c++ builds. The clang cuda headers do something similar.
I've started with POPCNT mainly as its tiny and are wrappers to generic __builtin_* intrinsics which already act as constexpr.
Differential Revision: https://reviews.llvm.org/D86229
This patch implements the vec_extractm function prototypes in altivec.h in
order to utilize the vector extract with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82675
This patch implements the builtins for the vector shifts (shl, srl, sra), and
adds the appropriate test cases for these builtins. The builtins utilize the
vector shift instructions introduced within ISA 3.1.
Differential Revision: https://reviews.llvm.org/D83338
Automation to detect compiler features, such as CMake's target_compile_features,
would attempt to detect compiler features by explicitly using langugage flags.
This change ensures that the HIP headers would still work with C++98.
Patch by Siu Chi Chan
Differential Revision: https://reviews.llvm.org/D85471
Change-Id: I304e964b18a525b0fde55efd841da74b6c4dc8ed
This patch implements the function prototypes vec_extractl and vec_extracth in altivec.h to utilize the vector extract double element instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D84622
47f7174ffa changed the types used in the Wasm SIMD builtin functions,
but not all of their uses in wasm_simd128.h were updated. This commit
fixes wasm_simd128.h and adds tests to make sure similar problems do
not pass uncaught in the future.
Differential Revision: https://reviews.llvm.org/D85347
Normally math functions are forwarded to __nv_* counterparts provided by CUDA's
libdevice bitcode. However, __nv_rint*()/__nv_nearbyint*() functions there have
a bug -- they use round() which rounds *up* instead of rounding towards the
nearest integer, so we end up with rint(2.5f) producing 3.0 instead of expected
2.0. The broken bitcode is not actually used by NVCC itself, which has both a
work-around in CUDA headers and, in recent versions, uses correct
implementations in NVCC's built-ins.
This patch implements equivalent workaround and directs rint*/nearbyint* to
__builtin_* variants that produce correct results.
Differential Revision: https://reviews.llvm.org/D85236
This allows people to use `int8_t` instead of `char`, -funsigned-char,
and generally decouples SIMD from the specialness of `char`.
And it makes intrinsics like `__builtin_wasm_add_saturate_s_i8x16`
and `__builtin_wasm_add_saturate_u_i8x16` use signed and unsigned
element types, respectively.
Differential Revision: https://reviews.llvm.org/D85074
Power10 introduces new instructions for vector multiply, divide and modulus.
These instructions can be exploited by the builtin functions: vec_mul, vec_div,
and vec_mod, respectively.
This patch aims adds the function prototype, vec_mod, as vec_mul and vec_div
been previously implemented in altivec.h.
This patch also adds the following front end tests:
vec_mul for v2i64
vec_div for v4i32 and v2i64
vec_mod for v4i32 and v2i64
Differential Revision: https://reviews.llvm.org/D82576
Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.
Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.
Differential Revision: https://reviews.llvm.org/D84556
This patch implements the `vec_xst_trunc` function in altivec.h in order to
utilize the Store VSX Vector Rightmost [byte | half | word | doubleword] Indexed
instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82467
Summary:
Make hip math headers easier to use from C
Motivation is a step towards using the hip math headers to implement math.h
for openmp, which needs to work with C as well as C++. NFC for C++ code.
Reviewers: yaxunl, jdoerfert
Reviewed By: yaxunl
Subscribers: sstefan1, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D84476
The old way worked to some degree for C++-mode but in C mode we actually
tried to introduce variants of macros (e.g., isinf). To make both modes
work reliably we get rid of those extra variants and directly use NVIDIA
intrinsics in the complex implementation. While this has to be revisited
as we add other GPU targets which want to reuse the code, it should be
fine for now.
Reviewed By: tra, JonChesterfield, yaxunl
Differential Revision: https://reviews.llvm.org/D83591
Due to recent changes we cannot use OpenMP in CUDA files anymore
(PR45533) as the math handling of CUDA is different when _OPENMP is
defined. We actually want this different behavior only if we are
offloading with OpenMP to NVIDIA, thus generating NVPTX. With this patch
we do not interfere with the CUDA math handling except if we are in
NVPTX offloading mode, as indicated by the presence of __OPENMP_NVPTX__.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D78155
To avoid linkage errors we have to ensure the linkage allows multiple
definitions of these compiler inserted functions. Since they are on the
cold path of complex computations, we want to avoid `inline`. Instead,
we opt for `weak` and `noinline` for now.
This simply follows the scheme we have for other wrappers. It resolves
the current link problem, e.g., `__muldc3 not found`, when std::complex
operations are used on a device.
This will not allow complex make math function calls to work properly,
e.g., sin, but that is more complex (pan intended) anyway.
Reviewed By: tra, JonChesterfield
Differential Revision: https://reviews.llvm.org/D80897
This enables _InterlockedAnd64/_InterlockedOr64/_InterlockedXor64/_InterlockedDecrement64/_InterlockedIncrement64/_InterlockedExchange64/_InterlockedExchangeAdd64/_InterlockedExchangeSub64 on 32-bit Windows
The backend already knows how to expand these to a loop using cmpxchg8b on 32-bit targets.
Fixes PR46595
Differential Revision: https://reviews.llvm.org/D83254
This patch implements builtins for the following prototypes:
unsigned long long __builtin_cntlzdm (unsigned long long, unsigned long long)
unsigned long long __builtin_cnttzdm (unsigned long long, unsigned long long)
vector unsigned long long vec_cntlzm (vector unsigned long long, vector unsigned long long)
vector unsigned long long vec_cnttzm (vector unsigned long long, vector unsigned long long)
Differential Revision: https://reviews.llvm.org/D80941
This patch implements builtins for the following prototypes for the VSX Permute
Control Vector Generate with Mask Instructions:
vector unsigned char vec_genpcvm (vector unsigned char, const int);
vector unsigned short vec_genpcvm (vector unsigned short, const int);
vector unsigned int vec_genpcvm (vector unsigned int, const int);
vector unsigned long long vec_genpcvm (vector unsigned long long, const int);
Differential Revision: https://reviews.llvm.org/D81774
This patch implements builtins for the following prototypes:
```
vector signed char vec_clrl (vector signed char a, unsigned int n);
vector unsigned char vec_clrl (vector unsigned char a, unsigned int n);
vector signed char vec_clrr (vector signed char a, unsigned int n);
vector signed char vec_clrr (vector unsigned char a, unsigned int n);
```
Differential Revision: https://reviews.llvm.org/D81707
This patch implements builtins for the following prototypes:
vector unsigned long long vec_pdep(vector unsigned long long, vector unsigned long long);
vector unsigned long long vec_pext(vector unsigned long long, vector unsigned long long __b);
unsigned long long __builtin_pdepd (unsigned long long, unsigned long long);
unsigned long long __builtin_pextd (unsigned long long, unsigned long long);
Revision Depends on D80758
Differential Revision: https://reviews.llvm.org/D80935
Summary:
This instruction was implemented in 3181273be7, but that commit did
not add an intrinsic for it.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81757
Recent change from `#if !defined(__CUDA__)` to `#if !__CUDA__` caused
regression on ROCm 3.5 since there is `#define __CUDA__`
before inclusion of the header file, which causes `#if !__CUDA__`
to be invalid.
Change `#if !__CUDA__` back to `#if !defined(__CUDA__)` for backward
compatibility.
To support std::complex and some other standard C/C++ functions in HIP device code,
they need to be forced to be __host__ __device__ functions by pragmas. This is done
by some clang standard C++ wrapper headers which are shared between cuda-clang and hip-Clang.
For these standard C++ wapper headers to work properly, specific include path order
has to be enforced:
clang C++ wrapper include path
standard C++ include path
clang include path
Also, these C++ wrapper headers require device version of some standard C/C++ functions
must be declared before including them. This needs to be done by including a default
header which declares or defines these device functions. The default header is always
included before any other headers are included by users.
This patch adds the the default header and include path for HIP.
Differential Revision: https://reviews.llvm.org/D81176
Added extensions and their function declarations into
the standard header.
Patch by Piotr Fusik!
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79781
Summary:
The shuffle intrinsic macros did not parenthesize usages of their
constant parameters, which could lead to incorrect results due to
operator precedence issues. This patch fixes the problem by adding the
missing paretheses.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80968
There are 65 that take a scalar shift amount. Intel documentation shows 60 of them taking unsigned int. There are 5 versions of srli_epi16 that use int, the 512-bit maskz and 128/256 mask/maskz.
Fixes PR45931
Differential Revision: https://reviews.llvm.org/D80251
Summary:
This reflects changes in the spec proposal made since basic arithmetic
was first implemented.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80174
Summary:
Add x86 feature with IBT and/or SHSTK bits to ELF program property if they are enabled. Otherwise, contents in this header file are unused.
This file is mainly design for assembly source code which want to enable CET
Reviewers: hjl.tools, annita.zhang, LuoYuanke, craig.topper, tstellar, pengfei, rsmith
Reviewed By: LuoYuanke
Subscribers: cfe-commits, mgorny
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79617
Summary:
Add x86 feature with IBT and/or SHSTK bits to ELF program property if they are enabled. Otherwise, contents in this header file are unused.
This file is mainly design for assembly source code which want to enable CET
Reviewers: hjl.tools, annita.zhang, LuoYuanke, craig.topper, tstellar, pengfei, rsmith
Reviewed By: LuoYuanke
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D79617
Summary:
Move instructions that have recently been implemented in V8 from the
`unimplemented-simd128` target feature to the `simd128` target
feature. The updated instructions match the update at
https://github.com/WebAssembly/simd/pull/223.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D79973
Summary:
As proposed in https://github.com/WebAssembly/simd/pull/122. Since
these instructions are not yet merged to the SIMD spec proposal, this
patch makes them entirely opt-in by surfacing them only through LLVM
intrinsics and clang builtins. If these instructions are made
official, these intrinsics and builtins should be replaced with simple
instruction patterns.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D79742
Summary:
Although using `__builtin_shufflevector` and the `shufflevector`
instruction works fine, they are not opaque to the optimizer. As a
result, DAGCombine can potentially reduce the number of shuffles and
change the shuffle masks. This is unexpected behavior for users of the
WebAssembly SIMD intrinsics who have crafted their shuffles to
optimize the code generated by engines. This patch solves the problem
by adding a new shuffle intrinsic that is opaque to the optimizers in
line with the decision of the WebAssembly SIMD contributors at
https://github.com/WebAssembly/simd/issues/196#issuecomment-622494748. In
the future we may implement custom DAG combines to properly optimize
shuffles and replace this solution.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D66983
Summary:
As described in https://github.com/WebAssembly/simd/pull/209. This is
the final reorganization of the SIMD opcode space before
standardization. It has been landed in concert with corresponding
changes in other projects in the WebAssembly SIMD ecosystem.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79224
Some intrinsics in vecintrin.h are currently implemented by
performing address arithmetic in __INTPTR_TYPE__ and converting
the result to some pointer type. While this works correctly,
it leads to suboptimal code generation since many optimizers
cannot trace the provenance of the resulting pointers.
Fixed by using "char *" pointer arithmetic instead.
System headers should avoid using the "vector" and "bool" keywords
since those might be redefined by user code. For example, using
<stdbool.h> before <vecintrin.h> will currently lead to compiler
errors.
Fixed by using the reserved "__vector" and "__bool" keywords
instead. NFC otherwise.
If we are in C++ mode and include <math.h> (not <cmath>) first, we still
need to make sure <cmath> is read first. The problem otherwise is that
we haven't seen the declarations of the math.h functions when the system
math.h includes our cmath overlay. However, our cmath overlay, or better
the underlying overlay, e.g. CUDA, uses the math.h functions. Since we
haven't declared them yet we get errors. CUDA avoids this by eagerly
declaring all math functions (in the __device__ space) but we cannot do
this. Instead we break the dependence by forcing cmath to go first.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D77774
For OpenMP target regions to piggy back on the CUDA/AMDGPU/... implementation of math functions,
we include the appropriate definitions inside of an `omp begin/end declare variant match(device={arch(nvptx)})` scope.
This way, the vendor specific math functions will become specialized versions of the system math functions.
When a system math function is called and specialized version is available the selection logic introduced in D75779
instead call the specialized version. In contrast to the code path we used so far, the system header is actually included.
This means functions without specialized versions are available and so are macro definitions.
This should address PR42061, PR42798, and PR42799.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D75788
Warnings in emmintrin.h and xmmintrin.h are reported by
-fsanitize=implicit-integer-sign-change.
Reviewed By: RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D77393
In d1705c1196 (D77238) we accidentally included subsequent changes and
did not only move the code into a new file (which was the intention).
We undo the changes now and re-introduce them with the appropriate test
changes later.
This is not supported to change anything but allow us to reuse the math
functions separately from the device functions, e.g., source them at
different times. This will be used by the OpenMP overlay.
This also adds two `return` keywords that were missing.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D77238
Summary:
The convention for the wasm_simd128.h intrinsics is to have the
integer sign in the lane interpretation rather than as a suffix. This
PR changes the names of the integer min, max, and avgr intrinsics to
match this convention.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77185
Summary:
As the WebAssembly SIMD proposal nears stabilization, there is desire
to use it with toolchains other than Emscripten. Moving the intrinsics
header to clang will make it available to WASI toolchains as well.
Reviewers: aheejin, sunfish
Subscribers: dschuff, mgorny, sbc100, jgravelle-google, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76959
Summary:
- Even though the bindless surface/texture interfaces are promoted,
there are still code using surface/texture references. For example,
[PR#26400](https://bugs.llvm.org/show_bug.cgi?id=26400) reports the
compilation issue for code using `tex2D` with texture references. For
better compatibility, this patch proposes the support of
surface/texture references.
- Due to the absent documentation and magic headers, it's believed that
`nvcc` does use builtins for texture support. From the limited NVVM
documentation[^nvvm] and NVPTX backend texture/surface related
tests[^test], it's believed that surface/texture references are
supported by replacing their reference types, which are annotated with
`device_builtin_surface_type`/`device_builtin_texture_type`, with the
corresponding handle-like object types, `cudaSurfaceObject_t` or
`cudaTextureObject_t`, in the device-side compilation. On the host
side, that global handle variables are registered and will be
established and updated later when corresponding binding/unbinding
APIs are called[^bind]. Surface/texture references are most like
device global variables but represented in different types on the host
and device sides.
- In this patch, the following changes are proposed to support that
behavior:
+ Refine `device_builtin_surface_type` and
`device_builtin_texture_type` attributes to be applied on `Type`
decl only to check whether a variable is of the surface/texture
reference type.
+ Add hooks in code generation to replace that reference types with
the correponding object types as well as all accesses to them. In
particular, `nvvm.texsurf.handle.internal` should be used to load
object handles from global reference variables[^texsurf] as well as
metadata annotations.
+ Generate host-side registration with proper template argument
parsing.
---
[^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf
[^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
[^bind]: See section 3.2.11.1.2 ``Texture reference API` in [CUDA C Programming Guide](https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf).
[^texsurf]: According to NVVM IR, `nvvm.texsurf.handle` should be used. But, the current backend doesn't have that supported. We may revise that later.
Reviewers: tra, rjmccall, yaxunl, a.sidorin
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76365
This patch adds 'q' to mean 'scalable vector' in the builtin
type string, and for SVE will return the matching builtin
type as defined in the C/C++ language extensions for SVE.
This patch also adds some scaffolding to generate the arm_sve.h
header file, and some builtin definitions (+CodeGen) to be able
to implement some simple masked load intrinsics that use the
ACLE types, such as:
svint8_t test_svld1_s8(svbool_t pg, const int8_t *base) {
return svld1_s8(pg, base);
}
Reviewers: efriedma, rjmccall, rovka, rsandifo-arm, rengolin
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75298
Summary:
This patch generalizes the existing code to support CDE intrinsics
which will share some properties with existing MVE intrinsics
(some of the intrinsics will be polymorphic and accept/return values
of MVE vector types).
Specifically the patch:
* Adds new tablegen backends -gen-arm-cde-builtin-def,
-gen-arm-cde-builtin-codegen, -gen-arm-cde-builtin-sema,
-gen-arm-cde-builtin-aliases, -gen-arm-cde-builtin-header based on
existing MVE backends.
* Renames the '__clang_arm_mve_alias' attribute into
'__clang_arm_builtin_alias' (it will be used with CDE intrinsics as
well as MVE intrinsics)
* Implements semantic checks for the coprocessor argument of the CDE
intrinsics as well as the existing coprocessor intrinsics.
* Adds one CDE intrinsic __arm_cx1 to test the above changes
Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen
Reviewed By: simon_tatham
Subscribers: sdesmalen, mgorny, kristof.beyls, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75850
arm_acle.h relied on `_MSC_VER` to determine if a given function was
already defined as a builtin. This was incorrect because
`-fms-extensions` enables these builtins, but is not responsible for
defining `_MSC_VER` on any target. The next closest thing is
`_MSC_EXTENSIONS`, which is only defined on Windows targets, but even
this is suboptimal. What this conditional is actually trying to
determine is if the given functions are defined as builtins, so just
check that directly.
I also attempted to do this for `__nop`, but in that case intrin.h,
which is only includable if `_MSC_VER` is defined, has its own
definition. So in that case `_MSC_VER` is correct.
Differential Revision: https://reviews.llvm.org/D75719
rdar://60102353
These declarations use a mix of unsigned and signed argument and
return types. This is not in accordance with OpenCL v2.0 s6.13.11.
Differential Revision: https://reviews.llvm.org/D74910
New intrinisics are implemented for when we need to port SIMD code from other
arhitectures and only load or store portions of MSA registers.
Following intriniscs are added which only load/store element 0 of a vector:
v4i32 __builtin_msa_ldrq_w (const void *, imm_n2048_2044);
v2i64 __builtin_msa_ldr_d (const void *, imm_n4096_4088);
void __builtin_msa_strq_w (v4i32, void *, imm_n2048_2044);
void __builtin_msa_str_d (v2i64, void *, imm_n4096_4088);
Differential Revision: https://reviews.llvm.org/D73644
Summary:
To use new/delete in NVPTX code we need to define them. Implementation
copied from CUDA wrappers.
Reviewers: hfinkel, jdoerfert
Subscribers: mgorny, guansong, kkwli0, caomhin, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73128
This makes clang somewhat forward-compatible with new CUDA releases
without having to patch it for every minor release without adding
any new function.
If an unknown version is found, clang issues a warning (can be disabled
with -Wno-cuda-unknown-version) and assumes that it has detected
the latest known version. CUDA releases are usually supersets
of older ones feature-wise, so it should be sufficient to keep
released clang versions working with minor CUDA updates without
having to upgrade clang, too.
Differential Revision: https://reviews.llvm.org/D73231
Use floating-point instead of integer zero constants to avoid
creating implicit conversions, which currently cause suboptimal
code to be generated with -ffp-exception-behavior=strict.
NFC otherwise.
Enabling `-Wcast-qual` identified many casts in various system headers
that were dropping the `const` qualifier. Fixing those missing
qualifiers pointed out that a few of the definitions of the builtins
did not properly identify their arguments as `const` pointers. This
commit fixes those builtin definitions, and the system header files
so that they no longer drop the qualifier.
Differential Revision: https://reviews.llvm.org/D71718
The forward declaration had a cdecl calling convention, but the
inline version did not. This leads to a conflict if the default
calling convention is not cdecl. Fix this by just removing the
forward declaration.
Fixes PR41503
We need to use a 64-bit type in 64-bit mode so a 64-bit register
will get used in the generated assembly. I've also changed the
constraints to just use "r" intead of "q". "q" forces to a only
an a/b/c/d register in 32-bit mode, but I see no reason that
would matter here.
Fixes Nico's note in PR19301 over 4 years ago.
Differential Revision: https://reviews.llvm.org/D70101
As we currently have it implemented in altivec.h, the offsets for these two
intrinsics are element offsets. The documentation in the ABI (as well as the
implementation in both XL and GCC) states that these should be byte offsets.
Differential revision: https://reviews.llvm.org/D63636
We currently emit a double precision comparison instruction for this, whereas we
need to emit the single precision version.
Differential revision: https://reviews.llvm.org/D64024
Summary:
Writing support for three ACLE functions:
unsigned int __cls(uint32_t x)
unsigned int __clsl(unsigned long x)
unsigned int __clsll(uint64_t x)
CLS stands for "Count number of leading sign bits".
In AArch64, these two intrinsics can be translated into the 'cls'
instruction directly. In AArch32, on the other hand, this functionality
is achieved by implementing it in terms of clz (count number of leading
zeros).
Reviewers: compnerd
Reviewed By: compnerd
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69250
This commit sets up the infrastructure for auto-generating <arm_mve.h>
and doing clang-side code generation for the builtins it relies on,
and demonstrates that it works by implementing a representative sample
of the ACLE intrinsics, more or less matching the ones introduced in
LLVM IR by D67158,D68699,D68700.
Like NEON, that header file will provide a set of vector types like
uint16x8_t and C functions with names like vaddq_u32(). Unlike NEON,
the ACLE spec for <arm_mve.h> includes a polymorphism system, so that
you can write plain vaddq() and disambiguate by the vector types you
pass to it.
Unlike the corresponding NEON code, I've arranged to make every user-
facing ACLE intrinsic into a clang builtin, and implement all the code
generation inside clang. So <arm_mve.h> itself contains nothing but
typedefs and function declarations, with the latter all using the new
`__attribute__((__clang_builtin))` system to arrange that the user-
facing function names correspond to the right internal BuiltinIDs.
So the new MveEmitter tablegen system specifies the full sequence of
IRBuilder operations that each user-facing ACLE intrinsic should
translate into. Where possible, the ACLE intrinsics map to standard IR
operations such as vector-typed `add` and `fadd`; where no standard
representation exists, I call down to the sample IR intrinsics
introduced in an earlier commit.
Doing it like this means that you get the polymorphism for free just
by using __attribute__((overloadable)): the clang overload resolution
decides which function declaration is the relevant one, and _then_ its
BuiltinID is looked up, so by the time we're doing code generation,
that's all been resolved by the standard system. It also means that
you get really nice error messages if the user passes the wrong
combination of types: clang will show the declarations from the header
file and explain why each one doesn't match.
(The obvious alternative approach would be to have wrapper functions
in <arm_mve.h> which pass their arguments to the underlying builtins.
But that doesn't work in the case where one of the arguments has to be
a constant integer: the wrapper function can't pass the constantness
through. So you'd have to do that case using a macro instead, and then
use C11 `_Generic` to handle the polymorphism. Then you have to add
horrible workarounds because `_Generic` requires even the untaken
branches to type-check successfully, and //then// if the user gets the
types wrong, the error message is totally unreadable!)
Reviewers: dmgreen, miyuki, ostannard
Subscribers: mgorny, javed.absar, kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D67161
These intrinsics use llvm.cttz intrinsics so are always available
even without the bmi feature. We already don't check for the bmi
feature on the intrinsics themselves. But we were blocking the
include of the header file with _MSC_VER unless BMI was enabled
on the command line.
Fixes PR30506.
llvm-svn: 374516
Summary:
This is similar to vec_ct* in https://reviews.llvm.org/rL304205.
The argument must be a constant, otherwise instruction selection
will fail. always_inline is not enough for isel to always fold
everything away at -O0.
The fix is to turn the function into macros in altivec.h.
Fixes https://bugs.llvm.org/show_bug.cgi?id=43072
Reviewers: nemanjai, hfinkel, #powerpc, wuzish
Reviewed By: #powerpc, wuzish
Subscribers: wuzish, kbarton, MaskRay, shchenz, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D66699
llvm-svn: 370902
vote.ballot instruction is gone in recent CUDA versions and
vote.sync.ballot can not be used because it needs a thread mask parameter.
Fortunately PTX 6.2 (introduced with CUDA-9.2) provides activemask.b32
instruction for this.
Differential Revision: https://reviews.llvm.org/D66665
llvm-svn: 370792
In C++ mode we should only avoid adding __OPENCL_C_VERSION__,
all other predefined macros about the language mode are still
valid.
This change also fixes the language version check in the
headers accordingly.
Differential Revision: https://reviews.llvm.org/D65941
llvm-svn: 368552
Port existing headers which include x86 intrinsics implementation to
PowerPC platform (using Altivec), along with tests. Also, tests about
including these intrinsic headers are combined.
The headers are mainly developed by Steven Munroe, with contributions
from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D65630
llvm-svn: 368392
Move the platform check out of PPC Linux toolchain code and add platform guards
to the intrinsic headers, since they are supported currently only on 64-bit
PowerPC targets.
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D64849
llvm-svn: 367281
Defining CLK_NULL_EVENT with a `(void*)` cast has the (unintended?)
side-effect that the address space will be fixed (as generic in OpenCL
2.0 mode). The consequence is that any target specific address space
for the clk_event_t type will not be applied.
It is not clear why the void pointer cast was needed in the first
place, and it seems we can do without it.
Differential Revision: https://reviews.llvm.org/D63876
llvm-svn: 366546
Remove dependency of malloc in implementation of mm_malloc function in PowerPC
intrinsics and alignment assumption on glibc.
Reviewed By: Hal Finkel
Differential Revision: https://reviews.llvm.org/D64850
llvm-svn: 366406
The jcvt intrinsic defined in ACLE [1] is available when ARM_FEATURE_JCVT is defined.
This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function.
The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used.
I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example).
make check-all didn't show any new failures.
[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
Differential Revision: https://reviews.llvm.org/D64495
llvm-svn: 366197
This patch series adds support for the next-generation arch13
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10303.
Note: No currently available Z system supports the arch13
architecture. Once new systems become available, the
official system name will be added as supported -march name.
llvm-svn: 365933
We accidentally lost the ATOMIC_VAR_INIT and ATOMIC_FLAG_INIT macros
in r363794.
Also put the `memory_order` typedef back inside a `>= CL2.0` guard.
llvm-svn: 364174
Add overloads with generic address space pointer to old atomics.
This is currently only added for C++ compilation mode.
Differential Revision: https://reviews.llvm.org/D62335
llvm-svn: 364071
These intrinsics should always take an immediate for the rounding mode.
The base instruction comes from before EVEX embdedded rounding. The
user should always provide the immediate rather than us assuming
CUR_DIRECTION.
Make the 512-bit versions also explicit aliases instead of copy
pasting the code.
llvm-svn: 363961
Summary:
AIX system headers need stdint.h and inttypes.h to be re-enterable when macro _STD_TYPES_T is defined so that limit macro definitions such as UINT32_MAX can be found. This patch attempts to allow that on AIX.
Reviewers: hubert.reinterpretcast, jasonliu, mclow.lists, EricWF
Reviewed by: hubert.reinterpretcast, mclow.lists
Subscribers: jfb, jsji, christof, cfe-commits, libcxx-commits, llvm-commits
Tags: #LLVM, #clang, #libc++
Differential Revision: https://reviews.llvm.org/D59253
llvm-svn: 363939
Using the -fdeclare-opencl-builtins option will require a way to
predefine types and macros such as `int4`, `CLK_GLOBAL_MEM_FENCE`,
etc. Move these out of opencl-c.h into opencl-c-base.h such that the
latter can be shared by -fdeclare-opencl-builtins and
-finclude-default-header.
This changes the behaviour of -finclude-default-header when
-fdeclare-opencl-builtins is specified: instead of including the full
header, it will include the header with only the base definitions.
Differential revision: https://reviews.llvm.org/D63256
llvm-svn: 363794
Port emmintrin.h which include Intel SSE2 intrinsics implementation to PowerPC platform (using Altivec).
The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
It's a follow-up patch of D62121.
Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Differential Revision: https://reviews.llvm.org/D62569
llvm-svn: 363122
Summary:
Remove unnecessary definition (otherwise the extension will be defined
where it's not supposed to be defined).
Consider the code:
#pragma OPENCL EXTENSION cl_intel_planar_yuv : begin
// some declarations
#pragma OPENCL EXTENSION cl_intel_planar_yuv : end
is enough for extension to become known for clang.
Patch by: Dmitry Sidorov <dmitry.sidorov@intel.com>
Reviewers: Anastasia, yaxunl
Reviewed By: Anastasia
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58666
llvm-svn: 362398
Port xmmintrin.h which include Intel SSE intrinsics implementation to PowerPC platform (using Altivec).
The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D62121
llvm-svn: 362190
Port xmmintrin.h which include Intel SSE intrinsics implementation to PowerPC platform (using Altivec).
The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D62121
llvm-svn: 361928
Summary: When including the random header in C++, some of the math functions it relies on are not present in the CUDA headers. We include this variants in this case.
Reviewers: jdoerfert, hfinkel, tra, caomhin
Reviewed By: tra
Subscribers: efriedma, guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D62046
llvm-svn: 361066
Previously we were doing this so that the 256 bit selectw builtin could be used in the implementation of the 512->256 bit conversion intrinsic.
After this commit we now use a masked convert builtin that will emit the intrinsic call and the 256-bit select from custom code in CGBuiltin. Then the header only needs to call that one intrinsic.
llvm-svn: 360924
Summary:
This is a fix for the reported bug:
[[ https://bugs.llvm.org/show_bug.cgi?id=41861 | 41861 ]]
abs functions need to be moved under the c++ macro to avoid conflicts with included headers.
Reviewers: tra, jdoerfert, hfinkel, ABataev, caomhin
Reviewed By: jdoerfert
Subscribers: guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61959
llvm-svn: 360809
Summary: In OpenMP device offloading we must ensure that unde C++ 17, the inclusion of cstdlib will works correctly.
Reviewers: ABataev, tra, jdoerfert, hfinkel, caomhin
Reviewed By: jdoerfert
Subscribers: Hahnfeld, guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61949
llvm-svn: 360804
Currently `immintrin.h` includes `pconfigintrin.h` and `sgxintrin.h`
which contain inline assembly. It causes failures when building with the
flag `-fno-gnu-inline-asm`.
Fix by excluding functions with inline assembly when this extension is
disabled. So far there was no need to support `_pconfig_u32`,
`_enclu_u32`, `_encls_u32`, `_enclv_u32` on platforms that require
`-fno-gnu-inline-asm`. But if developers start using these functions,
they'll have compile-time undeclared identifier errors which is
preferrable to runtime errors.
rdar://problem/49540880
Reviewers: craig.topper, GBuella, rnk, echristo
Reviewed By: rnk
Subscribers: jkorous, dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D61621
llvm-svn: 360630
Summary: This patches fixes an issue in which the __clang_cuda_cmath.h header is being included even when cmath or math.h headers are not included.
Reviewers: jdoerfert, ABataev, hfinkel, caomhin, tra
Reviewed By: tra
Subscribers: tra, mgorny, guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61765
llvm-svn: 360626
This reverts r360271 (git commit a0933bd8ec)
There are concerns on the review that this breaks EFI builds and that
the transitive includes (sal.h) are actually heavy enough that we might
care.
llvm-svn: 360291
compatibility. This allows some applications developed with MSVC to
compile with clang without any extra changes.
Fixes: llvm.org/PR40789
Differential Revision: https://reviews.llvm.org/D61646
llvm-svn: 360271
Summary:
In this patch we propose a temporary solution to resolving math functions for the NVPTX toolchain, temporary until OpenMP variant is supported by Clang.
We intercept the inclusion of math.h and cmath headers and if we are in the OpenMP-NVPTX case, we re-use CUDA's math function resolution mechanism.
Authors:
@gtbercea
@jdoerfert
Reviewers: hfinkel, caomhin, ABataev, tra
Reviewed By: hfinkel, ABataev, tra
Subscribers: JDevlieghere, mgorny, guansong, cfe-commits, jdoerfert
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61399
llvm-svn: 360265
This commit appears to be breaking stage-2 builds on GreenDragon. The
OpenMP wrappers for cmath and math.h are copied into the root of the
resource directory and cause a cyclic dependency in module 'Darwin':
Darwin -> std -> Darwin. This blows up when CMake is testing for modules
support and breaks all stage 2 module builds, including the ThinLTO bot
and all LLDB bots.
CMake Error at cmake/modules/HandleLLVMOptions.cmake:497 (message):
LLVM_ENABLE_MODULES is not supported by this compiler
llvm-svn: 360192
Summary:
In this patch we propose a temporary solution to resolving math functions for the NVPTX toolchain, temporary until OpenMP variant is supported by Clang.
We intercept the inclusion of math.h and cmath headers and if we are in the OpenMP-NVPTX case, we re-use CUDA's math function resolution mechanism.
Authors:
@gtbercea
@jdoerfert
Reviewers: hfinkel, caomhin, ABataev, tra
Reviewed By: hfinkel, ABataev, tra
Subscribers: mgorny, guansong, cfe-commits, jdoerfert
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61399
llvm-svn: 360063
Summary:
1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
2. Enable intrinsics for VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
For more details about BF16 intrinsic, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
Patch by LiuTianle
Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: mgorny, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D60552
llvm-svn: 360018
Summary:
This is a follow up to r355253 and a better fix than the first attempt
which was r359257.
We can't install anything from ${CMAKE_CFG_INTDIR}, because this value
is only defined at build time, but we still must make sure to copy the
headers into ${CMAKE_CFG_INTDIR}/lib/clang/$VERSION/include, because the lit
tests look for headers there. So for this fix we revert to the
old behavior of copying the headers to ${CMAKE_CFG_INTDIR}/lib/clang/$VERSION/include
during the build and then installing them from the source tree.
Reviewers: smeenai, vzakhari, phosek
Reviewed By: smeenai, vzakhari
Subscribers: mgorny, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61220
llvm-svn: 359654
This provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
These intrinsics are available when __ARM_FEATURE_MEMORY_TAGGING is defined.
Each intrinsic is described in detail in the ACLE Q1 2019 documentation:
https://developer.arm.com/docs/101028/latest
Reviewed By: Tim Nortover, David Spickett
Differential Revision: https://reviews.llvm.org/D60485
llvm-svn: 359348
Summary:
This is a follow up to r355253, which inadvertently broke Visual
Studio builds by trying to copy files from CMAKE_CFG_INTDIR.
See https://reviews.llvm.org/D58537#inline-532492
Reviewers: smeenai, vzakhari, phosek
Reviewed By: smeenai
Subscribers: mgorny, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61054
llvm-svn: 359257
Port mmintrin.h which include x86 MMX intrinsics implementation to PowerPC platform (using Altivec).
To make the include process correct, PowerPC's toolchain class is overrided to insert new headers directory (named ppc_wrappers) into the path. Basic test cases for several intrinsic functions are added.
The header is mainly developed by Steven Munroe, with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D59924
llvm-svn: 358949
The pattern we replaced these with may be too hard to match as demonstrated by
PR41496 and PR41316.
This patch restores the intrinsics and then we can start focusing
on the optimizing the intrinsics.
I've mostly reverted the original patch that removed them. Though I modified
the avx512 intrinsics to not have masking built in.
Differential Revision: https://reviews.llvm.org/D60674
llvm-svn: 358427
Summary:
These all had somewhat custom file headers with different text from the
ones I searched for previously, and so I missed them. Thanks to Hal and
Kristina and others who prompted me to fix this, and sorry it took so
long.
Reviewers: hfinkel
Subscribers: mcrosier, javed.absar, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D60406
llvm-svn: 357941
[IMPORTANT]
With that last fix, CUDA has just started being compiling by clang on Windows after nearly a year and two clang’s major releases (7 and 8).
As long as the last LLVM release, in which clang was compiling CUDA on Windows successfully, was 6.0.1, this fix and two previous have to be included into upcoming 7.1.0 and 8.0.1 releases.
[How to repro]
clang++.exe -x cuda "c:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\0_Simple\simplePrintf\simplePrintf.cu" -I"c:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\common\inc" --cuda-gpu-arch=sm_50 --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0" -L"c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64" -lcudart.lib -v
[Output]
In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:390:11: error: no matching function for call to '__isinfl'
return (__isinfl(a) != 0);
^~~~~~~~
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2662:14: note: candidate function not viable: call to __host__ function from __device__ function
__func__(int __isinfl(long double a))
^
In file included from <built-in>:1:
In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:438:11: error: no matching function for call to '__isnanl'
return (__isnanl(a) != 0);
^~~~~~~~
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2672:14: note: candidate function not viable: call to __host__ function from __device__ function
__func__(int __isnanl(long double a))
^
In file included from <built-in>:1:
In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:486:11: error: no matching function for call to '__finitel'
return (__finitel(a) != 0);
^~~~~~~~~
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2652:14: note: candidate function not viable: call to __host__ function from __device__ function
__func__(int __finitel(long double a))
^
3 errors generated when compiling for sm_50.
[Solution]
Add missing long double device functions' declarations. Provide only declarations to prevent any use of long double on the device side, because CUDA does not support long double on the device side.
[Testing]
{Windows 10, Ubuntu 16.04.5}/{Visual C++ 2017 15.9.9, gcc+ 5.4.0}/CUDA {8.0, 9.0, 9.1, 9.2, 10.0, 10.1}
Reviewed by: Artem Belevich
Differential Revision: http://reviews.llvm.org/D60220
llvm-svn: 357779