This change provides a way to conveniently declare types that have
address space qualifiers removed.
Since OpenCL adds address spaces implicitly even when they are not
specified in source, it is useful to allow deriving address space
unqualified types.
Fixes llvm.org/PR45326
Differential Revision: https://reviews.llvm.org/D106785
Asm is a gnu extension for C, so at present -fopenmp -std=c99
and similar fail to compile on nvptx, bug 51344
Changing to `__asm__` or `__asm` works for openmp, all three appear to work
for cuda. Suggesting `__asm__` here as `__asm` is used by MSVC with different
syntax, so this should make for better error diagnostics if the header is
passed to a compiler other than clang.
Reviewed By: tra, emankov
Differential Revision: https://reviews.llvm.org/D107492
With this patch, OpenMP on AMDGCN will use the math functions
provided by ROCm ocml library. Linking device code to the ocml will be
done in the next patch.
Reviewed By: JonChesterfield, jdoerfert, scchan
Differential Revision: https://reviews.llvm.org/D104904
For some reason, Microsoft declares _m_prefetch to take a const void*,
but _m_prefetchw to take a /volatile/ const void*.
Do the same for compatibility.
Differential revision: https://reviews.llvm.org/D106790
The builtins vec_xl_len_r and vec_xst_len_r actually use the
wrong side of the vector on big endian Power9 systems. We never
spotted this before because there was no such thing as a big
endian distro that supported Power9. Now we have AIX and the
elements are in the wrong part of the vector. This just fixes
it so the elements are loaded to and stored from the right
side of the vector.
With this patch, OpenMP on AMDGCN will use the math functions
provided by ROCm ocml library. Linking device code to the ocml will be
done in the next patch.
Reviewed By: JonChesterfield, jdoerfert, scchan
Differential Revision: https://reviews.llvm.org/D104904
CL 2.0 introduced atomics and generic address space so there were
only one set of APIs for doing atomics, however since CL 3.0
makes generic address space optional, there has to be new sets
of atomic interfaces to handle that cases.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D106778
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.
Differential Revision: https://reviews.llvm.org/D106724
Redefines NULL as nullptr instead of ((void*)0)
in C++ for OpenCL.
Such internal representation of NULL provides
compatibility with C++11 and later language
standards.
Patch by Topotuna (Justas Janickas)!
Differential Revision: https://reviews.llvm.org/D105987
XL provides functions __vec_ldrmb/__vec_strmb for loading/storing a
sequence of 1 to 16 bytes in big endian order, right justified in the
vector register (regardless of target endianness).
This is equivalent to vec_xl_len_r/vec_xst_len_r which are only
available on Power9.
This patch simply uses the Power9 functions when compiled for Power9,
but provides a more general implementation for Power8.
Differential revision: https://reviews.llvm.org/D106757
This patch changes the index argument of lvxl?/lve[bhw]x and
stvxl?/stve[bhw]x builtins from int to long. Because on 64-bit
subtargets, an extra extsw will always been generated, which is
incorrect.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D106530
This patch adds support for the next-generation arch14
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Detection of arch14 as host processor.
- Assembler/disassembler support for new instructions.
- New LLVM intrinsics for certain new instructions.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10304.
Note: No currently available Z system supports the arch14
architecture. Once new systems become available, the
official system name will be added as supported -march name.
This adds the optional wrappers around things, however this isn't sufficient yet for CL 3.0 without generic address space, I've got one more additional patch to add all those APIs, but this is an easier to review precursor.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D106111
Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax}
with standard codegen patterns. Since wasm_simd128.h uses an integer vector as
the standard single vector type, the IR for the pmin and pmax intrinsic
functions contains bitcasts that would not be there otherwise. Add extra codegen
patterns that can still select the pmin and pmax instructions in the presence of
these bitcasts.
Differential Revision: https://reviews.llvm.org/D106612
NULL was undefined in OpenCL prior to version 2.0. However, the
language specification states that "macro names defined by the C99
specification but not currently supported by OpenCL are reserved
for future use". Therefore, application developers cannot redefine
NULL.
The change is supposed to resolve inconsistency between language
versions. Currently there is no apparent reason why NULL should
be kept undefined.
Patch by Topotuna (Justas Janickas)!
Differential Revision: https://reviews.llvm.org/D105988
Add the builtins defined by Section 42 "Integer dot product" in
the OpenCL Extension Specification.
Differential Revision: https://reviews.llvm.org/D106434
Allow standard header versions of malloc and free to be defined
before introducing the device versions.
Fixes: SWDEV-295901
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D106463
These builtins were added to capture the fact that the underlying Wasm
instructions return i32s and implicitly sign or zero extend the extracted lanes
in the case of the i8x16 and i16x8 variants. But we do sufficient optimizations
during code gen that these low-level details do not need to be exposed to users.
This commit replaces the use of the builtins in wasm_simd128.h with normal
target-independent vector code. As a result, we can switch the relevant
intrinsics to use functions rather than macros and can use more user-friendly
return types rather than trying to precisely expose the underlying Wasm types.
Note, however, that the generated LLVM IR is no different after this change.
Differential Revision: https://reviews.llvm.org/D106500
Remove the workaround for -fopenmp in __clang_hip_runtime_wrapper.h
since it causes device functions in HIP wrapper headers disabled when
compiling HIP program with -fopenmp.
Reviewed by: Aaron Enye Shi, Jon Chesterfield
Differential Revision: https://reviews.llvm.org/D106070
With this patch, OpenMP on AMDGCN will use the math functions
provided by ROCm ocml library. Linking device code to the ocml will be
done in the next patch.
Reviewed By: JonChesterfield, jdoerfert, scchan
Differential Revision: https://reviews.llvm.org/D104904
Add the builtins defined by Section 40 "Extended Bit Operations" in
the OpenCL Extension Specification.
Differential Revision: https://reviews.llvm.org/D106267
The `__CUDA__` macro is already defined for openmp/nvptx and is not used by
`__clang_cuda_complex_builtins.h`, so dropping that macro slightly simplifies
nvptx and avoids defining it on amdgcn (where it is likely to be harmful).
Also dropped a cplusplus test from a C++ header as compilation will have
failed on cmath earlier if it was included from C.
Reviewed By: jdoerfert, fodinabor
Differential Revision: https://reviews.llvm.org/D105221
On Power PC some legacy compilers included a number of builtins in a
builtins.h header file. While this header file is not required to hold
builtins for clang some legacy code does try to include this file and so
this patch provides an empty version of that file.
Differential Revision: https://reviews.llvm.org/D106065
This just reorders the atomics, it doesn't change anything except their layout in the header.
This is a prep patch for adding some conditionals around these for CL3.0 but that patch is much easier to review if all the atomic operations are grouped together like this.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D105601
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50435.
Differential Revision: https://reviews.llvm.org/D106019
Replace the experimental clang builtin and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50433.
Differential Revision: https://reviews.llvm.org/D105950
Replace the clang builtin function and LLVM intrinsic for
f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing
combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern.
Differential Revision: https://reviews.llvm.org/D105755
We are currently being inconsistent in using signed vs unsigned comparisons for
vec_all_* and vec_any_* interfaces that use vector bool types. For example we
use signed comparison for vec_all_ge(vector signed char, vector bool char) but
unsigned comparison for when the arguments are swapped. GCC and XL use signed
comparison instead. This patch makes clang consistent with itself and with XL
and GCC.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D105666
The function is supposed to be the equivalent of rint() (as in
round to nearest, ties to even) rather than round() (round to
nearest, ties away from zero). In fact, the instruction we emit
without VSX is vrfin which is correct. However, with VSX we emit
xvrspi which is the equivalent of round() and therefore incorrect.
Since there is no equivalent VSX instruction, simply use vrfin
regardless of availability of VSX.
A number of functions in the header have guards for 64-bit only
that were presumably added as some of the functions in the blocks
use vector __int128 which is only available in 64-bit mode.
A more appropriate guard (__SIZEOF_INT128__) has been added for
those functions since, making the 64-bit guards redundant.
This patch removes those guards as they inadvertently guard code
that uses vector long long which does not actually require 64-bit
mode.
Replace the clang builtin function and LLVM intrinsic previously used to select
the f64x2.promote_low_f32x4 instruction with custom combines from standard
SelectionDAG nodes. Implement the new combines to share code with the similar
combines for f64x2.convert_low_i32x4_{s,u}. Resolves PR50232.
Differential Revision: https://reviews.llvm.org/D105675
Set the device malloc and free functions as weak,
and move the std headers after device malloc/free
to avoid issues with std malloc/free.
Fixes: SWDEV-293590
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D105707
This fixes issues with various return types(bool/int) and was already
in place for nvptx headers, adjusted to work for amdgcn. This does
not affect hip as the change is guarded with OPENMP_AMDGCN.
Similar to D85879.
Reviewed By: jdoerfert, JonChesterfield, yaxunl
Differential Revision: https://reviews.llvm.org/D104677
Add runtime functions to detect invalid calls to pure or deleted virtual
functions.
Patch by: Siu Chi Chan
Reviewed by: Yaxun Liu
Differential Revision: https://reviews.llvm.org/D104392
Add the `memory_scope_all_devices` enum value, which is restricted to
OpenCL 3.0 or newer and the `__opencl_c_atomic_scope_all_devices`
feature. Also guard `memory_scope_all_svm_devices` accordingly, which
is already available in OpenCL 2.0.
The `__opencl_c_atomic_scope_all_devices` feature is header-only, so
set its define to 1 in `opencl-c-base.h`. This is done
unconditionally at the moment, as the mechanism for disabling
header-only options hasn't been decided yet.
This patch only adds a negative test for now. Ideally adding a CL3.0
run line to atomic-ops.cl should suffice as a positive test, but we
cannot do that yet until (at least) generic address spaces and program
scope variables are supported in OpenCL 3.0 mode.
Differential Revision: https://reviews.llvm.org/D103241