Move any remaining preprocessor defines from `opencl-c.h` to
`opencl-c-base.h`, such that they are shared with
`-fdeclare-opencl-builtins` too.
In particular, move:
- the `as_type` and `as_typen` definitions, and
- the `kernel_exec` and `__kernel_exec` definitions.
Also clang-format the changes.
Differential Revision: https://reviews.llvm.org/D96948
Supporting `printf` with `-fdeclare-opencl-builtins` would require
special handling (for e.g. varargs and format attributes) for just
this one function. Instead, move the `printf` declaration to the
shared base header.
Differential Revision: https://reviews.llvm.org/D96789
This is a follow up of D92940.
We have successfully converted fadd/fmul _mm_reduce_* intrinsics to
llvm.reduction + reassoc flag. We can do the same approach for fmin/fmax
too, i.e. llvm.reduction + nnan flag.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93179
vec_xl() and vec_xst() should not emit alignment hints since they take a
scalar pointer and also add a byte offset if passed.
This patch uses memcpy to achieve the desired result.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D96471
Intrinsics *reduce_add/mul_ps/pd have assumption that the elements in
the vector are reassociable. So we need to always assign the reassoc
flag when we call _mm_reduce_* intrinsics.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D96231
This patch adds possibility to define OpenCL C 3.0 feature macros
via command line option or target setting.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D95776
This introduces the ARMv8.7-A LS64 extension's intrinsics for 64 bytes
atomic loads and stores: `__arm_ld64b`, `__arm_st64b`, `__arm_st64bv`,
and `__arm_st64bv0`. These are selected into the LS64 instructions
LD64B, ST64B, ST64BV and ST64BV0, respectively.
Based on patches written by Simon Tatham.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D93232
- MSVC has different `<complex>` implementation which calls into functions
declared in `<ymath.h>`. Provide their device-side implementation to enable
`<complex>` compilation on HIP Windows.
Differential Revision: https://reviews.llvm.org/D93638
Followup to D87604, having confirmed on PR47506 that we can use the llvm codegen expansion for fadd/fmul as well.
Differential Revision: https://reviews.llvm.org/D92940
Extended subgroups are library style extensions and therefore
they require no changes in the frontend. This commit:
1. Moves extension macro definitions to the internal headers.
2. Removes extension pragmas because they are not needed.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D92231
This patch implements amx programming model that discussed in llvm-dev
(http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html).
Thank Hal for the good suggestion in the RA. The fast RA is not in the patch yet.
This patch implemeted 7 components.
1. The c interface to end user.
2. The AMX intrinsics in LLVM IR.
3. Transform load/store <256 x i32> to AMX intrinsics or split the
type into two <128 x i32>.
4. The Lowering from AMX intrinsics to AMX pseudo instruction.
5. Insert psuedo ldtilecfg and build the def-use between ldtilecfg to amx
intruction.
6. The register allocation for tile register.
7. Morph AMX pseudo instruction to AMX real instruction.
Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0
Differential Revision: https://reviews.llvm.org/D87981
Previous patch (9a465057a6) did not fix the problem.
https://bugs.llvm.org/show_bug.cgi?id=48228
If the <new> is included too early, before CUDA-specific defines are available,
just include-next the standard <new> and undo the include guard. CUDA-specific
variants of operator new/delete will be declared if/when <new> is used from the
CUDA source itself, when all CUDA-related macros are available.
Differential Revision: https://reviews.llvm.org/D91807
Since there is no ROCm Device Library support for
long double, demote them to double, and use the fp64
math functions.
Differential Revision: https://reviews.llvm.org/D92130
Standard libc++ headers in stdc++ mode include <new> which picks up
cuda_wrappers/new before any of the CUDA macros have been defined.
We can not include CUDA headers that early, so the work-around is to define
__device__ in the wrapper header itself.
Differential Revision: https://reviews.llvm.org/D91807
opencl-c.h disables all extensions at its end, but opencl-c-base.h
does not, and that causes any inclusion of only opencl-c-base.h to
leave some extensions (such as cl_khr_fp16) enabled. This affects the
-fdeclare-opencl-builtins option for example.
This violates the OpenCL Extension Specification which specifies that
"The initial state of the compiler is as if the directive #pragma
OPENCL EXTENSION all : disable was issued".
Fix by disabling all extensions at the end of opencl-c-base.h and
enable extensions inside opencl.h which relied on opencl-c-base.h
enabling the cl_khr_fp16/64 extensions.
Differential Revision: https://reviews.llvm.org/D91429
This header has long lacked a standard multiple inclusion guard
like other headers have, for no apparent reason. The GCC header
of the same name likewise lacks one up through release 10.1, but
trunk GCC (release 11, and perhaps future 10.x) has fixed it
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96238).
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D91226
This patch adds three intrinsics compatible to x86's SSE 4.1 on PowerPC
target, with tests:
- _mm_insert_epi8
- _mm_insert_epi32
- _mm_insert_epi64
The intrinsics implementation is contributed by Paul Clarke.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D89242
Tremont microarchitecture only has GFNI(SSE) version, not AVX and
AVX512 version. This patch is to avoid compiling fail on Windows when
using -march=tremont to invoke one of GFNI(SSE) intrinsic.
Differential Revision: https://reviews.llvm.org/D90822
Similar to libcxx implementation of cmath function
overloads, use type promotion templates to determine
return types of multi-argument math functions.
Fixes: SWDEV-256825
Reviewed By: tra, yaxunl
Differential Revision: https://reviews.llvm.org/D90409
This patch mainly made the following changes:
1. Support AVX-VNNI instructions;
2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpbusds/vpdpbusds instructions only use vex-encoding when user explicity add {vex} prefix.
Differential Revision: https://reviews.llvm.org/D89105
This is very similar to 7f1e6fcff9, just fixing a left-over.
With this, it should be possible to use both, -x cuda and -fopenmp in the same invocation,
enabling to use both OpenMP, targeting CPU, and CUDA, targeting the GPU.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90415
CUDA buildbots complained about a redefinition when I landed D89971.
This is odd and I fail to understand where in the CUDA headers the other
definition is supposed to be. For now, given that CUDA doesn't need the
overload (AFAIKT), we simply restrict it to the OpenMP mode.
Reported by Colleen Bertoni <bertoni@anl.gov> after running the OvO test
suite: https://github.com/TApplencourt/OvO/
The template overload is still hidden behind an ifdef for OpenMP. In the
future we probably want to remove the ifdef but that requires further
testing.
Reviewed By: JonChesterfield, tra
Differential Revision: https://reviews.llvm.org/D89971
The __ocml_*_rte_f32 and __ocml_*_rte_f64 functions are not
available if OCML_BASIC_ROUNDED_OPERATIONS is not defined.
Reviewed By: b-sumner, yaxunl
Fixes: SWDEV-257235
Differential Revision: https://reviews.llvm.org/D89966
Emit the equivalent integer reduction intrinsics in IR instead of expanding to shuffle+arithmetic sequences.
The fadd/fmul reductions might be trickier as they assume a similar bisection reduction while the generic intrinsics assume a sequential reduction (intel docs are ambiguous on the correct approach) - I'm not sure if we want to always tag them with reassoc? Anyway, that issue can wait until a separate fp patch along with the fmin/fmax reductions.
Differential Revision: https://reviews.llvm.org/D87604
Separate __clang_hip_math.h header into __clang_hip_cmath.h
and __clang_hip_math.h. Improve the math function definition,
and add missing definitions or declarations. Add missing
overloads.
Reviewed By: tra, JonChesterfield
Differential Review: https://reviews.llvm.org/D88837
We were taking multiple pointer arguments in the builtin.
gcc accepts a single void*.
The cast from void* to _m128i* caused the IR generation to assume
the pointer was aligned.
Instead make the builtin take a single void*, emit i8* GEPs to
adjust then cast to <2 x i64>* and perform a store with align of 1.
Summary: This patch implements the builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp.
The instructions correspond to the following builtins:
int vec_test_swdiv(vector double v1, vector double v2);
int vec_test_swdivs(vector float v1, vector float v2);
int vec_test_swsqrt(vector double v1);
int vec_test_swsqrts(vector float v1);
This patch depends on D88274, which fixes the bug in copying from CRRC to GPRC/G8RC.
Reviewed By: steven.zhang, amyk
Differential Revision: https://reviews.llvm.org/D88278