llvm-project

Commit Graph

Author	SHA1	Message	Date
Sven van Haastregt	612d0ef173	[OpenCL] Move remaining defines to opencl-c-base.h Move any remaining preprocessor defines from `opencl-c.h` to `opencl-c-base.h`, such that they are shared with `-fdeclare-opencl-builtins` too. In particular, move: - the `as_type` and `as_typen` definitions, and - the `kernel_exec` and `__kernel_exec` definitions. Also clang-format the changes. Differential Revision: https://reviews.llvm.org/D96948	2021-02-23 10:18:14 +00:00
Liu, Chen3	f8b9035aae	[X86] Support amx-int8 intrinsic. Adding support for intrinsics of TDPBSUD/TDPBUSD/TDPBUUD. Differential Revision: https://reviews.llvm.org/D97259	2021-02-23 17:08:05 +08:00
Sven van Haastregt	5a4a01460f	[OpenCL] Move printf declaration to opencl-c-base.h Supporting `printf` with `-fdeclare-opencl-builtins` would require special handling (for e.g. varargs and format attributes) for just this one function. Instead, move the `printf` declaration to the shared base header. Differential Revision: https://reviews.llvm.org/D96789	2021-02-18 11:27:19 +00:00
Wang, Pengfei	61da20575d	[X86] Convert fmin/fmax _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506) This is a follow up of D92940. We have successfully converted fadd/fmul _mm_reduce_* intrinsics to llvm.reduction + reassoc flag. We can do the same approach for fmin/fmax too, i.e. llvm.reduction + nnan flag. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93179	2021-02-15 08:52:06 +08:00
Jonas Paulsson	b3ac5b84cd	[SystemZ] Fix vecintrin.h to not emit alignment hints in vec_xl/vec_xst. vec_xl() and vec_xst() should not emit alignment hints since they take a scalar pointer and also add a byte offset if passed. This patch uses memcpy to achieve the desired result. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D96471	2021-02-12 18:26:36 -06:00
Wang, Pengfei	dd2460ed5d	[X86] Always assign reassoc flag for intrinsics reduce_add/mul_ps/pd. Intrinsics reduce_add/mul_ps/pd have assumption that the elements in the vector are reassociable. So we need to always assign the reassoc flag when we call _mm_reduce_* intrinsics. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D96231	2021-02-09 21:14:06 +08:00
Anton Zabaznov	d88c55ab95	[OpenCL] Add macro definitions of OpenCL C 3.0 features This patch adds possibility to define OpenCL C 3.0 feature macros via command line option or target setting. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D95776	2021-02-05 18:42:25 +03:00
Yaxun (Sam) Liu	0211877a07	[HIP] Add __managed__ macro to header	2021-02-04 16:22:42 -05:00
Wolfgang Pieb	231a82a150	[X86] Correct some cross references in avxintrin.h.	2021-01-25 18:49:28 -08:00
Wolfgang Pieb	350395d82f	[x86] Fix trivial typo in emmintrin.h	2021-01-25 17:28:05 -08:00
Michael Liao	7b5d7c7b0a	[hip] Fix `<complex>` compilation on Windows with VS2019. Differential Revision: https://reviews.llvm.org/D95075	2021-01-20 16:43:44 -05:00
Luo, Yuanke	7e1d2224b4	[X86][AMX] Fix the typo. The dpbsud should be dpbssd. Differential Revision: https://reviews.llvm.org/D94943	2021-01-19 16:57:34 +08:00
Aaron En Ye Shi	be40c12040	[HIP] Add signbit(long double) decl An _MSC_VER version of signbit(long double) is required for MSVC headers. Fixes: SWDEV-256409 Differential Revision: https://reviews.llvm.org/D93062	2021-01-14 18:23:37 +00:00
Lucas Prates	2b1e25befe	[AArch64] Adding ACLE intrinsics for the LS64 extension This introduces the ARMv8.7-A LS64 extension's intrinsics for 64 bytes atomic loads and stores: `__arm_ld64b`, `__arm_st64b`, `__arm_st64bv`, and `__arm_st64bv0`. These are selected into the LS64 instructions LD64B, ST64B, ST64BV and ST64BV0, respectively. Based on patches written by Simon Tatham. Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D93232	2021-01-14 09:43:58 +00:00
Esme-Yi	ffa67873a3	[PowerPC] Add variants of 64-bit vector types for vec_sel. Summary: This patch added variants of vec_sel and fixed bugzilla 46770. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D94162	2021-01-11 03:52:16 +00:00
Michael Liao	f78d6af731	[hip] Enable HIP compilation with `<complex>` on MSVC. - MSVC has different `<complex>` implementation which calls into functions declared in `<ymath.h>`. Provide their device-side implementation to enable `<complex>` compilation on HIP Windows. Differential Revision: https://reviews.llvm.org/D93638	2021-01-07 17:41:28 -05:00
Luo, Yuanke	08665b1805	Support tilezero intrinsic and c interface for AMX. Differential Revision: https://reviews.llvm.org/D92837	2020-12-31 13:24:57 +08:00
Michael Liao	bb8d20d9f3	[cuda][hip] Fix typos in header wrappers.	2020-12-21 13:02:47 -05:00
Simon Pilgrim	4855a1004d	[X86] Convert fadd/fmul _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506) Followup to D87604, having confirmed on PR47506 that we can use the llvm codegen expansion for fadd/fmul as well. Differential Revision: https://reviews.llvm.org/D92940	2020-12-13 15:37:35 +00:00
Anastasia Stulova	a84599f177	[OpenCL] Implement extended subgroups fully in headers. Extended subgroups are library style extensions and therefore they require no changes in the frontend. This commit: 1. Moves extension macro definitions to the internal headers. 2. Removes extension pragmas because they are not needed. Tags: #clang Differential Revision: https://reviews.llvm.org/D92231	2020-12-10 16:40:15 +00:00
Luo, Yuanke	f80b29878b	[X86] AMX programming model. This patch implements the AMX programming model that was discussed on llvm-dev (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html). Thanks to Hal for the good suggestion in the RA. The fast RA is not in the patch yet. This patch implements 7 components. 1. The C interface to the end user. 2. The AMX intrinsics in LLVM IR. 3. Transform load/store <256 x i32> to AMX intrinsics or split the type into two <128 x i32>. 4. The lowering from AMX intrinsics to AMX pseudo instructions. 5. Insert pseudo ldtilecfg and build the def-use between ldtilecfg and AMX instructions. 6. The register allocation for tile registers. 7. Morph AMX pseudo instructions to AMX real instructions. Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0 Differential Revision: https://reviews.llvm.org/D87981	2020-12-10 17:01:54 +08:00
Masoud Ataei	fc750f609d	[PPC] Fixing a typo in altivec.h. Commenting out an unnecessary macro	2020-12-08 19:21:02 +00:00
Artem Belevich	4326792942	[CUDA] Another attempt to fix early inclusion of <new> from libstdc++ Previous patch (`9a465057a6`) did not fix the problem. https://bugs.llvm.org/show_bug.cgi?id=48228 If the <new> is included too early, before CUDA-specific defines are available, just include-next the standard <new> and undo the include guard. CUDA-specific variants of operator new/delete will be declared if/when <new> is used from the CUDA source itself, when all CUDA-related macros are available. Differential Revision: https://reviews.llvm.org/D91807	2020-12-04 12:03:35 -08:00
Martin Storsjö	c17fdca188	[clang] [Headers] Use the corresponding _aligned_free or __mingw_aligned_free in _mm_free Differential Revision: https://reviews.llvm.org/D92570	2020-12-04 11:34:12 +02:00
Aaron En Ye Shi	ba2612ce01	[HIP] cmath demote long double args to double Since there is no ROCm Device Library support for long double, demote them to double, and use the fp64 math functions. Differential Revision: https://reviews.llvm.org/D92130	2020-12-03 23:00:14 +00:00
Reid Kleckner	1e843a987d	[MS] Add more 128bit cmpxchg intrinsics for AArch64 The MSVC STL requires this on ARM64. Requested in https://llvm.org/pr47099 Depends on D92061 Differential Revision: https://reviews.llvm.org/D92062	2020-11-25 12:07:28 -08:00
Artem Belevich	9a465057a6	[CUDA] Unbreak CUDA compilation with -std=c++20 Standard libc++ headers in stdc++ mode include <new> which picks up cuda_wrappers/new before any of the CUDA macros have been defined. We can not include CUDA headers that early, so the work-around is to define __device__ in the wrapper header itself. Differential Revision: https://reviews.llvm.org/D91807	2020-11-19 10:35:47 -08:00
Sven van Haastregt	f0c690018a	[OpenCL] Stop opencl-c-base.h leaking extension enabling opencl-c.h disables all extensions at its end, but opencl-c-base.h does not, and that causes any inclusion of only opencl-c-base.h to leave some extensions (such as cl_khr_fp16) enabled. This affects the -fdeclare-opencl-builtins option for example. This violates the OpenCL Extension Specification which specifies that "The initial state of the compiler is as if the directive #pragma OPENCL EXTENSION all : disable was issued". Fix by disabling all extensions at the end of opencl-c-base.h and enable extensions inside opencl.h which relied on opencl-c-base.h enabling the cl_khr_fp16/64 extensions. Differential Revision: https://reviews.llvm.org/D91429	2020-11-17 12:07:40 +00:00
Roland McGrath	cf36142d34	[clang] Add missing header guard in <cpuid.h> This header has long lacked a standard multiple inclusion guard like other headers have, for no apparent reason. The GCC header of the same name likewise lacks one up through release 10.1, but trunk GCC (release 11, and perhaps future 10.x) has fixed it (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96238). Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D91226	2020-11-10 19:34:25 -08:00
Qiu Chaofan	979a4d268a	[PowerPC] [Clang] Port SSE4.1-compatible insert intrinsics This patch adds three intrinsics compatible to x86's SSE 4.1 on PowerPC target, with tests: - _mm_insert_epi8 - _mm_insert_epi32 - _mm_insert_epi64 The intrinsics implementation is contributed by Paul Clarke. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D89242	2020-11-10 10:52:13 +08:00
Freddy Ye	5e312e0041	[X86] use macros to split GFNI intrinsics into different kinds Tremont microarchitecture only has the GFNI(SSE) version, not the AVX and AVX512 versions. This patch avoids a compilation failure on Windows when using -march=tremont to invoke one of the GFNI(SSE) intrinsics. Differential Revision: https://reviews.llvm.org/D90822	2020-11-06 16:03:38 +08:00
Albion Fung	1af037f643	[PowerPC] Correct cpsgn's behaviour on PowerPC to match that of the ABI This patch fixes the reversed behaviour exhibited by cpsgn on PPC. It now matches the ABI. Differential Revision: https://reviews.llvm.org/D84962	2020-11-05 15:35:14 -05:00
Aaron En Ye Shi	ca5b31502c	[HIP] Math Headers to use type promotion Similar to libcxx implementation of cmath function overloads, use type promotion templates to determine return types of multi-argument math functions. Fixes: SWDEV-256825 Reviewed By: tra, yaxunl Differential Revision: https://reviews.llvm.org/D90409	2020-11-03 18:40:26 +00:00
Liu, Chen3	756f597841	[X86] Support Intel avxvnni This patch mainly made the following changes: 1. Support AVX-VNNI instructions; 2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpbusds/vpdpbusds instructions only use vex-encoding when the user explicitly adds the {vex} prefix. Differential Revision: https://reviews.llvm.org/D89105	2020-10-31 12:39:51 +08:00
Joachim Meyer	eaee608448	[OpenMP] Use __OPENMP_NVPTX__ instead of _OPENMP in complex wrapper headers. This is very similar to `7f1e6fcff9`, just fixing a left-over. With this, it should be possible to use both, -x cuda and -fopenmp in the same invocation, enabling to use both OpenMP, targeting CPU, and CUDA, targeting the GPU. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90415	2020-10-29 23:24:49 +01:00
Johannes Doerfert	17c8251bca	[OpenMP][CUDA][FIX] Use the new `remquo` overload only for OpenMP CUDA buildbots complained about a redefinition when I landed D89971. This is odd and I fail to understand where in the CUDA headers the other definition is supposed to be. For now, given that CUDA doesn't need the overload (AFAICT), we simply restrict it to the OpenMP mode.	2020-10-27 23:52:59 -05:00
Johannes Doerfert	b1a90e1599	[OpenMP][CUDA] Add missing overload for `remquo(float,float,int*)` Reported by Colleen Bertoni <bertoni@anl.gov> after running the OvO test suite: https://github.com/TApplencourt/OvO/ The template overload is still hidden behind an ifdef for OpenMP. In the future we probably want to remove the ifdef but that requires further testing. Reviewed By: JonChesterfield, tra Differential Revision: https://reviews.llvm.org/D89971	2020-10-27 19:12:51 -05:00
Aaron En Ye Shi	3700556ecb	[HIP][NFC] Use correct max in cuda_complex_builtins Update the clang complex builtins for OpenMP to use the correct max function from either __nv_* or __ocml_*.	2020-10-27 19:35:09 +00:00
Aaron En Ye Shi	b2524eb944	[HIP] Fix HIP rounding math intrinsics The __ocml__rte_f32 and __ocml__rte_f64 functions are not available if OCML_BASIC_ROUNDED_OPERATIONS is not defined. Reviewed By: b-sumner, yaxunl Fixes: SWDEV-257235 Differential Revision: https://reviews.llvm.org/D89966	2020-10-22 15:57:09 +00:00
Tianqing Wang	be39a6fe6f	[X86] Add User Interrupts(UINTR) instructions For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D89301	2020-10-22 17:33:07 +08:00
Albion Fung	d30155feaa	[PowerPC] Implementation of 128-bit Binary Vector Rotate builtins This patch implements 128-bit Binary Vector Rotate builtins for PowerPC10. Differential Revision: https://reviews.llvm.org/D86819	2020-10-16 18:03:22 -04:00
Simon Pilgrim	6c23cbc560	[X86] Convert integer _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506) Emit the equivalent integer reduction intrinsics in IR instead of expanding to shuffle+arithmetic sequences. The fadd/fmul reductions might be trickier as they assume a similar bisection reduction while the generic intrinsics assume a sequential reduction (intel docs are ambiguous on the correct approach) - I'm not sure if we want to always tag them with reassoc? Anyway, that issue can wait until a separate fp patch along with the fmin/fmax reductions. Differential Revision: https://reviews.llvm.org/D87604	2020-10-13 09:28:39 +01:00
Wang, Pengfei	412cdcf2ed	[X86] Add HRESET instruction. For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D89102	2020-10-13 08:47:26 +08:00
Aaron En Ye Shi	8d2a0c115e	[HIP] NFC Add comments to cmath functions Add missing comments to cmath functions. Differential Revision: https://reviews.llvm.org/D88837	2020-10-06 15:26:56 +00:00
Aaron En Ye Shi	aa2b593f14	[HIP] Restructure hip headers to add cmath Separate __clang_hip_math.h header into __clang_hip_cmath.h and __clang_hip_math.h. Improve the math function definition, and add missing definitions or declarations. Add missing overloads. Reviewed By: tra, JonChesterfield Differential Review: https://reviews.llvm.org/D88837	2020-10-06 14:48:53 +00:00
Craig Topper	a02b449bb1	[X86] Sync AESENC/DEC Key Locker builtins with gcc. For the wide builtins, pass a single input and output pointer to the builtins. Emit the GEPs and input loads from CGBuiltin.	2020-10-04 12:09:41 -07:00
Craig Topper	230c57b0bd	[X86] Synchronize the encodekey builtins with gcc. Don't assume void* is 16 byte aligned. We were taking multiple pointer arguments in the builtin. gcc accepts a single void*. The cast from void* to _m128i* caused the IR generation to assume the pointer was aligned. Instead make the builtin take a single void*, emit i8 GEPs to adjust then cast to <2 x i64>* and perform a store with align of 1.	2020-10-04 12:09:35 -07:00
Craig Topper	28595cbbeb	[X86] Synchronize the loadiwkey builtin operand order with gcc version.	2020-10-04 12:09:29 -07:00
Craig Topper	6c6cd5f8a9	[X86] Consolidate wide Key Locker intrinsics into the same header as the other Key Locker intrinsics.	2020-10-04 12:09:21 -07:00
Esme-Yi	e3475f5b91	[PowerPC] Add builtins for xvtdiv(dp|sp) and xvtsqrt(dp|sp). Summary: This patch implements the builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp. The instructions correspond to the following builtins: int vec_test_swdiv(vector double v1, vector double v2); int vec_test_swdivs(vector float v1, vector float v2); int vec_test_swsqrt(vector double v1); int vec_test_swsqrts(vector float v1); This patch depends on D88274, which fixes the bug in copying from CRRC to GPRC/G8RC. Reviewed By: steven.zhang, amyk Differential Revision: https://reviews.llvm.org/D88278	2020-10-04 16:24:20 +00:00


1749 Commits