llvm-project

Commit Graph

Author	SHA1	Message	Date
Sven van Haastregt	d353d1c501	[OpenCL] Support cl_ext_float_atomics See https://github.com/KhronosGroup/OpenCL-Docs/pull/552 for initial specification. Patch by Haonan Yang. Differential Revision: https://reviews.llvm.org/D106343	2021-09-13 12:12:40 +01:00
Xiang1 Zhang	c81d6ab875	[X86] Adjust Keylocker handle mem size Reviewed By: Topper Craig Differential Revision: https://reviews.llvm.org/D109488	2021-09-13 18:03:27 +08:00
Xiang1 Zhang	bdce8d40c6	Revert "[X86] Adjust Keylocker handle mem size" This reverts commit `3731de6b7f`.	2021-09-13 18:00:46 +08:00
Xiang1 Zhang	3731de6b7f	[X86] Adjust Keylocker handle mem size Reviewed By: Topper Craig Differential Revision: https://reviews.llvm.org/D109354	2021-09-13 17:59:33 +08:00
Wang, Pengfei	2aaa6466fe	[X86] Support *_set1_pch(Float16 _Complex h) Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D109487	2021-09-11 17:47:31 +08:00
Joseph Huber	f28e710db7	[OpenMP] Make CUDA math library functions SPMD amenable This patch adds the SPMD amenable assumption to the CUDA math library definitions in Clang. Previously these functions would block SPMD execution on the device because they're intrinsic calls into the library and can't be calculated. These functions don't have side-effects so they are safe to execute in SPMD mode. Depends on D105937 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108958	2021-09-10 14:52:45 -04:00
Simon Pilgrim	ea685e1028	[X86][AVX] Update _mm256_loadu2_m128* intrinsics to use _mm256_set_m128* (PR51796) As reported on PR51796, the _mm256_loadu2_m128i in particular was inserting bitcasts and shuffles with different types making it trickier for some combines, and prevented the value tracker from identifying the shuffle sequences as a single insert_subvector style concat_vectors pattern. This patch instead concatenates the 128-bit unaligned loads with _mm256_set_m128*, which was written to avoid the unnecessary bitcasts and only emits a single shuffle. Differential Revision: https://reviews.llvm.org/D109497	2021-09-09 19:15:48 +01:00
Simon Pilgrim	55d9396278	[X86] Move _mm256_set_m128* intrinsics before _mm256_loadu2_m128* intrinsics. NFC. This is necessary for PR51796 where we'll update _mm256_loadu2_m128* to use _mm256_set_m128*	2021-09-09 11:23:50 +01:00
Pushpinder Singh	12dcbf913c	[AMDGPU][OpenMP] Use complex definitions from complex_cmath.h Following nvptx approach, this patch uses complex function definitions from complex_cmath.h. With this patch, ovo passes 23/34 complex mathematical test cases. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D109344	2021-09-09 10:55:17 +05:30
Tianqing Wang	12fa608af4	[X86] Add CRC32 feature. `d8faf03807` implemented general-regs-only for X86 by disabling all features with vector instructions. But the CRC32 instruction in SSE4.2 ISA, which uses only GPRs, also becomes unavailable. This patch adds a CRC32 feature for this instruction and allows it to be used with general-regs-only. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D105462	2021-09-06 17:24:30 +08:00
Stuart Brady	32955be6bf	[OpenCL] Remove decls for scalar vloada_half and vstorea_half* fns These functions are not part of the OpenCL C specification. See https://github.com/KhronosGroup/OpenCL-Docs/issues/648 for a clarification regarding the vloada_half declarations. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D108761	2021-09-02 22:08:09 +01:00
Nico Weber	e5438f3868	clang/win: Add __readfsdword to intrin.h When using __readfsdword(), clang used to warn that one has to include <intrin.h> -- no matter if that was already included or not. Now it only warns if it's not yet included. To verify that this was the only intrin with this problem, I ran: $ for f in $(grep intrin.h clang/include/clang/Basic/BuiltinsX86* | egrep -o '\([^,]+,' | egrep -o '[^(,]*'); do if ! grep -q $f clang/lib/Headers/intrin.h; then echo $f; fi; done This printed 9 more functions, but those are all in emmintrin.h, xsaveintrin.h (which are included by intrin.h based on /arch: flags). So this is indeed the only built-in that was missing in intrin.h. Fixes PR51188. Differential Revision: https://reviews.llvm.org/D109085	2021-09-02 12:22:07 -04:00
Justas Janickas	fb321c2ea2	[OpenCL] Define OpenCL 3.0 optional core features in C++ for OpenCL 2021 Modifies OpenCL 3.0 optional core feature macro definitions so that they are set analogously in C++ for OpenCL 2021. This change aims to achieve compatibility between C++ for OpenCL 2021 and OpenCL 3.0. Differential Revision: https://reviews.llvm.org/D108704	2021-09-01 10:15:17 +01:00
Victor Huang	2e5c17d19e	[PowerPC][NFC] Rename P10 builtins vec_clrl, vec_clrr to vec_clr_first and vec_clr_last This patch renames the vector clear left/right builtins vec_clrl, vec_clrr to vec_clr_first and vec_clr_last to avoid the ambiguities when dealing with endianness. Reviewed By: amyk, lei Differential revision: https://reviews.llvm.org/D108702	2021-08-30 09:52:15 -05:00
Wang, Pengfei	ab40dbfe03	[X86] AVX512FP16 instructions enabling 6/6 Enable FP16 complex FMA instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105269	2021-08-30 13:08:45 +08:00
Xiang1 Zhang	80f7ce8993	[X86] Support __SSC_MARK(const int id) Differential Revision: https://reviews.llvm.org/D108682	2021-08-30 09:55:35 +08:00
Xiang1 Zhang	4c29dc18cf	Revert "[X86] Support __SSC_MARK(const int id)" This reverts commit `78fbde5779`.	2021-08-30 09:50:26 +08:00
Xiang1 Zhang	78fbde5779	[X86] Support __SSC_MARK(const int id) Differential Revision: https://reviews.llvm.org/D108682	2021-08-30 09:21:22 +08:00
Xiang1 Zhang	fd88fac6ca	Revert "[X86] Support __SSC_MARK(const int id)" This reverts commit `83e82ff767`.	2021-08-30 09:18:27 +08:00
Xiang1 Zhang	83e82ff767	[X86] Support __SSC_MARK(const int id) Differential Revision: https://reviews.llvm.org/D108682	2021-08-30 08:51:20 +08:00
Pushpinder Singh	07e85823aa	[OpenMP][AMDGCN] Enable complex functions This patch enables basic complex functionality using the ocml builtins. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108552	2021-08-24 12:40:41 +05:30
Wang, Pengfei	c728bd5bba	[X86] AVX512FP16 instructions enabling 5/6 Enable FP16 FMA instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105268	2021-08-24 09:07:19 +08:00
Wang, Pengfei	b088536ce9	[X86] AVX512FP16 instructions enabling 4/6 Enable FP16 unary operator instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105267	2021-08-22 08:59:35 +08:00
Craig Topper	5cf5df8014	[X86] Add missing __inline__ to functions in amxintrin.h	2021-08-20 09:35:02 -07:00
Thomas Lively	88962cea46	[WebAssembly] Restore builtins and intrinsics for pmin/pmax Partially reverts `85157c0079`, which had removed these builtins and intrinsics in favor of normal codegen patterns. It turns out that it is possible for the patterns to be split over multiple basic blocks, however, which means that DAG ISel is not able to select them to the pmin/pmax instructions. To make sure the SIMD intrinsics generate the correct instructions in these cases, reintroduce the clang builtins and corresponding LLVM intrinsics, but also keep the normal pattern matching as well. Differential Revision: https://reviews.llvm.org/D108387	2021-08-20 09:21:31 -07:00
Thomas Lively	64a9957bf7	[WebAssembly] Make shift values unsigned in wasm_simd128.h On some platforms, negative shift values mean to shift in the opposite direction, but this is not true with WebAssembly. To avoid confusion, make the shift values in the shift intrinsics unsigned. Differential Revision: https://reviews.llvm.org/D108415	2021-08-20 09:10:37 -07:00
Thomas Lively	2456e11614	[WebAssembly] Add SIMD intrinsics using unsigned integers For each SIMD intrinsic function that takes or returns a scalar signed integer value, ensure there is a corresponding intrinsic that takes or returns an unsigned value. This is a convenience for users who use -Wsign-conversion so they don't have to insert explicit casts, especially when the intrinsic arguments are integer literals that fit into the unsigned integer type but not the signed type. Differential Revision: https://reviews.llvm.org/D108412	2021-08-20 08:56:51 -07:00
Thomas Lively	fd3bd63df2	[WebAssembly] Make bitmask instructions return unsigned ints Since they are bitmasks, it will be more common for them to be used and potentially extended to 64-bit integers as unsigned values rather than signed values. Differential Revision: https://reviews.llvm.org/D108401	2021-08-19 16:23:47 -07:00
Martin Storsjö	cc3affd8b0	[clang] [MSVC] Implement __mulh and __umulh builtins for aarch64 The code is based on the same __mulh and __umulh intrinsics for x86. This should fix PR51128. Differential Revision: https://reviews.llvm.org/D106721	2021-08-19 11:29:55 +03:00
Jon Chesterfield	dbd7bad9ad	[openmp] Annotate tmp variables with omp_thread_mem_alloc Fixes miscompile of calls into ocml. Bug 51445. The stack variable `double __tmp` is moved to dynamically allocated shared memory by CGOpenMPRuntimeGPU. This is usually fine, but when the variable is passed to a function that is explicitly annotated address_space(5) then allocating the variable off-stack leads to a miscompile in the back end, which cannot decide to move the variable back to the stack from shared. This could be fixed by removing the AS(5) annotation from the math library or by explicitly marking the variables as thread_mem_alloc. The cast to AS(5) is still a no-op once IR is reached. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D107971	2021-08-19 02:22:11 +01:00
Wang, Pengfei	2379949aad	[X86] AVX512FP16 instructions enabling 3/6 Enable FP16 conversion instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105265	2021-08-18 09:03:41 +08:00
Craig Topper	705b1191aa	[X86] Add parentheses around casts in X86 intrinsic headers. Fixes PR51324.	2021-08-14 18:14:44 -07:00
Wang, Pengfei	f1de9d6dae	[X86] AVX512FP16 instructions enabling 2/6 Enable FP16 binary operator instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105264	2021-08-15 08:56:33 +08:00
Craig Topper	d2cb189184	[X86] Use a do {} while (0) in the _MM_EXTRACT_FLOAT implementation. Previously we just used {}, but that doesn't work in situations like this. if (1) _MM_EXTRACT_FLOAT(d, x, n); else ... The semicolon would terminate the if.	2021-08-14 16:41:55 -07:00
Craig Topper	73c4c32767	[X86] Use __builtin_bit_cast in _mm_extract_ps instead of type punning through a union. NFC	2021-08-14 16:35:55 -07:00
Craig Topper	4190d99dfc	[X86] Add parentheses around casts in some of the X86 intrinsic headers. This covers the SSE and AVX/AVX2 headers. AVX512 has a lot more macros due to rounding mode. Fixes part of PR51324. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D107843	2021-08-13 09:36:16 -07:00
Jon Chesterfield	6a8e5120ab	Revert "[openmp] Annotate tmp variables with omp_thread_mem_alloc" This reverts commit `b6113548c9`.	2021-08-12 17:44:36 +01:00
Jon Chesterfield	b6113548c9	[openmp] Annotate tmp variables with omp_thread_mem_alloc Fixes miscompile of calls into ocml. Bug 51445. The stack variable `double __tmp` is moved to dynamically allocated shared memory by CGOpenMPRuntimeGPU. This is usually fine, but when the variable is passed to a function that is explicitly annotated address_space(5) then allocating the variable off-stack leads to a miscompile in the back end, which cannot decide to move the variable back to the stack from shared. This could be fixed by removing the AS(5) annotation from the math library or by explicitly marking the variables as thread_mem_alloc. The cast to AS(5) is still a no-op once IR is reached. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D107971	2021-08-12 17:30:22 +01:00
Freddy Ye	6c1468854d	[X86] Reverse _set_ph and _setr_ph 's set order. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D107946	2021-08-12 16:27:04 +08:00
Wang, Pengfei	6f7f5b54c8	[X86] AVX512FP16 instructions enabling 1/6 1. Enable FP16 type support and basic declarations used by following patches. 2. Enable new instructions VMOVW and VMOVSH. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105263	2021-08-10 12:46:01 +08:00
Dave Airlie	1854db74c5	opencl-c.h: add 3.0 optional extension support for a few more bits These 3 are fairly simple, pipes, workgroups and subgroups. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D105858	2021-08-07 09:25:00 +10:00
Justas Janickas	a5a2f05dcc	[C++4OpenCL] Introduces __remove_address_space utility This change provides a way to conveniently declare types that have address space qualifiers removed. Since OpenCL adds address spaces implicitly even when they are not specified in source, it is useful to allow deriving address space unqualified types. Fixes llvm.org/PR45326 Differential Revision: https://reviews.llvm.org/D106785	2021-08-06 10:40:22 +01:00
Jon Chesterfield	509854b69c	[clang] Replace asm with __asm__ in cuda header Asm is a gnu extension for C, so at present -fopenmp -std=c99 and similar fail to compile on nvptx, bug 51344 Changing to `__asm__` or `__asm` works for openmp, all three appear to work for cuda. Suggesting `__asm__` here as `__asm` is used by MSVC with different syntax, so this should make for better error diagnostics if the header is passed to a compiler other than clang. Reviewed By: tra, emankov Differential Revision: https://reviews.llvm.org/D107492	2021-08-05 18:46:57 +01:00
Dave Airlie	14cb67862a	[OpenCL] allow generic address and non-generic defs for CL3.0 This allows both sets of definitions to exist on CL 3.0 Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D107318	2021-08-05 07:32:45 +10:00
Pushpinder Singh	f3eb5f900d	[AMDGPU][OpenMP] Wrap amdgcn declare variant inside ifdef This fixes the issue https://bugs.llvm.org/show_bug.cgi?id=51337 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D107468	2021-08-04 15:24:46 +00:00
Pushpinder Singh	713a5d12cd	[OpenMP][AMDGCN] Initial math headers support With this patch, OpenMP on AMDGCN will use the math functions provided by ROCm ocml library. Linking device code to the ocml will be done in the next patch. Reviewed By: JonChesterfield, jdoerfert, scchan Differential Revision: https://reviews.llvm.org/D104904	2021-08-02 14:38:52 +00:00
Hans Wennborg	12dc13b73c	prfchwintrin.h: Make _m_prefetchw take a pointer to volatile (PR49124) For some reason, Microsoft declares _m_prefetch to take a const void *, but _m_prefetchw to take a /volatile/ const void *. Do the same for compatibility. Differential revision: https://reviews.llvm.org/D106790	2021-08-02 15:16:04 +02:00
Jon Chesterfield	7f97ddaf8a	Revert "[OpenMP][AMDGCN] Initial math headers support" Broke nvptx compilation on files including <complex> This reverts commit `12da97ea10`.	2021-07-30 22:07:00 +01:00
Nemanja Ivanovic	9019b55b60	[PowerPC] Fix byte ordering of ld/st with length on BE The builtins vec_xl_len_r and vec_xst_len_r actually use the wrong side of the vector on big endian Power9 systems. We never spotted this before because there was no such thing as a big endian distro that supported Power9. Now we have AIX and the elements are in the wrong part of the vector. This just fixes it so the elements are loaded to and stored from the right side of the vector.	2021-07-30 14:37:24 -05:00
Pushpinder Singh	12da97ea10	[OpenMP][AMDGCN] Initial math headers support With this patch, OpenMP on AMDGCN will use the math functions provided by ROCm ocml library. Linking device code to the ocml will be done in the next patch. Reviewed By: JonChesterfield, jdoerfert, scchan Differential Revision: https://reviews.llvm.org/D104904	2021-07-30 14:52:41 +00:00
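
Commit d2cb189184 above switches the _MM_EXTRACT_FLOAT implementation from a bare {} block to do {} while (0), because a brace-only statement macro followed by a semicolon ends the enclosing if before its else. The following is a minimal sketch of that idiom only. STORE_BRACES, STORE_DO_WHILE and pick are hypothetical names made up for illustration; they are not the macro bodies from the X86 headers.

```c
#include <assert.h>

/* Hypothetical statement macros, NOT the real _MM_EXTRACT_FLOAT. */

/* Brace-only form: "STORE_BRACES(a, b);" expands to "{ ... };".
   The stray ';' is an empty statement, so a following "else"
   no longer has an "if" to attach to and the code fails to compile. */
#define STORE_BRACES(dst, src) { (dst) = (src); }

/* do/while(0) form: the expansion needs the caller's ';' to become
   one complete statement, so it is safe between "if" and "else". */
#define STORE_DO_WHILE(dst, src) do { (dst) = (src); } while (0)

/* Uses the safe form in exactly the if/else shape from the commit
   message. Swapping in STORE_BRACES here would be a syntax error. */
int pick(int flag) {
  int out = 0;
  if (flag)
    STORE_DO_WHILE(out, 1);
  else
    STORE_DO_WHILE(out, 2);
  assert(out == 1 || out == 2);
  return out;
}
```

The loop body runs exactly once, and compilers generally optimize the loop construct away, so the idiom costs nothing at run time.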


1902 Commits
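
Commit cc3affd8b0 in the list above adds the MSVC-style __mulh and __umulh builtins for aarch64. In MSVC these intrinsics return the upper 64 bits of the 128-bit product of two 64-bit operands, signed and unsigned respectively. The sketch below only illustrates that semantics in portable-ish C. It assumes the __int128 extension available in GCC and Clang on 64-bit targets, and the helper names mulh_sketch and umulh_sketch are hypothetical; this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helpers, not the clang builtins themselves.
   Assumes the __int128 extension (GCC/Clang, 64-bit targets). */

/* High 64 bits of the signed 128-bit product.
   Right-shifting a negative value is implementation-defined in C;
   GCC and Clang perform an arithmetic shift, which is what we need. */
int64_t mulh_sketch(int64_t a, int64_t b) {
  __int128 product = (__int128)a * (__int128)b;
  return (int64_t)(product >> 64);
}

/* High 64 bits of the unsigned 128-bit product. */
uint64_t umulh_sketch(uint64_t a, uint64_t b) {
  unsigned __int128 product = (unsigned __int128)a * (unsigned __int128)b;
  uint64_t high = (uint64_t)(product >> 64);
  /* The high half is always smaller than either nonzero operand. */
  assert(a == 0 || b == 0 || (high < a && high < b));
  return high;
}
```

For example, umulh_sketch((uint64_t)1 << 63, 4) yields 2, since the full product is 2^65 and only the bits above position 63 are kept.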