Commit Graph

1923 Commits

Author SHA1 Message Date
Michael Liao 6fe902daf9 [cuda] Add address space predicate funuctions.
- Add the missing NVVM predicate builtins on address space checking
- Redefine them as pure functions so that they could be used in
  __builtin_assume.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D112053
2021-10-19 16:20:14 -04:00
Artem Belevich f526ee5b85 [CUDA] Provide address space conversion builtins.
CUDA-11 headers rely on these NVCC builtins.
Despite having `__nv` previx, those are *not* provided by libdevice.

Differential Revision: https://reviews.llvm.org/D111665
2021-10-12 14:56:39 -07:00
Sven van Haastregt 544d89e847 [OpenCL] Add atomic_half type builtins
Add atomic_half types and builtins operating on the types from the
cl_ext_float_atomics extension.

Patch by Haonan Yang.

Differential Revision: https://reviews.llvm.org/D109740
2021-10-12 10:45:30 +01:00
Qiu Chaofan 2fc0d439a4 [Clang] [PowerPC] Fix header include typo in smmintrin.h
The SSE4 header (smmintrin.h) should include SSSE3 (tmmintrin.h) instead
of SSE2 (emmintrin.h).

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D111482
2021-10-11 10:44:08 +08:00
Amy Kwan 03bfddae50 [NFC] Update vec_extract builtin signatures to take signed int.
This patch updates the vec_extract builtins to take a signed int as the second
parameter, as defined by the Power Vector Intrinsics Programming Reference.
This patch is NFC and all existing tests pass.

Differential Revision: https://reviews.llvm.org/D110935
2021-10-08 15:09:53 -05:00
Artem Belevich 29e00b29f7 [CUDA] Make sure <string.h> is included with original __THROW defined.
Otherwise we may end up with an inconsistent redeclarations of the standard
library functions if _FORTIFY_SOURCE is in effect.

https://bugs.llvm.org/show_bug.cgi?id=47869

Differential Revision: https://reviews.llvm.org/D110781
2021-10-07 11:43:56 -07:00
Amy Kwan 74b1ac7155 [NFC] Update return type of vec_popcnt to vector unsigned.
This patch updates the vec_popcnt builtins to return vector unsigned,
as defined by the Power Vector Intrinsics Programming Reference.
This patch is NFC and all existing tests pass.

Differential Revision: https://reviews.llvm.org/D110934
2021-10-07 11:33:19 -05:00
Artem Belevich 6707a7d7e9 [CUDA] remove unneeded includes from CUDA-related headers.
This should fix bot failures on PPC and windows.
2021-10-06 17:20:21 -07:00
Artem Belevich ccfb0555f7 [CUDA] Implement experimental support for texture lookups.
The patch implements header-only support for testure lookups.

The patch has been tested on a source file with all possible combinations of
argument types supported by CUDA headers, compiled and verified that the
generated instructions and their parameters match the code generated by NVCC.
Unfortunately, compiling texture code requires CUDA headers and can't be tested
in clang itself.  The test will need to be added to the test-suite later.

While generated code compiles and seems to match NVCC, I do not have any code
that uses textures that I could test correctness of the implementation. Hence
the experimental status.

Differential Revision: https://reviews.llvm.org/D110089
2021-10-06 15:15:53 -07:00
Nico Weber f9457f1f88 [clang] Don't mark _ReadBarrier, _ReadWriteBarrier, _WriteBarrier deprecated
It's true that docs.microsoft.com says:

"""The _ReadBarrier, _WriteBarrier, and _ReadWriteBarrier compiler
intrinsics and the MemoryBarrier macro are all deprecated and should not
be used. For inter-thread communication, use mechanisms such as
atomic_thread_fence and std::atomic<T>, which are defined in the C++
Standard Library. For hardware access, use the /volatile:iso compiler
option together with the volatile keyword."""

And these attributes have been here since these builtins were added in
r192860.

However:

- cl.exe does not warn on them even with /Wall
- none of the replacements are useful for C code
- we don't add __attribute__((__deprecated__())) to any other
  declarations in intrin.h
- intrin0.h in the MSVC headers declares _ReadWriteBarrier() (but
  without the deprecation attribute), so you get inconsistent
  deprecation warnings depending on if you include intrin.h or intrin0.h

The motivation is that compiling sqlite.h with clang-cl produces a
deprecation warning with clang-cl for _ReadWriteBarrier(), but not with
cl.exe.

Differential Revision: https://reviews.llvm.org/D111232
2021-10-06 10:50:02 -04:00
Albion Fung 13d3cd37e2 [PowerPC] Implement vector float and vector double version for vec_orc builtin
The builtin for vec_orc has support for the following two signatures,
but currently the compiler marks it ambiguous:
vector float vec_orc(vector float, vector float)
vector double vec_orc(vector double, vector double)

This patch implements these two builtins.

Differential revision: https://reviews.llvm.org/D110858
2021-10-06 02:47:42 -05:00
Lei Huang 8b3d944a97 [PowerPC] Disable vector types when not supported by subtarget features
Update clang to treat vector unsigned long long and friends as invalid
for AltiVec without VSX.

Reported in: https://bugs.llvm.org/show_bug.cgi?id=47782

Reviewed By: nemanjai, amyk

Differential Revision: https://reviews.llvm.org/D109178
2021-10-04 14:16:47 -05:00
Nemanja Ivanovic 369d785574 [PowerPC] Optimal sequence for doubleword vec_all_{eq|ne} on Power7
These builtins produce inefficient code for CPU's prior to Power8
due to vcmpequd being unavailable. The predicate forms can actually
leverage the available vcmpequw along with xxlxor to produce a better
sequence.
2021-10-01 08:27:15 -05:00
Nemanja Ivanovic fad14a17a4 [PowerPC] Truncate element index for vec_insert in altivec.h
When a user specifies an out-of-range index for vec_insert, we
just produce IR that has undefined behaviour even though the
documentation states that modulo arithmetic is used. This patch
just truncates the value to a valid index.
2021-09-30 05:58:22 -05:00
Nemanja Ivanovic 09b67aa1c3 [PowerPC] Implement builtin for vbpermd
The instruction has similar semantics to vbpermq but for doublewords.
It was added in Power9 and the ABI documents the builtin.

Differential revision: https://reviews.llvm.org/D107899
2021-09-29 06:34:31 -05:00
Wang, Pengfei 7d6889964a [X86][FP16] Add more builtins to avoid multi evaluation problems & add 2 missed intrinsics
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110336
2021-09-27 09:27:04 +08:00
Quinn Pham f9912fe4ea [PowerPC] Add range checks for P10 Vector Builtins
This patch adds range checking for some Power10 altivec builtins and
changes the signature of a builtin to match documentation. For `vec_cntm`,
range checking is done via SemaChecking. For `vec_splati_ins`, the second
argument is masked to extract the 0th bit so that we always receive either a `0`
or a `1`.

Reviewed By: lei, amyk

Differential Revision: https://reviews.llvm.org/D109710
2021-09-23 11:05:49 -05:00
Wang, Pengfei ebec077e07 [X86][FP16] Change the order of the operands in complex FMA intrinsics to allow swap between the mul operands.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D109658
2021-09-23 11:02:48 +08:00
Albion Fung b93359ea3f [PowerPC] Support for vector bool int128 on vector comparison builtins
This patch implements support for the type vector bool int128
for arguments on vector comparison builtins listed below,
which would otherwise crash due to ambiguity.

The following builtins are added:

vec_all_eq (vector bool __int128, vector bool __int128)
vec_all_ne (vector bool __int128, vector bool __int128)
vec_any_eq (vector bool __int128, vector bool __int128)
vec_any_ne (vector bool __int128, vector bool __int128)
vec_cmpne(vector bool __int128 a, vector bool __int128 b)
vec_cmpeq(vector bool __int128 a, vector bool __int128 b)

Differential revision: https://reviews.llvm.org/D110084
2021-09-21 16:29:37 -05:00
Justas Janickas 228dd20c3f [OpenCL] Supports atomics in C++ for OpenCL 2021
Atomics in C++ for OpenCL 2021 are now handled the same way as in
OpenCL C 3.0. This is a header-only change.

Differential Revision: https://reviews.llvm.org/D109424
2021-09-20 16:24:30 +01:00
serge-sans-paille 9aeecdfa8e Check supported architectures in sseXYZ/avxXYZ headers
It doesn't make sense to include those headers on the wrong architecture,
provide an explicit error message in that case.

Fix https://bugs.llvm.org/show_bug.cgi?id=48915

Differential Revision: https://reviews.llvm.org/D109686
2021-09-14 09:57:54 +02:00
Sven van Haastregt d353d1c501 [OpenCL] Support cl_ext_float_atomics
See https://github.com/KhronosGroup/OpenCL-Docs/pull/552 for initial
specification.

Patch by Haonan Yang.

Differential Revision: https://reviews.llvm.org/D106343
2021-09-13 12:12:40 +01:00
Xiang1 Zhang c81d6ab875 [X86] Adjust Keylocker handle mem size
Reviewed By: Topper Craig

Differential Revision: https://reviews.llvm.org/D109488
2021-09-13 18:03:27 +08:00
Xiang1 Zhang bdce8d40c6 Revert "[X86] Adjust Keylocker handle mem size"
This reverts commit 3731de6b7f.
2021-09-13 18:00:46 +08:00
Xiang1 Zhang 3731de6b7f [X86] Adjust Keylocker handle mem size
Reviewed By: Topper Craig

Differential Revision: https://reviews.llvm.org/D109354
2021-09-13 17:59:33 +08:00
Wang, Pengfei 2aaa6466fe [X86] Support *_set1_pch(Float16 _Complex h)
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D109487
2021-09-11 17:47:31 +08:00
Joseph Huber f28e710db7 [OpenMP] Make CUDA math library functions SPMD amenable
This patch adds the SPMD amenable assumption to the CUDA math library
defintions in Clang. Previously these functions would block SPMD
execution on the device because they're intrinsic calls into the library
and can't be calculated. These functions don't have side-effects so they
are safe to execute in SPMD mode.

Depends on D105937

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108958
2021-09-10 14:52:45 -04:00
Simon Pilgrim ea685e1028 [X86][AVX] Update _mm256_loadu2_m128* intrinsics to use _mm256_set_m128* (PR51796)
As reported on PR51796, the _mm256_loadu2_m128i in particular was inserting bitcasts and shuffles with different types making it trickier for some combines, and prevented the value tracker from identifying the shuffle sequences as a single insert_subvector style concat_vectors pattern.

This patch instead concatenate the 128-bit unaligned loads with _mm256_set_m128*, which was written to avoid the unnecessary bitcasts and only emits a single shuffle.

Differential Revision: https://reviews.llvm.org/D109497
2021-09-09 19:15:48 +01:00
Simon Pilgrim 55d9396278 [X86] Move _mm256_set_m128* intrinsics before _mm256_loadu2_m128* intrinsics. NFC.
This is necessary for PR51796 where we'll update _mm256_loadu2_m128* to use  _mm256_set_m128*
2021-09-09 11:23:50 +01:00
Pushpinder Singh 12dcbf913c [AMDGPU][OpenMP] Use complex definitions from complex_cmath.h
Following nvptx approach, this patch uses complex function
definitions from complex_cmath.h. With this patch, ovo passes
23/34 complex mathematical test cases.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D109344
2021-09-09 10:55:17 +05:30
Tianqing Wang 12fa608af4 [X86] Add CRC32 feature.
d8faf03807 implemented general-regs-only for X86 by disabling all features
with vector instructions. But the CRC32 instruction in SSE4.2 ISA, which uses
only GPRs, also becomes unavailable. This patch adds a CRC32 feature for this
instruction and allows it to be used with general-regs-only.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D105462
2021-09-06 17:24:30 +08:00
Stuart Brady 32955be6bf [OpenCL] Remove decls for scalar vloada_half and vstorea_half* fns
These functions are not part of the OpenCL C specification.

See https://github.com/KhronosGroup/OpenCL-Docs/issues/648 for a
clarification regarding the vloada_half declarations.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D108761
2021-09-02 22:08:09 +01:00
Nico Weber e5438f3868 clang/win: Add __readfsdword to intrin.h
When using __readfsdword(), clang used to warn that one has
to include <intrin.h> -- no matter if that was already included
or not.

Now it only warns if it's not yet included.

To verify that this was the only intrin with this problem, I ran:

    $ for f in $(grep intrin.h clang/include/clang/Basic/BuiltinsX86* |
                 egrep -o '\([^,]+,' | egrep -o '[^(,]*'); do
        if ! grep -q $f clang/lib/Headers/intrin.h; then echo $f; fi;
      done

This printed 9 more functions, but those are all in emmintrin.h,
xsaveintrin.h (which are included by intrin.h based on /arch: flags).
So this is indeed the only built-in that was missing in intrin.h.

Fixes PR51188.

Differential Revision: https://reviews.llvm.org/D109085
2021-09-02 12:22:07 -04:00
Justas Janickas fb321c2ea2 [OpenCL] Define OpenCL 3.0 optional core features in C++ for OpenCL 2021
Modifies OpenCL 3.0 optional core feature macro definitions so that
they are set analogously in C++ for OpenCL 2021.

This change aims to achieve compatibility between C++ for OpenCL
2021 and OpenCL 3.0.

Differential Revision: https://reviews.llvm.org/D108704
2021-09-01 10:15:17 +01:00
Victor Huang 2e5c17d19e [PowerPC][NFC] Rename P10 builtins vec_clrl, vec_clrr to vec_clr_first and vec_clr_last
This patch renames the vector clear left/right builtins vec_clrl, vec_clrr to
vec_clr_first and vec_clr_last to avoid the ambiguities when dealing with endianness.

Reviewed By: amyk, lei

Differential revision: https://reviews.llvm.org/D108702
2021-08-30 09:52:15 -05:00
Wang, Pengfei ab40dbfe03 [X86] AVX512FP16 instructions enabling 6/6
Enable FP16 complex FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105269
2021-08-30 13:08:45 +08:00
Xiang1 Zhang 80f7ce8993 [X86] Support __SSC_MARK(const int id)
Differential Revision: https://reviews.llvm.org/D108682
2021-08-30 09:55:35 +08:00
Xiang1 Zhang 4c29dc18cf Revert "[X86] Support __SSC_MARK(const int id)"
This reverts commit 78fbde5779.
2021-08-30 09:50:26 +08:00
Xiang1 Zhang 78fbde5779 [X86] Support __SSC_MARK(const int id)
Differential Revision: https://reviews.llvm.org/D108682
2021-08-30 09:21:22 +08:00
Xiang1 Zhang fd88fac6ca Revert "[X86] Support __SSC_MARK(const int id)"
This reverts commit 83e82ff767.
2021-08-30 09:18:27 +08:00
Xiang1 Zhang 83e82ff767 [X86] Support __SSC_MARK(const int id)
Differential Revision: https://reviews.llvm.org/D108682
2021-08-30 08:51:20 +08:00
Pushpinder Singh 07e85823aa [OpenMP][AMDGCN] Enable complex functions
This patch enables basic complex functionality using the ocml builtins.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108552
2021-08-24 12:40:41 +05:30
Wang, Pengfei c728bd5bba [X86] AVX512FP16 instructions enabling 5/6
Enable FP16 FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105268
2021-08-24 09:07:19 +08:00
Wang, Pengfei b088536ce9 [X86] AVX512FP16 instructions enabling 4/6
Enable FP16 unary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105267
2021-08-22 08:59:35 +08:00
Craig Topper 5cf5df8014 [X86] Add missing __inline__ to functions in amxintrin.h 2021-08-20 09:35:02 -07:00
Thomas Lively 88962cea46 [WebAssembly] Restore builtins and intrinsics for pmin/pmax
Partially reverts 85157c0079, which had removed these builtins and intrinsics
in favor of normal codegen patterns. It turns out that it is possible for the
patterns to be split over multiple basic blocks, however, which means that DAG
ISel is not able to select them to the pmin/pmax instructions. To make sure the
SIMD intrinsics generate the correct instructions in these cases, reintroduce
the clang builtins and corresponding LLVM intrinsics, but also keep the normal
pattern matching as well.

Differential Revision: https://reviews.llvm.org/D108387
2021-08-20 09:21:31 -07:00
Thomas Lively 64a9957bf7 [WebAssembly] Make shift values unsigned in wasm_simd128.h
On some platforms, negative shift values mean to shift in the opposite
direction, but this is not true with WebAssembly. To avoid confusion, make the
shift values in the shift intrinsics unsigned.

Differential Revision: https://reviews.llvm.org/D108415
2021-08-20 09:10:37 -07:00
Thomas Lively 2456e11614 [WebAssembly] Add SIMD intrinsics using unsigned integers
For each SIMD intrinsic function that takes or returns a scalar signed integer
value, ensure there is a corresponding intrinsic that returns or an
unsigned value. This is a convenience for users who use -Wsign-conversion so
they don't have to insert explicit casts, especially when the intrinsic
arguments are integer literals that fit into the unsigned integer type but not
the signed type.

Differential Revision: https://reviews.llvm.org/D108412
2021-08-20 08:56:51 -07:00
Thomas Lively fd3bd63df2 [WebAssembly] Make bitmask instructions return unsigned ints
Since they are bitmasks, it will be more common for them to be used and
potentially extended to 64-bit integers as unsigned values rather than signed
values.

Differential Revision: https://reviews.llvm.org/D108401
2021-08-19 16:23:47 -07:00
Martin Storsjö cc3affd8b0 [clang] [MSVC] Implement __mulh and __umulh builtins for aarch64
The code is based on the same __mulh and __umulh intrinsics for
x86.

This should fix PR51128.

Differential Revision: https://reviews.llvm.org/D106721
2021-08-19 11:29:55 +03:00