Commit Graph

1667 Commits

Author SHA1 Message Date
Albion Fung 3136cbe29e [PowerPC] Implement Vector Shift Builtins
This patch implements the builtins for the vector shifts (shl, srl, sra), and
adds the appropriate test cases for these builtins. The builtins utilize the
vector shift instructions introduced within ISA 3.1.

Differential Revision: https://reviews.llvm.org/D83338
2020-08-12 18:26:58 -05:00
Yaxun (Sam) Liu ac3e720dc1 Make clang HIP headers compatible with C++98
Automation to detect compiler features, such as CMake's target_compile_features,
would attempt to detect compiler features by explicitly using langugage flags.
This change ensures that the HIP headers would still work with C++98.

Patch by Siu Chi Chan

Differential Revision: https://reviews.llvm.org/D85471

Change-Id: I304e964b18a525b0fde55efd841da74b6c4dc8ed
2020-08-07 13:50:22 -04:00
biplmish cce1b0e891 [PowerPC] Implement Vector Extract Low/High Order Builtins in LLVM/Clang
This patch implements the function prototypes vec_extractl and vec_extracth in altivec.h to utilize the vector extract double element instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D84622
2020-08-07 01:02:29 -05:00
Thomas Lively f496950001 [WebAssembly] Fix types in wasm_simd128.h and add tests
47f7174ffa changed the types used in the Wasm SIMD builtin functions,
but not all of their uses in wasm_simd128.h were updated. This commit
fixes wasm_simd128.h and adds tests to make sure similar problems do
not pass uncaught in the future.

Differential Revision: https://reviews.llvm.org/D85347
2020-08-05 14:00:01 -07:00
Artem Belevich 7d057efddc [CUDA] Work around a bug in rint/nearbyint caused by a broken implementation provided by CUDA.
Normally math functions are forwarded to __nv_* counterparts provided by CUDA's
libdevice bitcode. However, __nv_rint*()/__nv_nearbyint*() functions there have
a bug -- they use round() which rounds *up* instead of rounding towards the
nearest integer, so we end up with rint(2.5f) producing 3.0 instead of expected
2.0. The broken bitcode is not actually used by NVCC itself, which has both a
work-around in CUDA headers and, in recent versions, uses correct
implementations in NVCC's built-ins.

This patch implements equivalent workaround and directs rint*/nearbyint* to
__builtin_* variants that produce correct results.

Differential Revision: https://reviews.llvm.org/D85236
2020-08-05 13:13:48 -07:00
Dan Gohman 47f7174ffa [WebAssembly] Use "signed char" instead of "char" in SIMD intrinsics.
This allows people to use `int8_t` instead of `char`, -funsigned-char,
and generally decouples SIMD from the specialness of `char`.

And it makes intrinsics like `__builtin_wasm_add_saturate_s_i8x16`
and `__builtin_wasm_add_saturate_u_i8x16` use signed and unsigned
element types, respectively.

Differential Revision: https://reviews.llvm.org/D85074
2020-08-04 12:48:40 -07:00
Amy Kwan c4e5743232 [PowerPC] Implement low-order Vector Modulus Builtins, and add Vector Multiply/Divide/Modulus Builtins Tests
Power10 introduces new instructions for vector multiply, divide and modulus.
These instructions can be exploited by the builtin functions: vec_mul, vec_div,
and vec_mod, respectively.

This patch aims adds the function prototype, vec_mod, as vec_mul and vec_div
been previously implemented in altivec.h.

This patch also adds the following front end tests:
vec_mul for v2i64
vec_div for v4i32 and v2i64
vec_mod for v4i32 and v2i64

Differential Revision: https://reviews.llvm.org/D82576
2020-07-31 10:58:07 -05:00
Thomas Lively 11bb7eef41 [WebAssembly] Remove intrinsics for SIMD widening ops
Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.

Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.

Differential Revision: https://reviews.llvm.org/D84556
2020-07-28 18:25:55 -07:00
Amy Kwan 74790a5dde [PowerPC] Implement Truncate and Store VSX Vector Builtins
This patch implements the `vec_xst_trunc` function in altivec.h in  order to
utilize the Store VSX Vector Rightmost [byte | half | word | doubleword] Indexed
instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D82467
2020-07-24 19:22:39 -05:00
Jon Chesterfield 679158e662 Make hip math headers easier to use from C
Summary:
Make hip math headers easier to use from C

Motivation is a step towards using the hip math headers to implement math.h
for openmp, which needs to work with C as well as C++. NFC for C++ code.

Reviewers: yaxunl, jdoerfert

Reviewed By: yaxunl

Subscribers: sstefan1, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D84476
2020-07-24 20:50:46 +01:00
Yaxun (Sam) Liu ce04d4e39c Fix pow and ldexp in HIP header 2020-07-21 17:39:46 -04:00
Amy Kwan 62f5ba624b [PowerPC][Power10] Implement Test LSB by Byte Builtins in LLVM/Clang
This patch implements builtins for the Test LSB by Byte instruction introduced
in Power10.

Differential Revision: https://reviews.llvm.org/D82431
2020-07-13 22:47:47 -05:00
Johannes Doerfert b5667d00e0 [OpenMP][CUDA] Fix std::complex in GPU regions
The old way worked to some degree for C++-mode but in C mode we actually
tried to introduce variants of macros (e.g., isinf). To make both modes
work reliably we get rid of those extra variants and directly use NVIDIA
intrinsics in the complex implementation. While this has to be revisited
as we add other GPU targets which want to reuse the code, it should be
fine for now.

Reviewed By: tra, JonChesterfield, yaxunl

Differential Revision: https://reviews.llvm.org/D83591
2020-07-11 00:40:05 -05:00
Johannes Doerfert 7f1e6fcff9 [OpenMP] Use __OPENMP_NVPTX__ instead of _OPENMP in wrapper headers
Due to recent changes we cannot use OpenMP in CUDA files anymore
(PR45533) as the math handling of CUDA is different when _OPENMP is
defined. We actually want this different behavior only if we are
offloading with OpenMP to NVIDIA, thus generating NVPTX. With this patch
we do not interfere with the CUDA math handling except if we are in
NVPTX offloading mode, as indicated by the presence of __OPENMP_NVPTX__.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D78155
2020-07-10 18:53:34 -05:00
Johannes Doerfert e3e47e8035 [OpenMP] Make complex soft-float functions on the GPU weak definitions
To avoid linkage errors we have to ensure the linkage allows multiple
definitions of these compiler inserted functions. Since they are on the
cold path of complex computations, we want to avoid `inline`. Instead,
we opt for `weak` and `noinline` for now.
2020-07-09 01:06:55 -05:00
Johannes Doerfert d999cbc988 [OpenMP] Initial support for std::complex in target regions
This simply follows the scheme we have for other wrappers. It resolves
the current link problem, e.g., `__muldc3 not found`, when std::complex
operations are used on a device.

This will not allow complex make math function calls to work properly,
e.g., sin, but that is more complex (pan intended) anyway.

Reviewed By: tra, JonChesterfield

Differential Revision: https://reviews.llvm.org/D80897
2020-07-08 17:33:59 -05:00
Craig Topper 82206e7fb4 [X86] Enabled a bunch of 64-bit Interlocked* functions intrinsics on 32-bit Windows to match recent MSVC
This enables _InterlockedAnd64/_InterlockedOr64/_InterlockedXor64/_InterlockedDecrement64/_InterlockedIncrement64/_InterlockedExchange64/_InterlockedExchangeAdd64/_InterlockedExchangeSub64 on 32-bit Windows

The backend already knows how to expand these to a loop using cmpxchg8b on 32-bit targets.

Fixes PR46595

Differential Revision: https://reviews.llvm.org/D83254
2020-07-08 10:39:56 -07:00
Xiang1 Zhang 939d8309db [X86-64] Support Intel AMX Intrinsic
INTEL ADVANCED MATRIX EXTENSIONS (AMX).
AMX is a new programming paradigm, it has a set of 2-dimensional registers
(TILES) representing sub-arrays from a larger 2-dimensional memory image and
operate on TILES.

These intrinsics use direct TMM register number as its params.

Spec can be found in Chapter 3 here https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D83111
2020-07-07 10:13:40 +08:00
Biplob Mishra 0c6b6e28e7 [PowerPC] Implement Vector Splat Immediate Builtins in Clang
Implements builtins for the following prototypes:
  vector signed int vec_splati (const signed int);
  vector float vec_splati (const float);
  vector double vec_splatid (const float);
  vector signed int vec_splati_ins (vector signed int, const unsigned int,
                                    const signed int);
  vector unsigned int vec_splati_ins (vector unsigned int, const unsigned int,
                                      const unsigned int);
  vector float vec_splati_ins (vector float, const unsigned int, const float);

Differential Revision: https://reviews.llvm.org/D82520
2020-07-06 20:29:33 -05:00
Lei Huang e359ab1eca [PowerPC][NFC] Fix indentation 2020-07-03 16:47:24 -05:00
Biplob Mishra 0939e04e41 [PowerPC] Implement Vector Insert Builtins in LLVM/Clang
Implements vec_insertl() and vec_inserth().

Differential Revision: https://reviews.llvm.org/D82365
2020-07-03 15:30:41 -05:00
Biplob Mishra ca464639a1 [PowerPC] Implement Vector Blend Builtins in LLVM/Clang
Implements vec_blendv()

Differential Revision: https://reviews.llvm.org/D82774
2020-07-02 16:52:52 -05:00
Biplob Mishra 286073484f [PowerPC]Implement Vector Permute Extended Builtin
Implements vector permute builtin: vec_permx()

Differential Revision: https://reviews.llvm.org/D82869
2020-07-02 14:53:18 -05:00
Biplob Mishra 88874f0746 [PowerPC]Implement Vector Shift Double Bit Immediate Builtins
Implement Vector Shift Double Bit Immediate Builtins in LLVM/Clang.
  * vec_sldb ();
  * vec_srdb ();

Differential Revision: https://reviews.llvm.org/D82440
2020-07-01 20:34:53 -05:00
Amy Kwan e0c02dc980 [PowerPC][Power10] Implement centrifuge, vector gather every nth bit, vector evaluate Builtins in LLVM/Clang
This patch implements builtins for the following prototypes:

unsigned long long __builtin_cfuged (unsigned long long, unsigned long long);
vector unsigned long long vec_cfuge (vector unsigned long long, vector unsigned long long);
unsigned long long vec_gnb (vector unsigned __int128, const unsigned int);
vector unsigned char vec_ternarylogic (vector unsigned char, vector unsigned char, vector unsigned char, const unsigned int);
vector unsigned short vec_ternarylogic (vector unsigned short, vector unsigned short, vector unsigned short, const unsigned int);
vector unsigned int vec_ternarylogic (vector unsigned int, vector unsigned int, vector unsigned int, const unsigned int);
vector unsigned long long vec_ternarylogic (vector unsigned long long, vector unsigned long long, vector unsigned long long, const unsigned int);
vector unsigned __int128 vec_ternarylogic (vector unsigned __int128, vector unsigned __int128, vector unsigned __int128, const unsigned int);

Differential Revision: https://reviews.llvm.org/D80970
2020-06-25 21:34:41 -05:00
Amy Kwan d82f26cc4b [PowerPC][Power10] Implement Count Leading/Trailing Zeroes Builtins under bit Mask in LLVM/Clang
This patch implements builtins for the following prototypes:

unsigned long long __builtin_cntlzdm (unsigned long long, unsigned long long)
unsigned long long __builtin_cnttzdm (unsigned long long, unsigned long long)
vector unsigned long long vec_cntlzm (vector unsigned long long, vector unsigned long long)
vector unsigned long long vec_cnttzm (vector unsigned long long, vector unsigned long long)

Differential Revision: https://reviews.llvm.org/D80941
2020-06-24 16:03:45 -05:00
Amy Kwan 19df9e2959 [PowerPC][Power10] Implement VSX PCV Generate Operations in LLVM/Clang
This patch implements builtins for the following prototypes for the VSX Permute
Control Vector Generate with Mask Instructions:

vector unsigned char vec_genpcvm (vector unsigned char, const int);
vector unsigned short vec_genpcvm (vector unsigned short, const int);
vector unsigned int vec_genpcvm (vector unsigned int, const int);
vector unsigned long long vec_genpcvm (vector unsigned long long, const int);

Differential Revision: https://reviews.llvm.org/D81774
2020-06-22 21:09:34 -05:00
Amy Kwan cc95635b1b [PowerPC][Power10] Implement Vector Clear Left/Rightmost Bytes Builtins in LLVM/Clang
This patch implements builtins for the following prototypes:
```
vector signed char vec_clrl (vector signed char a, unsigned int n);
vector unsigned char vec_clrl (vector unsigned char a, unsigned int n);
vector signed char vec_clrr (vector signed char a, unsigned int n);
vector signed char vec_clrr (vector unsigned char a, unsigned int n);
```

Differential Revision: https://reviews.llvm.org/D81707
2020-06-20 18:29:16 -05:00
Amy Kwan c45c161130 [PowerPC][Power10] Implement Parallel Bits Deposit/Extract Builtins in LLVM/Clang
This patch implements builtins for the following prototypes:

vector unsigned long long vec_pdep(vector unsigned long long, vector unsigned long long);
vector unsigned long long vec_pext(vector unsigned long long, vector unsigned long long __b);
unsigned long long __builtin_pdepd (unsigned long long, unsigned long long);
unsigned long long __builtin_pextd (unsigned long long, unsigned long long);

Revision Depends on D80758

Differential Revision: https://reviews.llvm.org/D80935
2020-06-18 16:23:56 -05:00
Thomas Lively d2c394e74f [WebAssembly] Add intrinsic for i64x2.mul
Summary:
This instruction was implemented in 3181273be7, but that commit did
not add an intrinsic for it.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D81757
2020-06-12 14:08:18 -07:00
Yaxun (Sam) Liu af00eb25f8 Fix __clang_cuda_math_forward_declares.h
Recent change from `#if !defined(__CUDA__)` to `#if !__CUDA__` caused
regression on ROCm 3.5 since there is `#define __CUDA__`
before inclusion of the header file, which causes `#if !__CUDA__`
to be invalid.

Change `#if !__CUDA__` back to `#if !defined(__CUDA__)` for backward
compatibility.
2020-06-10 23:47:13 -04:00
Yaxun (Sam) Liu 4615abc11f Rename arg name in __clang_hip_math.h
__sptr is a keyword for ms-extension. Change it from __sptr to __sinptr.
2020-06-08 14:13:32 -04:00
Yaxun (Sam) Liu 8422bc9efc recommit "[HIP] Add default header and include path"
recommit 11d06b9511 with
fix for lit tests.
2020-06-06 14:21:22 -04:00
Nico Weber 2920348063 Revert "recommit "[HIP] Add default header and include path""
This reverts commit 1fa43e0b34.
Still breaks tests on several bots, see https://reviews.llvm.org/D81176
2020-06-05 21:50:04 -04:00
Yaxun (Sam) Liu 1fa43e0b34 recommit "[HIP] Add default header and include path"
recommit 11d06b9511 with
fix for lit tests.
2020-06-05 20:41:15 -04:00
Yaxun (Sam) Liu 8a8c6913a9 Revert "[HIP] Add default header and include path"
This reverts commit 11d06b9511.
2020-06-05 15:42:57 -04:00
Yaxun (Sam) Liu 11d06b9511 [HIP] Add default header and include path
To support std::complex and some other standard C/C++ functions in HIP device code,
they need to be forced to be __host__ __device__ functions by pragmas. This is done
by some clang standard C++ wrapper headers which are shared between cuda-clang and hip-Clang.

For these standard C++ wapper headers to work properly, specific include path order
has to be enforced:

  clang C++ wrapper include path
  standard C++ include path
  clang include path

Also, these C++ wrapper headers require device version of some standard C/C++ functions
must be declared before including them. This needs to be done by including a default
header which declares or defines these device functions. The default header is always
included before any other headers are included by users.

This patch adds the the default header and include path for HIP.

Differential Revision: https://reviews.llvm.org/D81176
2020-06-05 12:44:57 -04:00
Ties Stuij a6fcf5ca03 [clang][BFloat] add NEON emitter for bfloat
Summary:
This patch adds the bfloat16_t struct typedefs (e.g. bfloat16x8x2_t) to
arm_neon.h

This patch is part of a series implementing the Bfloat16 extension of the
Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type, and its properties are specified in the Arm Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

The following people contributed to this patch:
- Luke Cheeseman
- Simon Tatham
- Ties Stuij

Reviewers: t.p.northover, fpetrogalli, sdesmalen, az, LukeGeeson

Reviewed By: fpetrogalli

Subscribers: SjoerdMeijer, LukeGeeson, pbarrio, mgorny, kristof.beyls, ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79708
2020-06-05 14:11:51 +01:00
Anastasia Stulova 4a4402f0d7 [OpenCL] Add cl_khr_extended_subgroup extensions.
Added extensions and their function declarations into
the standard header.

Patch by Piotr Fusik!

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79781
2020-06-04 13:29:30 +01:00
Thomas Lively 237be3404b [WebAssembly] Improve macro hygiene in wasm_simd128.h
Summary:
The shuffle intrinsic macros did not parenthesize usages of their
constant parameters, which could lead to incorrect results due to
operator precedence issues. This patch fixes the problem by adding the
missing paretheses.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D80968
2020-06-02 12:55:06 -07:00
Craig Topper 1b02db52b7 [X86] Update some av512 shift intrinsics to use "unsigned int" parameter instead of int to match Intel documentation
There are 65 that take a scalar shift amount. Intel documentation shows 60 of them taking unsigned int. There are 5 versions of srli_epi16 that use int, the 512-bit maskz and 128/256 mask/maskz.

Fixes PR45931

Differential Revision: https://reviews.llvm.org/D80251
2020-05-22 20:12:57 -07:00
Thomas Lively 3181273be7 [WebAssembly] Implement i64x2.mul and remove i8x16.mul
Summary:
This reflects changes in the spec proposal made since basic arithmetic
was first implemented.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D80174
2020-05-19 12:50:44 -07:00
Xiang1 Zhang bcc0c894f3 Add cet.h for writing CET-enabled assembly code
Summary:
Add x86 feature with IBT and/or SHSTK bits to ELF program property if they  are enabled. Otherwise, contents in this header file are unused.
This file is mainly design for assembly source code which want to enable CET

Reviewers: hjl.tools, annita.zhang, LuoYuanke, craig.topper, tstellar, pengfei, rsmith

Reviewed By: LuoYuanke

Subscribers: cfe-commits, mgorny

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79617
2020-05-19 14:03:17 +08:00
Xiang1 Zhang 62a9eca859 Test asm-cet.S fail for window clang
This reverts commit e7e84ff24a.
2020-05-19 13:18:05 +08:00
Xiang1 Zhang e7e84ff24a Add cet.h for writing CET-enabled assembly code
Summary:
Add x86 feature with IBT and/or SHSTK bits to ELF program property if they  are enabled. Otherwise, contents in this header file are unused.
This file is mainly design for assembly source code which want to enable CET

Reviewers: hjl.tools, annita.zhang, LuoYuanke, craig.topper, tstellar, pengfei, rsmith

Reviewed By: LuoYuanke

Subscribers: mgorny

Differential Revision: https://reviews.llvm.org/D79617
2020-05-19 10:37:46 +08:00
Thomas Lively c702d4bf41 [WebAssembly] Update latest implemented SIMD instructions
Summary:
Move instructions that have recently been implemented in V8 from the
`unimplemented-simd128` target feature to the `simd128` target
feature. The updated instructions match the update at
https://github.com/WebAssembly/simd/pull/223.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D79973
2020-05-15 10:53:02 -07:00
Thomas Lively 3d49d1cfa7 [WebAssembly] Implement pseudo-min/max SIMD instructions
Summary:
As proposed in https://github.com/WebAssembly/simd/pull/122. Since
these instructions are not yet merged to the SIMD spec proposal, this
patch makes them entirely opt-in by surfacing them only through LLVM
intrinsics and clang builtins. If these instructions are made
official, these intrinsics and builtins should be replaced with simple
instruction patterns.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D79742
2020-05-12 09:39:01 -07:00
Thomas Lively 8e3e56f2a3 [WebAssembly] Add wasm-specific vector shuffle builtin and intrinsic
Summary:

Although using `__builtin_shufflevector` and the `shufflevector`
instruction works fine, they are not opaque to the optimizer. As a
result, DAGCombine can potentially reduce the number of shuffles and
change the shuffle masks. This is unexpected behavior for users of the
WebAssembly SIMD intrinsics who have crafted their shuffles to
optimize the code generated by engines. This patch solves the problem
by adding a new shuffle intrinsic that is opaque to the optimizers in
line with the decision of the WebAssembly SIMD contributors at
https://github.com/WebAssembly/simd/issues/196#issuecomment-622494748. In
the future we may implement custom DAG combines to properly optimize
shuffles and replace this solution.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D66983
2020-05-11 10:01:55 -07:00
Thomas Lively e0f52842c8 [WebAssembly] Renumber SIMD opcodes
Summary:
As described in https://github.com/WebAssembly/simd/pull/209. This is
the final reorganization of the SIMD opcode space before
standardization. It has been landed in concert with corresponding
changes in other projects in the WebAssembly SIMD ecosystem.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79224
2020-05-01 17:20:49 -07:00
Craig Topper af28e02e74 [clang] Add vendor identity for Hygon Dhyana processor to cpuid.h
The vendor id is used to determine whether the processor
supports hardware CRC32 in the Scudo code. The previous
discussion about the patch is in [1], and more information
about Hygon Dhyana processor is in[2].

[1]: https://reviews.llvm.org/D62368
[2]: https://git.kernel.org/torvalds/c/c9661c1e80b609cd038db7c908e061f0535804ef

Patch by fanjinke (Jinke Fan)

Differential Revision: https://reviews.llvm.org/D78874
2020-04-30 18:17:01 -07:00