forked from OSchip/llvm-project

[OPENMP] What's new for OpenMP in clang.

Updated ReleaseNotes + Status of the OpenMP support in clang. llvm-svn: 338049

parent 66d405d31f
commit 3bdd60095f
@@ -10,13 +10,15 @@
 .. role:: partial
 .. role:: good

 .. contents::
    :local:

 ==================
 OpenMP Support
 ==================

-Clang fully supports OpenMP 3.1 + some elements of OpenMP 4.5. Clang supports offloading to X86_64, AArch64 and PPC64[LE] devices.
-Support for Cuda devices is not ready yet.
-The status of major OpenMP 4.5 features support in Clang.
+Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64,
+PPC64[LE] and has `basic support for Cuda devices`_.

 Standalone directives
 =====================
@@ -35,7 +37,7 @@ Standalone directives

 * #pragma omp target: :good:`Complete`.

-* #pragma omp declare target: :partial:`Partial`. No full codegen support.
+* #pragma omp declare target: :good:`Complete`.

 * #pragma omp teams: :good:`Complete`.
@@ -64,5 +66,66 @@ Combined directives

 * #pragma omp target teams distribute parallel for [simd]: :good:`Complete`.

-Clang does not support any constructs/updates from upcoming OpenMP 5.0 except for `reduction`-based clauses in the `task` and `target`-based directives.
-In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and mac OS.
+Clang does not support any constructs/updates from upcoming OpenMP 5.0 except
+for `reduction`-based clauses in the `task` and `target`-based directives.
+
+In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
+Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS.
+
+.. _basic support for Cuda devices:
+
+Cuda devices support
+====================
+
+Directives execution modes
+--------------------------
+
+Clang code generation for target regions supports two modes: the SPMD and
+non-SPMD modes. Clang chooses one of these two modes automatically based on the
+way directives and clauses on those directives are used. The SPMD mode uses a
+simplified set of runtime functions, thus increasing performance at the cost of
+supporting fewer OpenMP features. The non-SPMD mode is the most generic mode and
+supports all currently available OpenMP features. The compiler will always
+attempt to use the SPMD mode wherever possible. SPMD mode will not be used if:
+
+- The target region contains an `if()` clause that refers to a `parallel`
+  directive.
+
+- The target region contains a `parallel` directive with a `num_threads()`
+  clause.
+
+- The target region contains user code (other than OpenMP-specific
+  directives) in between the `target` and the `parallel` directives.
+
+Data-sharing modes
+------------------
+
+Clang supports two data-sharing models for Cuda devices: `Generic` and `Cuda`
+modes. The default mode is `Generic`. `Cuda` mode can give additional
+performance and can be activated using the `-fopenmp-cuda-mode` flag. In
+`Generic` mode, all local variables that can be shared in the parallel regions
+are stored in global memory. In `Cuda` mode, local variables are not shared
+between the threads, and it is the user's responsibility to share the required
+data between the threads in the parallel regions.
+
+Features not supported or with limited support for Cuda devices
+---------------------------------------------------------------
+
+- Reductions across the teams are not supported yet.
+
+- Cancellation constructs are not supported.
+
+- Doacross loop nest is not supported.
+
+- User-defined reductions are supported only for trivial types.
+
+- Nested parallelism: inner parallel regions are executed sequentially.
+
+- Static linking of libraries containing device code is not supported yet.
+
+- Automatic translation of math functions in target regions to device-specific
+  math functions is not implemented yet.
+
+- Debug information for OpenMP target regions is not supported yet.
@@ -216,7 +216,21 @@ OpenCL C Language Changes in Clang

 OpenMP Support in Clang
 ----------------------------------

-- ...
+- Clang gained basic support for OpenMP 4.5 offloading for the NVPTX target.
+  To compile your program for the NVPTX target, use the following options:
+  `-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda` for 64-bit platforms or
+  `-fopenmp -fopenmp-targets=nvptx-nvidia-cuda` for 32-bit platforms.
+
+- Passing options to the OpenMP device offloading toolchain can be done using
+  the `-Xopenmp-target=<triple> -opt=val` flag. In this way, the `-opt=val`
+  option will be forwarded to the respective OpenMP device offloading toolchain
+  described by the triple. For example, passing the compute capability to
+  the OpenMP NVPTX offloading toolchain can be done as follows:
+  `-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_60`. When only one
+  target offload toolchain is specified under the `-fopenmp-targets=<triples>`
+  option, the triple can be skipped: `-Xopenmp-target -march=sm_60`.
+
+- Other bugfixes.

 CUDA Support in Clang
 ---------------------