From 3bdd60095f637118b7227065868a9206577e4867 Mon Sep 17 00:00:00 2001
From: Alexey Bataev
Date: Thu, 26 Jul 2018 17:53:45 +0000
Subject: [PATCH] [OPENMP] What's new for OpenMP in clang.

Updated ReleaseNotes + Status of the OpenMP support in clang.

llvm-svn: 338049
---
 clang/docs/OpenMPSupport.rst | 75 +++++++++++++++++++++++++++++++++---
 clang/docs/ReleaseNotes.rst  | 16 +++++++-
 2 files changed, 84 insertions(+), 7 deletions(-)

diff --git a/clang/docs/OpenMPSupport.rst b/clang/docs/OpenMPSupport.rst
index f34049473f82..e8ec1e371b04 100644
--- a/clang/docs/OpenMPSupport.rst
+++ b/clang/docs/OpenMPSupport.rst
@@ -10,13 +10,15 @@
 .. role:: partial
 .. role:: good
 
+.. contents::
+  :local:
+
 ==================
 OpenMP Support
 ==================
 
-Clang fully supports OpenMP 3.1 + some elements of OpenMP 4.5. Clang supports offloading to X86_64, AArch64 and PPC64[LE] devices.
-Support for Cuda devices is not ready yet.
-The status of major OpenMP 4.5 features support in Clang.
+Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64
+and PPC64[LE] devices and has `basic support for Cuda devices`_.
 
 Standalone directives
 =====================
@@ -35,7 +37,7 @@ Standalone directives
 
 * #pragma omp target: :good:`Complete`.
 
-* #pragma omp declare target: :partial:`Partial`. No full codegen support.
+* #pragma omp declare target: :good:`Complete`.
 
 * #pragma omp teams: :good:`Complete`.
 
@@ -64,5 +66,66 @@ Combined directives
 
 * #pragma omp target teams distribute parallel for [simd]: :good:`Complete`.
 
-Clang does not support any constructs/updates from upcoming OpenMP 5.0 except for `reduction`-based clauses in the `task` and `target`-based directives.
-In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and mac OS.
+Clang does not support any constructs/updates from the upcoming OpenMP 5.0
+except for `reduction`-based clauses in the `task` and `target`-based
+directives.
+
+In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
+Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and
+macOS.
+
+.. _basic support for Cuda devices:
+
+Cuda devices support
+====================
+
+Directives execution modes
+--------------------------
+
+Clang code generation for target regions supports two modes: the SPMD and
+non-SPMD modes. Clang chooses one of these two modes automatically based on the
+way directives and clauses on those directives are used. The SPMD mode uses a
+simplified set of runtime functions, thus increasing performance at the cost of
+not supporting some OpenMP features. The non-SPMD mode is the most generic mode
+and supports all currently available OpenMP features. The compiler will always
+attempt to use the SPMD mode wherever possible. SPMD mode will not be used if:
+
+- The target region contains an `if()` clause that refers to a `parallel`
+  directive.
+
+- The target region contains a `parallel` directive with a `num_threads()`
+  clause.
+
+- The target region contains user code (other than OpenMP-specific directives)
+  in between the `target` and the `parallel` directives.
+
+Data-sharing modes
+------------------
+
+Clang supports two data-sharing models for Cuda devices: `Generic` and `Cuda`
+modes. The default mode is `Generic`. `Cuda` mode can give additional
+performance and can be activated using the `-fopenmp-cuda-mode` flag. In
+`Generic` mode, all local variables that can be shared in the parallel regions
+are stored in the global memory. In `Cuda` mode, local variables are not shared
+between the threads and it is the user's responsibility to share the required
+data between the threads in the parallel regions.
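+
+As a minimal illustrative sketch (the function and variable names below are
+made up for this example), consider a local variable of a target region that
+is referenced from an enclosed parallel region::
+
+  void scale(int n, float *v) {
+    #pragma omp target map(tofrom: v[0:n])
+    {
+      // `factor` is local to the target region and is used by the threads of
+      // the parallel region below. In `Generic` mode it is stored in the
+      // global memory so that it can be shared; with `-fopenmp-cuda-mode` it
+      // is not shared automatically and the user has to share the required
+      // data between the threads explicitly.
+      float factor = 2.0f;
+      #pragma omp parallel for
+      for (int i = 0; i < n; ++i)
+        v[i] *= factor;
+    }
+  }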
+
+Features not supported or with limited support for Cuda devices
+----------------------------------------------------------------
+
+- Reductions across the teams are not supported yet.
+
+- Cancellation constructs are not supported.
+
+- Doacross loop nests are not supported.
+
+- User-defined reductions are supported only for trivial types.
+
+- Nested parallelism: inner parallel regions are executed sequentially.
+
+- Static linking of libraries containing device code is not supported yet.
+
+- Automatic translation of math functions in target regions to device-specific
+  math functions is not implemented yet.
+
+- Debug information for OpenMP target regions is not supported yet.
+
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index dab70f1bcaa3..24f182b824e6 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -216,7 +216,21 @@ OpenCL C Language Changes in Clang
 OpenMP Support in Clang
 ----------------------------------
 
-- ...
+- Clang gained basic support for OpenMP 4.5 offloading for the NVPTX target.
+  To compile your program for the NVPTX target, use the following options:
+  `-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda` for 64-bit platforms or
+  `-fopenmp -fopenmp-targets=nvptx-nvidia-cuda` for 32-bit platforms.
+
+- Passing options to the OpenMP device offloading toolchain can be done using
+  the `-Xopenmp-target=<triple> -opt=val` flag. In this way, the `-opt=val`
+  option will be forwarded to the respective OpenMP device offloading
+  toolchain described by the triple. For example, passing the compute
+  capability to the OpenMP NVPTX offloading toolchain can be done as follows:
+  `-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_60`. When only one target
+  offload toolchain is specified under the `-fopenmp-targets=` option, the
+  triple can be skipped: `-Xopenmp-target -march=sm_60`.
+
+- Other bugfixes.
 
 CUDA Support in Clang
 ---------------------