llvm-project/openmp/libomptarget
Ye Luo c5348aecd7 [OpenMP] Use primary context in CUDA plugin
Summary:
Retaining per device primary context is preferred to creating a context owned by the plugin.

From CUDA documentation
1. Note that the use of multiple CUcontext s per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used." from https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html
2. Right under cuCtxCreate. In most cases it is recommended to use cuDevicePrimaryCtxRetain. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf
3. The primary context is unique per device and shared with the CUDA runtime API. These functions allow integration with other libraries using CUDA.  https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX

Two issues are addressed by this patch:
1. Not using the primary context caused interoperability issue with libraries like cublas, cusolver. CUBLAS_STATUS_EXECUTION_FAILED and cudaErrorInvalidResourceHandle
2. On OLCF summit, "Error returned from cuCtxCreate" and "CUDA error is: invalid device ordinal"

Regarding the flags of the primary context. If it is inactive, we set CU_CTX_SCHED_BLOCKING_SYNC. If it is already active, we respect the current flags.

Reviewers: grokos, ABataev, jdoerfert, protze.joachim, AndreyChurbanov, Hahnfeld

Reviewed By: jdoerfert

Subscribers: openmp-commits, yaxunl, guansong, sstefan1, tianshilei1992

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D82718
2020-07-07 10:14:51 -04:00
..
cmake/Modules [Openmp][VE] Libomptarget plugin for NEC SX-Aurora 2020-05-12 10:47:30 +02:00
deviceRTLs [libomptarget] Implement atomic inc and fence functions for AMDGCN using clang builtins 2020-07-07 06:36:25 +00:00
include Revert "[OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info" 2020-06-17 15:01:16 -04:00
plugins [OpenMP] Use primary context in CUDA plugin 2020-07-07 10:14:51 -04:00
src [OpenMP] Adopt std::set in HostDataToTargetMap 2020-06-24 12:22:45 -04:00
test [libomptarget][test] Fix text relocations by adding -fPIC 2020-07-05 12:51:28 -07:00
CMakeLists.txt [OpenMP] build offload plugins before testing them 2019-11-28 17:43:56 -05:00
README.txt Unify build documentation and convert to reStructuredText 2017-12-27 09:15:10 +00:00

README.txt

    README for the LLVM* OpenMP* Offloading Runtime Library (libomptarget)
    ======================================================================

How to Build the LLVM* OpenMP* Offloading Runtime Library (libomptarget)
========================================================================
In-tree build:

$ cd where-you-want-to-live
Check out openmp (libomptarget lives under ./libomptarget) into llvm/projects
$ cd where-you-want-to-build
$ mkdir build && cd build
$ cmake path/to/llvm -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
$ make omptarget

Out-of-tree build:

$ cd where-you-want-to-live
Check out openmp (libomptarget lives under ./libomptarget)
$ cd where-you-want-to-live/openmp/libomptarget
$ mkdir build && cd build
$ cmake path/to/openmp -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
$ make

For details about building, please look at README.rst in the parent directory.

Architectures Supported
=======================
The current library has been only tested in Linux operating system and the
following host architectures:
* Intel(R) 64 architecture
* IBM(R) Power architecture (big endian)
* IBM(R) Power architecture (little endian)
* ARM(R) AArch64 architecture (little endian)

The currently supported offloading device architectures are:
* Intel(R) 64 architecture (generic 64-bit plugin - mostly for testing purposes)
* IBM(R) Power architecture (big endian) (generic 64-bit plugin - mostly for testing purposes)
* IBM(R) Power architecture (little endian) (generic 64-bit plugin - mostly for testing purposes)
* ARM(R) AArch64 architecture (little endian) (generic 64-bit plugin - mostly for testing purposes)
* CUDA(R) enabled 64-bit NVIDIA(R) GPU architectures

Supported RTL Build Configurations
==================================
Supported Architectures: Intel(R) 64, IBM(R) Power 7 and Power 8

              ---------------------------
              |   gcc      |   clang    |
--------------|------------|------------|
| Linux* OS   |  Yes(1)    |  Yes(2)    |
-----------------------------------------

(1) gcc version 4.8.2 or later is supported.
(2) clang version 3.7 or later is supported.


Front-end Compilers that work with this RTL
===========================================

The following compilers are known to do compatible code generation for
this RTL:
  - clang (from https://github.com/clang-ykt )
  - clang (development branch at http://clang.llvm.org - several features still
    under development)

-----------------------------------------------------------------------

Notices
=======
This library and related compiler support is still under development, so the
employed interface is likely to change in the future.

*Other names and brands may be claimed as the property of others.