forked from OSchip/llvm-project
06e08f0b0a
Summary: This patch adds a more sophisticated team reduction scheme to the OpenMP libomptarget-nvptx runtime. The scheme uses a fixed size global memory buffer whose length can be adjusted via compiler flag: ``` -fopenmp-cuda-teams-reduction-recs-num=1024 ``` The global buffer is a structure of arrays (with default size of 1024 each and controlled by the above flag), one array for each reduction variable. Values in the buffer are processed by the last team to finish executing the body of the target region. In addition to adding support for the new flag, the compiler also emits special functions used for the reduction of the intermediate reduction values. These changes will be added in a separate compiler patch following this one. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, jfb, jdoerfert, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D58409 llvm-svn: 354471 |
||
---|---|---|
.. | ||
cmake/Modules | ||
deviceRTLs | ||
include | ||
plugins | ||
src | ||
test | ||
CMakeLists.txt | ||
README.txt |
README.txt
README for the LLVM* OpenMP* Offloading Runtime Library (libomptarget) ====================================================================== How to Build the LLVM* OpenMP* Offloading Runtime Library (libomptarget) ======================================================================== In-tree build: $ cd where-you-want-to-live Check out openmp (libomptarget lives under ./libomptarget) into llvm/projects $ cd where-you-want-to-build $ mkdir build && cd build $ cmake path/to/llvm -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler> $ make omptarget Out-of-tree build: $ cd where-you-want-to-live Check out openmp (libomptarget lives under ./libomptarget) $ cd where-you-want-to-live/openmp/libomptarget $ mkdir build && cd build $ cmake path/to/openmp -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler> $ make For details about building, please look at README.rst in the parent directory. Architectures Supported ======================= The current library has been only tested in Linux operating system and the following host architectures: * Intel(R) 64 architecture * IBM(R) Power architecture (big endian) * IBM(R) Power architecture (little endian) * ARM(R) AArch64 architecture (little endian) The currently supported offloading device architectures are: * Intel(R) 64 architecture (generic 64-bit plugin - mostly for testing purposes) * IBM(R) Power architecture (big endian) (generic 64-bit plugin - mostly for testing purposes) * IBM(R) Power architecture (little endian) (generic 64-bit plugin - mostly for testing purposes) * ARM(R) AArch64 architecture (little endian) (generic 64-bit plugin - mostly for testing purposes) * CUDA(R) enabled 64-bit NVIDIA(R) GPU architectures Supported RTL Build Configurations ================================== Supported Architectures: Intel(R) 64, IBM(R) Power 7 and Power 8 --------------------------- | gcc | clang | --------------|------------|------------| | Linux* OS | Yes(1) | Yes(2) | ----------------------------------------- (1) gcc version 4.8.2 or later is supported. (2) clang version 3.7 or later is supported. Front-end Compilers that work with this RTL =========================================== The following compilers are known to do compatible code generation for this RTL: - clang (from https://github.com/clang-ykt ) - clang (development branch at http://clang.llvm.org - several features still under development) ----------------------------------------------------------------------- Notices ======= This library and related compiler support is still under development, so the employed interface is likely to change in the future. *Other names and brands may be claimed as the property of others.