forked from OSchip/llvm-project
8ec9aa236e
Nesting mode is a new experimental feature in the OpenMP runtime. It allows a user to set up nesting for an application in a way that corresponds to the hardware topology levels on the machine an application is being run on. For example, if a machine has 2 sockets, each with 12 cores, then use of nesting mode could set up an outer level of nesting that uses 2 threads per parallel region, and an inner level of nesting that uses 12 threads per parallel region. Nesting mode is controlled with the KMP_NESTING_MODE environment variable as follows: 1) KMP_NESTING_MODE = 0: Nesting mode is off (default); max-active-levels-var is set to 1 (the default -- nesting is off, nested parallel regions are serialized). 2) KMP_NESTING_MODE = 1: Nesting mode is on, and a number of threads will be assigned for each level discovered in the machine topology; max-active-levels-var is set to the number of levels discovered. 3) KMP_NESTING_MODE = n, n>1: [Note: this option is experimental and may change or be removed in the future.] Nesting mode is on, and a number of threads will be assigned for each topology level discovered on the machine, up to k<=n levels (since there may be fewer than n levels discovered in the topology), and beyond the kth level, nested parallel regions will be serialized; NOTE: max-active-levels-var is 1 (the default -- nesting is off, and nested parallel regions are serialized until the user changes max-active-levels-var. If the user sets OMP_NUM_THREADS or OMP_MAX_ACTIVE_LEVELS, they will override KMP_NESTING_MODE settings for the associated environment variables. The detected topology may be limited by an affinity mask setting on the initial thread, or if the user sets KMP_HW_SUBSET. See also: KMP_HOT_TEAMS_MAX_LEVEL for controlling use of hot teams for nested parallel regions. Note that this feature only sets numbers of threads used at nesting levels. The user should make use of OMP_PLACES and OMP_PROC_BIND or KMP_AFFINITY for affinitizing those threads, if desired. Differential Revision: https://reviews.llvm.org/D102188 |
||
---|---|---|
.. | ||
cmake | ||
doc | ||
src | ||
test | ||
tools | ||
.clang-format | ||
CMakeLists.txt | ||
README.txt |
README.txt
README for the LLVM* OpenMP* Runtime Library ============================================ How to Build Documentation ========================== The main documentation is in Doxygen* format, and this distribution should come with pre-built PDF documentation in doc/Reference.pdf. However, an HTML version can be built by executing: % doxygen doc/doxygen/config in the runtime directory. That will produce HTML documentation in the doc/doxygen/generated directory, which can be accessed by pointing a web browser at the index.html file there. If you don't have Doxygen installed, you can download it from www.doxygen.org. How to Build the LLVM* OpenMP* Runtime Library ============================================== In-tree build: $ cd where-you-want-to-live Check out openmp into llvm/projects $ cd where-you-want-to-build $ mkdir build && cd build $ cmake path/to/llvm -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler> $ make omp Out-of-tree build: $ cd where-you-want-to-live Check out openmp $ cd where-you-want-to-live/openmp/runtime $ mkdir build && cd build $ cmake path/to/openmp -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler> $ make For details about building, please look at README.rst in the parent directory. Architectures Supported ======================= * IA-32 architecture * Intel(R) 64 architecture * Intel(R) Many Integrated Core Architecture * ARM* architecture * Aarch64 (64-bit ARM) architecture * IBM(R) Power architecture (big endian) * IBM(R) Power architecture (little endian) * MIPS and MIPS64 architecture * RISCV64 architecture Supported RTL Build Configurations ================================== Supported Architectures: IA-32 architecture, Intel(R) 64, and Intel(R) Many Integrated Core Architecture ---------------------------------------------- | icc/icl | gcc | clang | --------------|---------------|----------------------------| | Linux* OS | Yes(1,5) | Yes(2,4) | Yes(4,6,7) | | FreeBSD* | No | No | Yes(4,6,7,8) | | OS X* | Yes(1,3,4) | No | Yes(4,6,7) | | Windows* OS | Yes(1,4) | No | No | ------------------------------------------------------------ (1) On IA-32 architecture and Intel(R) 64, icc/icl versions 12.x are supported (12.1 is recommended). (2) GCC* version 4.7 is supported. (3) For icc on OS X*, OS X* version 10.5.8 is supported. (4) Intel(R) Many Integrated Core Architecture not supported. (5) On Intel(R) Many Integrated Core Architecture, icc/icl versions 13.0 or later are required. (6) Clang* version 3.3 is supported. (7) Clang* currently does not offer a software-implemented 128 bit extended precision type. Thus, all entry points reliant on this type are removed from the library and cannot be called in the user program. The following functions are not available: __kmpc_atomic_cmplx16_* __kmpc_atomic_float16_* __kmpc_atomic_*_fp (8) Community contribution provided AS IS, not tested by Intel. Supported Architectures: IBM(R) Power 7 and Power 8 ----------------------------- | gcc | clang | --------------|------------|--------------| | Linux* OS | Yes(1,2) | Yes(3,4) | ------------------------------------------- (1) On Power 7, gcc version 4.8.2 is supported. (2) On Power 8, gcc version 4.8.2 is supported. (3) On Power 7, clang version 3.7 is supported. (4) On Power 8, clang version 3.7 is supported. Front-end Compilers that work with this RTL =========================================== The following compilers are known to do compatible code generation for this RTL: clang (from the OpenMP development branch at http://clang-omp.github.io/ ), Intel compilers, GCC. See the documentation for more details. ----------------------------------------------------------------------- Notices ======= *Other names and brands may be claimed as the property of others.