The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks. We first use `pthread_create` to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls `__kmpc_fork_call`. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via `pthread_create` on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want.
Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.
Here are some open issues to be discussed:
1. The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here?
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D77609
[libomptarget][nvptx] Reduce calls to cuda header
Remove use of clock_t in favour of a builtin. Drop a preprocessor branch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94731
[libomptarget][nvptx][nfc] Move target_impl functions out of header
This removes most of the differences between the two target_impl.h.
Also change name mangling from C to C++ for __kmpc_impl_*_lock.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D94728
`omptarget-nvptx` is still a dependence for `check-libomptarget-nvtpx`
although it has been removed by D94573.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D94725
The comment said CUDA 9 header files use the `nv_weak` attribute which
`clang` is not yet prepared to handle. It's three years ago and now things have
changed. Based on my test, removing the definition doesn't have any problem on
my machine with CUDA 11.1 installed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94700
For NVPTX target, OpenMP provides a static library `libomptarget-nvptx`
built by NVCC, and another bitcode `libomptarget-nvptx-sm_{$sm}.bc` generated by
Clang. When compiling an OpenMP program, the `.bc` file will be fed to `clang`
in the second run on the program that compiles the target part. Then the generated
PTX file will be fed to `ptxas` to generate the object file, and finally the driver
invokes `nvlink` to generate the binary, where the static library will be appened
to `nvlink`.
One question is, why do we need two libraries? The only difference is, the static
library contains `omp_data.cu` and the bitcode library doesn't. It's unclear why
they were implemented in this way, but per D94565, there is no issue if we also
include the file into the bitcode library. Therefore, we can safely drop the
static library.
This patch is about the change in OpenMP. The driver will be updated as well if
this patch is accepted.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D94573
Restore control of kernel launch tracing to be >= 1 as it was before
export LIBOMPTARGET_KERNEL_TRACE=1
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D94695
Hierarchical barrier is an experimental barrier algorithm that uses aspects
of machine hierarchy to define the barrier tree structure. This patch fixes
offset calculation in hierarchical barrier. The offset is used to store info
on a flag about sleeping threads waiting on a location stored in the flag.
This commit also fixes a potential deadlock in hierarchical barrier when
using infinite blocktime by adjusting the offset value of leaf kids so that
it matches the value of leaf state. It also adds testing of default barriers
with infinite blocktime, and also tests hierarchical barrier algorithm with
both default and infinite blocktime.
Patch by Terry Wilmarth and Nawrin Sultana.
Differential Revision: https://reviews.llvm.org/D94241
Add extra information to the runtime page describing the error messages and add information to the release notes for clang 12.0
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94562
This change enables volatile use of persistent memory for omp_large_cap_mem*
on supported systems. It depends on libmemkind's support for persistent memory,
and requirements/details can be found at the following url.
https://pmem.io/2020/01/20/memkind-dax-kmem.html
Differential Revision: https://reviews.llvm.org/D94353
Constant static data member can be defined in the class without another
define after the class in C++17. Although it is C++17, Clang can still handle it
even w/o the flag for C++17. Unluckily, GCC cannot handle that.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D94541
[libomptarget][amdgpu][nfc] Fix build on centos
rtl.cpp replaced 224 with a #define from elf.h, but that
doesn't work on a centos 7 build machine with an old elf.h
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D94528
Some LLVM headers are generated by CMake. Before the installation,
LLVM's headers are distributed everywhere, some of which are in
`${LLVM_SRC_ROOT}/llvm/include/llvm`, and some are in
`${LLVM_BINARY_ROOT}/include/llvm`. After intallation, they're all in
`${LLVM_INSTALLATION_ROOT}/include/llvm`.
OpenMP now depends on LLVM headers. Some headers depend on headers generated
by CMake. When building OpenMP along with LLVM, a.k.a via `LLVM_ENABLE_RUNTIMES`,
we need to tell OpenMP where it can find those headers, especially those still
have not been copied/installed.
Reviewed By: jdoerfert, jhuber6
Differential Revision: https://reviews.llvm.org/D94534
The lifetime of `libomptarget` and its opened plugins are not aligned
and it's hard for `libomptarget` to determine when the plugins are destroyed.
As a result, some issues (see D94256 for details) occur on some platforms.
Actually, if we take target memory as target resources, same as other resources,
such as CUDA streams, in each plugin, then the memory manager should also be in
the plugin. Also considering some platforms may want to opt out the feature, it
makes sense to move the memory manager to plugin, make it a common interface, and
let plguin developers determine whether they need it. This is what this patch does.
CUDA plugin is taken as example to show how to integrate it. In this way, we can
also get a bonus that different thresholds can be set for different platforms.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D94379
For now `elf_common.c` is taken as a common part included into
different plugin implementations directly via
`#include "../../common/elf_common.c"`, which is not a best practice. Since it
is simple enough such that we don't need to create a real library for it, we just
take it as a interface library so that other targets can link it directly. Another
advantage of this method is, we don't need to add the folder into header search
path which can potentially pollute the search path.
VE and AMD platforms have not been tested because I don't have target machines.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D94443
For now, `*_STANDALONE_BUILD` is set to ON even if they're built along
with LLVM because of issues mentioned in the comments. This can cause some issues.
For example, if we build OpenMP along with LLVM, we'd like to copy those OpenMP
headers to `<prefix>/lib/clang/<version>/include` such that `clang` can find
those headers without using `-I <prefix>/include` because those headers will be
copied to `<prefix>/include` if it is built standalone.
In this patch, we fixed the dependence issue in OpenMP such that it can be built
correctly even with `OPENMP_STANDALONE_BUILD=OFF`. The issue is in the call to
`add_lit_testsuite`, where `clang` and `clang-resource-headers` are passed as
`DEPENDS`. Since we're building OpenMP along with LLVM, `clang` is set by CMake
to be the C/C++ compiler, therefore these two dependences are no longer needed,
where caused the dependence issue.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93738
Multiple `RTLInfoTy` objects are stored in a list `AllRTLs`. Since
`RTLInfoTy` contains a `std::mutex`, it is by default not a copyable object.
In order to support `AllRTLs.push_back(...)` which is currently used, a customized
copy constructor is provided. Every time we need to add a new data member into
`RTLInfoTy`, we should keep in mind not forgetting to add corresponding assignment
in the copy constructor. In fact, the only use of the copy constructor is to push
the object into the list, we can of course write it in a way that first emplace
a new object back, and then use the reference to the last element. In this way we
don't need the copy constructor anymore. If the element is invalid, we just need
to pop it, and that's what this patch does.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D94361
Fugaku supercomputer is built with the Fujitsu A64FX microprocessor, whose cache line is 256. In current libomp, we only have cache line size 128 for PPC64 and otherwise 64. This patch added the support of cache line 256 for A64FX. It's worth noting that although A64FX is a variant of AArch64, this property is not shared. As a result, in light of UCX source code (392443ab92/src/ucs/arch/aarch64/cpu.c (L17)), we can only determine by checking whether the CPU is FUJITSU A64FX.
Reviewed By: jdoerfert, Hahnfeld
Differential Revision: https://reviews.llvm.org/D93169
Summary:
Currently error messages from the CUDA plugins are only printed to the user if they have debugging enabled. Change this behaviour to always print the messages that result in offloading failure. This improves the error messages by indidcating what happened when the error occurs in the plugin library, such as a segmentation fault on the device.
Reviewed by: jdoerfert
Differential Revision: https://reviews.llvm.org/D94263
Add an example to the OpenMP Documentation on the LIBOMPTARGET_INFO environment variable
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94246
Wrong LLVM headers might be included if we don't set `include_directories`
to a right place. This will cause a compilation error if LLVM is installed in
system directories.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93737
Currently all built libraries in OpenMP are anywhere if building along
with LLVM. It is not an issue if we don't execute any test. However, almost all
tests for `libomptarget` fails because in the lit configuration, we only set
`<build_dir>/libomptarget` to `LD_LIBRARY_PATH` and `LIBRARY_PATH`. Since those
libraries are everywhere, `clang` can no longer find `libomptarget.so` or those
deviceRTLs anymore.
In this patch, we set a unified path for all built libraries, no matter whether
it is built along with LLVM or not. In this way, our lit configuration can work
propoerly.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93736
Summary:
This patch adds more fine-grained support over which information is output from the libomptarget runtime when run with the environment variable LIBOMPTARGET_INFO set. An extensible set of flags can be used to pick and choose which information the user is interested in.
Reviewers: jdoerfert JonChesterfield grokos
Differential Revision: https://reviews.llvm.org/D93727
[libomptarget][amdgpu] Call into deviceRTL instead of ockl
Amdgpu codegen presently emits a call into ockl. The same functionality
is already present in the deviceRTL. Adds an amdgpu specific entry point
to avoid the dependency. This lets simple openmp code (specifically, that
which doesn't use libm) run without rocm device libraries installed.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D93356
This patch partially prepares the runtime source code to be built with
-Wconversion, which should trigger warnings if any implicit conversions
can possibly change a value. For builds done with icc or gcc, all such
warnings are handled in this patch. clang gives a much longer list of
warnings, particularly for sign conversions, which the other compilers
don't report. The -Wconversion flag is commented into cmake files, but
I'm not going to turn it on. If someone thinks it is important, and wants
to fix all the clang warnings, they are welcome to.
Types of changes made here involve either improving the consistency of types
used so that no conversion is needed, or else performing careful explicit
conversions, when we're sure a problem won't arise.
Patch is a combination of changes by Terry Wilmarth and Johnny Peyton.
Differential Revision: https://reviews.llvm.org/D92942
Add support to the OpenMP web pages for environment variables supported
by Libomptarget and their usage.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93723
When setting `LLVM_ENABLE_RUNTIMES`, lower case word should be used;
otherwise, it can cause a CMake error that specific path is not found.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D93719
After some issues about building runtimes along with LLVM were fixed,
building an OpenMP offloading capable compiler is pretty simple. This patch updates
the FAQ part in the doc.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93671
If a GPU function is externally reachable we give up trying to find the
(unique) kernel it is called from. This can hinder optimizations. Emit a
remark and explain mitigation strategies.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D93439
This patchs adds CMake variables to add subdirectories and include
directories for libomptarget and explicitly gives the location of source
files.
Differential Revision: https://reviews.llvm.org/D93290
[libomptarget][nfc] Replace static const with enum
Semantically identical. Replaces 0xff... with ~0 to spare counting the f.
Has the advantage that the compiler doesn't need to prove the 4/8 byte
value dead before discarding it, and sidesteps the compilation question
associated with what static means for a single source language.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93328
Introduce new kmp_safe_raii_file_t class with RAII semantics for file
open/close. It is essentially a wrapper around the C-style FILE* object.
This also unifies the way we error report if a file can't be opened.
Differential Revision: https://reviews.llvm.org/D92604
This patch enables serial initialization in the forked child process
to fix unstable runtime behavior when used with Python-based AI tools.
Differential Revision: https://reviews.llvm.org/D93230