Commit Graph

1492 Commits

Author SHA1 Message Date
Johannes Doerfert 8c7fdc4c61 [OpenMP] Add source location information to the libomptarget profile
In much of the libomptarget interface we have an ident_t object now, if
it is not null we can use it to improve the profile output. For now, we
simply use the ident_t "source information string" as generated by the
FE.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D95282
2021-01-25 22:43:43 -06:00
Shilei Tian 9d64275ae0 [OpenMP] Added the support for hidden helper task in RTL
The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks.  We first use `pthread_create` to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls `__kmpc_fork_call`. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via `pthread_create` on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want.

Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.

Here are some open issues to be discussed:
1. The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here?

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D77609
2021-01-25 22:16:17 -05:00
Jon Chesterfield 357eea6e8b Revert "[libomptarget][cuda] Gracefully handle missing cuda library"
This reverts commit fafd45c01f.
2021-01-26 03:14:53 +00:00
Jon Chesterfield fafd45c01f [libomptarget][cuda] Gracefully handle missing cuda library
[libomptarget][cuda] Gracefully handle missing cuda library

If using dynamic cuda, and it failed to load, it is not safe to call
cuGetErrorString.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95412
2021-01-26 02:54:00 +00:00
Shilei Tian 3333244d77 [OpenMP][deviceRTLs] Remove omp_is_initial_device
`omp_is_initial_device` in device code was implemented as a builtin
function in D38968 for a better performance. Therefore there is no chance that
this function will be called to `deviceRTLs`. As we're moving to build `deviceRTLs`
with OpenMP compiler, this function can lead to a compilation error. This patch
just simply removes it.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95397
2021-01-25 18:34:23 -05:00
Shilei Tian 27cc4a8138 [OpenMP][NVPTX] Rewrite CUDA intrinsics with NVVM intrinsics
This patch makes prep for dropping CUDA when compiling `deviceRTLs`.
CUDA intrinsics are replaced by NVVM intrinsics which refers to code in
`__clang_cuda_intrinsics.h`. We don't want to directly include it because in the
near future we're going to switch to OpenMP and by then the header cannot be
used anymore.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95327
2021-01-25 14:14:30 -05:00
Joseph Huber 93eef7d8e9 [OpenMP][NFC] Fix SourceInfo.h variable names
Summary:
Fix the names to use Pascal case to comply with the LLVM coding guidelines. `ident_t` is required for compatibility with the rest of libomp.
2021-01-25 12:43:34 -05:00
Jon Chesterfield 95f0d1edaf [libomptarget] Compile with older cuda, revert D95274
[libomptarget] Compile with older cuda, revert D95274

Fixes regression reported in comments of D95274.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95367
2021-01-25 16:12:56 +00:00
Jon Chesterfield e5e448aafa [libomptarget][cuda] Fix build, change missed from D95274 2021-01-24 18:30:04 +00:00
Shilei Tian cfd978d5d3 [OpenMP] Fixed test environment of `check-libomptarget-nvptx`
D95161 removed the option `--libomptarget-nvptx-path`, which is used in
the tests for `libomptarget-nvptx`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95293
2021-01-24 13:18:33 -05:00
Jon Chesterfield c3074d48d3 [libomptarget][nvptx] Replace cuda atomic primitives with clang intrinsics
[libomptarget][nvptx] Replace cuda atomic primitives with clang intrinsics

Tested by diff of IR generated for target_impl.cu before and after. NFC. Part
of removing deviceRTL build time dependency on cuda SDK.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D95294
2021-01-24 10:59:15 +00:00
Jon Chesterfield dc70c56be5 [libomptarget][amdgpu][nfc] Update comments
[libomptarget][amdgpu][nfc] Update comments

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95295
2021-01-23 22:53:58 +00:00
Jon Chesterfield 78b0630b72 [libomptarget][cuda] Call v2 functions explicitly
[libomptarget][cuda] Call v2 functions explicitly

rtl.cpp calls functions like cuMemFree that are replaced by a macro
in cuda.h with cuMemFree_v2. This patch changes the source to use
the v2 names consistently.

See also D95104, D95155 for the idea. Alternatives are to use a mixture,
e.g. call the macro names and explictly dlopen the _v2 names, or to keep
the current status where the symbols are replaced by macros in both files

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95274
2021-01-23 20:33:13 +00:00
Hansang Bae 480cbed31e [OpenMP] Remove unnecessary pointer checks in a few locations
Also, return NULL from unsuccessful OMPT function lookup.

Differential Revision: https://reviews.llvm.org/D95277
2021-01-22 19:18:50 -06:00
Jon Chesterfield 47e95e87a3 [libomptarget] Build cuda plugin without cuda installed locally
[libomptarget] Build cuda plugin without cuda installed locally

Compiles a new file, `plugins/cuda/dynamic_cuda/cuda.cpp`, to an object file that exposes the same symbols that the plugin presently uses from libcuda. The object file contains dlopen of libcuda and cached dlsym calls. Also provides a cuda.h containing the subset that is used.

This lets the cmake file choose between the system cuda and a dlopen shim, with no changes to rtl.cpp.

The corresponding change to amdgpu is postponed until after a refactor of the plugin to reduce the size of the hsa.h stub required

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95155
2021-01-23 00:15:04 +00:00
Joseph Schuchart edbcc17b7a [OpenMP] libomp: properly initialize buckets in __kmp_dephash_extend
The buckets are initialized in __kmp_dephash_create but when they are extended
the memory is allocated but not NULL'd, potentially leaving some buckets
uninitialized after all entries have been copied into the new allocation.
This commit makes sure the buckets are properly initialized with NULL before
copying the entries.

Differential Revision: https://reviews.llvm.org/D95167
2021-01-22 20:29:46 +03:00
Jon Chesterfield 9b19ecb8f1 [libomptarget][devicertl] Drop templated atomic functions
[libomptarget][devicertl] Drop templated atomic functions

The five __kmpc_atomic templates are instantiated a total of seven times.
This change replaces the template with explictly typed functions, which
have the same prototype for amdgcn and nvptx, and implements them with
the same code presently in use.

Rolls in the accepted but not yet landed D95085.

The unsigned long long type can be replaced with uint64_t when replacing
the cuda function. Until then, clang warns on casting a pointer to one to
a pointer to the other.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D95093
2021-01-22 14:48:22 +00:00
Joseph Huber 119a9ea13f [OpenMP] Fix failing test due to change in offloading flags
Summary:
Prior to D91261 the information checked the OMP_MAP_TARGET_PARAM flag, change this as it has been removed. The INFO macro was changed to accept a flag as input to make conditionally printing information easier.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95133
2021-01-21 14:09:36 -05:00
Giorgis Georgakoudis 6b7645dd31 [OpenMP] Add time profiling support in libomp
Profiling has been recently implemented in libomptarget (D93055). This patch enables time profiling support for libomptarget in libomp, to support profiling of multi-threaded execution of offloaded regions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94855
2021-01-21 09:15:14 -08:00
Shilei Tian 48c54f0f62 [OpenMP][NVPTX] Added forward declaration for atomic operations
Pretty similar to D95058, this patch added forward declaration for
CUDA atomic functions. We already have definitions with right mangled names in
internal CUDA headers so the forward declaration here can work properly.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D95085
2021-01-21 10:37:16 -05:00
Joseph Huber e4eaf9d820 [OpenMP] Add support for mapping names in mapper API
Summary:
The custom mapper API did not previously support the mapping names added previously. This means they were not present if a user requested debugging information while using the mapper functions. This adds basic support for passing the mapped names to the runtime library.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D94806
2021-01-21 09:26:44 -05:00
Shilei Tian 33a5d212c6 [OpenMP][NVPTX] Added forward declaration to pave the way for building deviceRTLs with OpenMP
Once we switch to build deviceRTLs with OpenMP, primitives and CUDA
intrinsics cannot be used directly anymore because `__device__` is not recognized
by OpenMP compiler. To avoid involving all CUDA internal headers we had in `clang`,
we forward declared these functions. Eventually they will be transformed into
right LLVM instrinsics.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95058
2021-01-20 15:56:02 -05:00
Jon Chesterfield fbc1dcb946 [libomptarget][devicertl][nfc] Simplify target_atomic abstraction
[libomptarget][devicertl][nfc] Simplify target_atomic abstraction

Atomic functions were implemented as a shim around cuda's atomics, with
amdgcn implementing those symbols as a shim around gcc style intrinsics.

This patch folds target_atomic.h into target_impl.h and folds amdgcn.

Further work is likely to be useful here, either changing to openmp's atomic
interface or instantiating the templates on the few used types in order to
move them into a cuda/c++ implementation file. This change is mostly to
group the remaining uses of the cuda api under nvptx' target_impl abstraction.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95062
2021-01-20 19:50:50 +00:00
Jon Chesterfield ea616f9026 [libomptarget][devicertl][nfc] Remove some cuda intrinsics, simplify
[libomptarget][devicertl][nfc] Remove some cuda intrinsics, simplify

Replace __popc, __ffs with clang intrinsics. Move kmpc_impl_min to only file
that uses it and replace template with explictly typed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95060
2021-01-20 19:45:05 +00:00
Shilei Tian fd70f70d1e [OpenMP][NVPTX] Replaced CUDA builtin vars with LLVM intrinsics
Replaced CUDA builtin vars with LLVM intrinsics such that we don't need
definitions of those intrinsics.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95013
2021-01-20 12:02:06 -05:00
Jon Chesterfield e069662deb [libomptarget][devicertl] Wrap source in declare target pragmas
[libomptarget][devicertl] Wrap source in declare target pragmas

Factored out of D93135 / D94745. C++ and cuda ignore unknown pragmas
so this is a NFC for the current implementation language. Removes noise
from patches for building deviceRTL as openmp.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D95048
2021-01-20 15:50:41 +00:00
Hansang Bae 2d911f7c72 [OpenMP] Fix atomic entries for captured logical operation
Added missing code for the captured atomic operation.

Differential Revision: https://reviews.llvm.org/D94848
2021-01-19 09:59:28 -06:00
AndreyChurbanov a60bc55c69 [OpenMP] libomp: cleanup parsing of OMP_ALLOCATOR env variable.
Differential Revision: https://reviews.llvm.org/D94932
2021-01-19 16:21:22 +03:00
Kelvin Li 9d81073acb [OpenMP][Docs] Fix typos in FAQ (NFC) 2021-01-18 18:55:58 -05:00
AndreyChurbanov aa3a59e0c6 [OpenMP][NFC] Fix test
The test fails if memkind library is accessible.
2021-01-19 00:05:34 +03:00
Shilei Tian 9bf843bdc8 Revert "[OpenMP] Added the support for hidden helper task in RTL"
This reverts commit ed939f853d.
2021-01-18 06:57:52 -05:00
Chandler Carruth f855751c12 Fix openmp CMake build on non-Linux AArch64 systems.
This just checks for `/proc/cpuinfo` existing before reading it.

Tested on an ARM macOS machine.
2021-01-17 16:18:31 -08:00
Shilei Tian ed939f853d [OpenMP] Added the support for hidden helper task in RTL
The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks.  We first use `pthread_create` to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls `__kmpc_fork_call`. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via `pthread_create` on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want.

Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.

Here are some open issues to be discussed:
1. The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here?

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D77609
2021-01-16 14:13:35 -05:00
Jon Chesterfield 214387c2c6 [libomptarget][nvptx] Reduce calls to cuda header
[libomptarget][nvptx] Reduce calls to cuda header

Remove use of clock_t in favour of a builtin. Drop a preprocessor branch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94731
2021-01-15 02:16:33 +00:00
Jon Chesterfield 6e7094c14b [libomptarget][nvptx][nfc] Move target_impl functions out of header
[libomptarget][nvptx][nfc] Move target_impl functions out of header

This removes most of the differences between the two target_impl.h.

Also change name mangling from C to C++ for __kmpc_impl_*_lock.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D94728
2021-01-15 00:19:48 +00:00
Shilei Tian 547b032ccc [OpenMP] Remove omptarget-nvptx from deps as it is no longer a valid target
`omptarget-nvptx` is still a dependence for `check-libomptarget-nvtpx`
although it has been removed by D94573.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94725
2021-01-14 19:16:11 -05:00
Shilei Tian 64e9e9aeee [OpenMP] Dropped unnecessary define when compiling deviceRTLs for NVPTX
The comment said CUDA 9 header files use the `nv_weak` attribute which
`clang` is not yet prepared to handle. It's three years ago and now things have
changed. Based on my test, removing the definition doesn't have any problem on
my machine with CUDA 11.1 installed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94700
2021-01-14 13:55:12 -05:00
Shilei Tian 763c1f9933 [OpenMP] Drop the static library libomptarget-nvptx
For NVPTX target, OpenMP provides a static library `libomptarget-nvptx`
built by NVCC, and another bitcode `libomptarget-nvptx-sm_{$sm}.bc` generated by
Clang. When compiling an OpenMP program, the `.bc` file will be fed to `clang`
in the second run on the program that compiles the target part. Then the generated
PTX file will be fed to `ptxas` to generate the object file, and finally the driver
invokes `nvlink` to generate the binary, where the static library will be appened
to `nvlink`.

One question is, why do we need two libraries? The only difference is, the static
library contains `omp_data.cu` and the bitcode library doesn't. It's unclear why
they were implemented in this way, but per D94565, there is no issue if we also
include the file into the bitcode library. Therefore, we can safely drop the
static library.

This patch is about the change in OpenMP. The driver will be updated as well if
this patch is accepted.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D94573
2021-01-14 13:34:25 -05:00
Jon Chesterfield 5d165f0b89 [libomptarget][amdgpu] Fix kernel launch tracing to match previous behavior
Restore control of kernel launch tracing to be >= 1 as it was before

export LIBOMPTARGET_KERNEL_TRACE=1

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94695
2021-01-14 18:13:22 +00:00
Terry Wilmarth 4fe17ada55 [OpenMP] Fix hierarchical barrier
Hierarchical barrier is an experimental barrier algorithm that uses aspects
of machine hierarchy to define the barrier tree structure. This patch fixes
offset calculation in hierarchical barrier. The offset is used to store info
on a flag about sleeping threads waiting on a location stored in the flag.
This commit also fixes a potential deadlock in hierarchical barrier when
using infinite blocktime by adjusting the offset value of leaf kids so that
it matches the value of leaf state. It also adds testing of default barriers
with infinite blocktime, and also tests hierarchical barrier algorithm with
both default and infinite blocktime.

Patch by Terry Wilmarth and Nawrin Sultana.

Differential Revision: https://reviews.llvm.org/D94241
2021-01-13 10:22:57 -06:00
Joseph Huber a957634942 [OpenMP] Add documentation for error messages and release notes
Add extra information to the runtime page describing the error messages and add information to the release notes for clang 12.0

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94562
2021-01-13 11:00:41 -05:00
Jon Chesterfield 84e0b14a0a [libomptarget][nvptx] Include omp_data.cu in bitcode deviceRTL
[libomptarget][nvptx] Include omp_data.cu in bitcode deviceRTL

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D94565
2021-01-13 03:51:11 +00:00
Hansang Bae bba3a82b56 [OpenMP] Use persistent memory for omp_large_cap_mem
This change enables volatile use of persistent memory for omp_large_cap_mem*
on supported systems. It depends on libmemkind's support for persistent memory,
and requirements/details can be found at the following url.

https://pmem.io/2020/01/20/memkind-dax-kmem.html

Differential Revision: https://reviews.llvm.org/D94353
2021-01-12 20:35:27 -06:00
Hansang Bae 6f0f022038 [OpenMP] Update allocator trait key/value definitions
Use new definitions introduced in 5.1 specification.

Differential Revision: https://reviews.llvm.org/D94277
2021-01-12 20:09:45 -06:00
Shilei Tian 01f1273fe2 [OpenMP] Fixed a typo in openmp/CMakeLists.txt 2021-01-12 17:00:49 -05:00
Shilei Tian 68ff52ffea [OpenMP] Fixed the link error that cannot find static data member
Constant static data member can be defined in the class without another
define after the class in C++17. Although it is C++17, Clang can still handle it
even w/o the flag for C++17. Unluckily, GCC cannot handle that.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D94541
2021-01-12 16:48:28 -05:00
Jon Chesterfield 33e2494bea [libomptarget][amdgpu][nfc] Fix build on centos
[libomptarget][amdgpu][nfc] Fix build on centos

rtl.cpp replaced 224 with a #define from elf.h, but that
doesn't work on a centos 7 build machine with an old elf.h

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D94528
2021-01-12 19:40:03 +00:00
Shilei Tian bdd1ad5e5c [OpenMP] Fixed include directories for OpenMP when building OpenMP with LLVM_ENABLE_RUNTIMES
Some LLVM headers are generated by CMake. Before the installation,
LLVM's headers are distributed everywhere, some of which are in
`${LLVM_SRC_ROOT}/llvm/include/llvm`, and some are in
`${LLVM_BINARY_ROOT}/include/llvm`. After intallation, they're all in
`${LLVM_INSTALLATION_ROOT}/include/llvm`.

OpenMP now depends on LLVM headers. Some headers depend on headers generated
by CMake. When building OpenMP along with LLVM, a.k.a via `LLVM_ENABLE_RUNTIMES`,
we need to tell OpenMP where it can find those headers, especially those still
have not been copied/installed.

Reviewed By: jdoerfert, jhuber6

Differential Revision: https://reviews.llvm.org/D94534
2021-01-12 14:32:38 -05:00
Shilei Tian 0871d6d516 [OpenMP] Move memory manager to plugin and make it a common interface
The lifetime of `libomptarget` and its opened plugins are not aligned
and it's hard for `libomptarget` to determine when the plugins are destroyed.
As a result, some issues (see D94256 for details) occur on some platforms.
Actually, if we take target memory as target resources, same as other resources,
such as CUDA streams, in each plugin, then the memory manager should also be in
the plugin. Also considering some platforms may want to opt out the feature, it
makes sense to move the memory manager to plugin, make it a common interface, and
let plguin developers determine whether they need it. This is what this patch does.
CUDA plugin is taken as example to show how to integrate it. In this way, we can
also get a bonus that different thresholds can be set for different platforms.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D94379
2021-01-11 21:33:42 -05:00
Shilei Tian a81c68ae6b [OpenMP] Take elf_common.c as a interface library
For now `elf_common.c` is taken as a common part included into
different plugin implementations directly via
`#include "../../common/elf_common.c"`, which is not a best practice. Since it
is simple enough such that we don't need to create a real library for it, we just
take it as a interface library so that other targets can link it directly. Another
advantage of this method is, we don't need to add the folder into header search
path which can potentially pollute the search path.

VE and AMD platforms have not been tested because I don't have target machines.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94443
2021-01-11 17:34:26 -05:00