The OpenMP runtime is not instrumented, so entering the runtime leaves no hint
on the source line of the pragma on ThreadSanitizer's function stack.
This patch adds function entry/exit annotations for OpenMP parallel regions,
and synchronization regions (barrier, taskwait, taskgroup).
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D70408
If the openmp project is built standalone, the test compiler is feature tested for an available -fsanitize=thread flag.
If the openmp project is built as part of llvm, the target tsan is needed to test archer.
An additional line (requires tsan) was introduced to the tests, this patch updates the line numbers for the race.
Follow-up for 77ad98c
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D71914
Summary:
If the dynamically loaded module has been compiled with -fopenmp-targets
and has no target regions, it has empty target descriptor. It leads to a
crash at the runtime if another module has at least one target region
and at least one entry in its descriptor. The runtime library is unable
to load the empty binary descriptor and terminates the execution.
Caused by a clang-offload-wrapper.
Reviewers: grokos, jdoerfert
Subscribers: caomhin, kkwli0, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D72472
Including two tests
These callbacks were added late to the 5.0 specification, an implementation is missing.
Reviewed By: jdoerfert
Differential Review: https://reviews.llvm.org/D70395
Summary:
[libomptarget][nfc] Introduce atomic wrapper function
Wraps atomic functions in a template prefixed __kmpc_atomic that
dispatches to cuda or hip atomic functions. Intended to be easily extended
to dispatch to OpenCL or C++ atomics for a third target.
Reviewers: ABataev, jdoerfert, grokos
Reviewed By: jdoerfert
Subscribers: Anastasia, jvesely, mgrang, dexonsmith, llvm-commits, mgorny, jfb, openmp-commits
Tags: #openmp, #llvm
Differential Revision: https://reviews.llvm.org/D71404
Summary:
[libomptarget][nfc] Extract function from data_sharing, move to common
Finding the first active thread in the warp is different on nvptx and amdgcn,
mostly due to warp size and the desire for efficiency.
Reviewers: ABataev, jdoerfert, grokos
Reviewed By: jdoerfert
Subscribers: jvesely, mgorny, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D71643
Summary:
[libomptarget][nfc] Move three files under common, build them for amdgcn
Change to reduction.cu to remove two dead includes, otherwise no code change.
Reviewers: jdoerfert, ABataev, grokos
Reviewed By: jdoerfert
Subscribers: jvesely, mgorny, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D71601
Summary:
[libomptarget][nfc] Move omp locks under target_impl
These are likely to be target specific, even down to the lock_t which is
correspondingly moved out of interface.h. The alternative is to include
interface.h in target_impl which substantiatially increases the scope of
those symbols.
The current nvptx implementation deadlocks on amdgcn. The preferred
implementation for that arch is still under discussion - this change
leaves declarations in target_impl.
The functions could be inline for nvptx. I'd prefer to keep the internals
hidden in the target_impl translation unit, but will add the (possibly renamed)
macros to target_impl.h if preferred.
Reviewers: ABataev, jdoerfert, grokos
Reviewed By: jdoerfert
Subscribers: jvesely, mgorny, jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D71574
Summary:
[libomptarget][nfc] Wrap cuda min() in target_impl
nvptx forwards to cuda min, amdgcn implements directly.
Sufficient to build parallel.cu for amdgcn, added to CMakeLists.
All call sites are homogenous except one that passes a uint32_t and an
int32_t. This could be smoothed over by taking two type parameters
and some care over the return type, but overall I think the inline
<uint32_t> calling attention to what was an implicit sign conversion
is cleaner.
Reviewers: ABataev, jdoerfert
Reviewed By: jdoerfert
Subscribers: jvesely, mgorny, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D71580
Summary:
This reverts commit dd8a7fcdd7.
Alexey reports undefined symbols for the new inline functions defined in target_impl.h
This does not reproduce for me for nvptx, or amdgcn, under release or debug builds.
I believe the patch is fine, based on:
- the semantics of an inline function in C++ (the cuda INLINE functions end
up as linkonce_odr in IR), which are only legal to drop if they have no uses
- the code generated from a debug build of clang 9 does not show these undef symbols
- the tests pass
- the code is trivial
To progress from here I either need:
- A tie break - someone to play the role of CI in determining whether the patch works
- Alexey to provide sufficient information about his build for me to reproduce the failure
- Alexey to debug why the symbols are disappearing for him and report back
Reviewers: ABataev, jdoerfert, grokos
Subscribers: jvesely, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D71502
Summary:
[libomptarget] Build most of common/src for amdgcn
Excluding parallel.cu, which uses an integer min() from cuda,
Excluding support.cu, which calls malloc that is not yet available for amdgcn
Reviewers: jdoerfert, ABataev, grokos
Reviewed By: jdoerfert
Subscribers: gregrodgers, ronlieb, jvesely, mgorny, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D71446
Summary:
[libomptarget][nfc] Add declarations of atomic functions for amdgcn
This enables building more source for amdgcn. The functions are usually available
in a hip runtime header, but are duplicated here to decouple the implementation
Reviewers: jdoerfert, ABataev, grokos
Reviewed By: jdoerfert
Subscribers: jvesely, mgorny, jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D71412
Summary:
[libomptarget][nfc] Move cuda threadfence functions behind kmpc_impl
Part of building code under common/ without requiring a cuda compiler
Reviewers: ABataev, jdoerfert, grokos
Reviewed By: ABataev
Subscribers: jvesely, jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D71102
Summary:
[libomptarget][nfc] Move omptarget-nvptx under common
Almost all files depend on require omptarget-nvptx, which no longer
contains any obviously architecture dependent code. Moving it under
common unblocks task/loop for amdgcn, and allows moving other code.
At some point there should probably be a widespread symbol renaming to
replace the nvptx string. I'd prefer to get things working first.
Building this (and task.cu, loop.cu) without a cuda library requires
some more refactoring, e.g. wrap threadfence(), use DEVICE macro more
consistently. Patches for that are orthogonal and will be posted shortly.
Reviewers: jdoerfert, ABataev, grokos
Reviewed By: ABataev
Subscribers: mgorny, fedor.sergeev, jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D71073
Summary:
[libomptarget] Build a minimal deviceRTL for amdgcn
Repeat of D70414, with an include path fixed. Diff for sanity checking.
The CMakeLists.txt file is functionally identical to the one used in the aomp fork.
Whitespace changes were made based on nvptx/CMakeLists.txt, plus the
copyright notice updated to match (Greg was the original author so would
like his sign off on that here).
This change will build a small subset of the deviceRTL if an appropriate toolchain is
available, e.g. a local install of rocm. Support.h is moved from nvptx as a dependency
of debug.h.
Reviewers: ABataev, jdoerfert
Reviewed By: ABataev
Subscribers: jvesely, mgorny, jfb, openmp-commits, jdoerfert
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D70971
Summary:
[libomptarget] Build a minimal deviceRTL for amdgcn
The CMakeLists.txt file is functionally identical to the one used in the aomp fork.
Whitespace changes were made based on nvptx/CMakeLists.txt, plus the
copyright notice updated to match (Greg was the original author so would
like his sign off on that here).
This change will build a small subset of the deviceRTL if an appropriate toolchain is
available, e.g. a local install of rocm. Support.h is moved from nvptx as a dependency
of debug.h.
Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers
Reviewed By: jdoerfert
Subscribers: jfb, Hahnfeld, jvesely, mgorny, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D70414
Summary:
"make check-all" or "make check-libomptarget" would attempt to run offloading
tests before the offload plugins are built. This patch corrects that by adding
dependencies to the libomptarget CMake rules.
Reviewers: jdoerfert
Subscribers: mgorny, guansong, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D70803
Summary:
The termination function duplicated the functionality of the
__attribute((destructor))-annotated function __kmp_internal_end_fini,
and we have no indication that this doesn't work.
The function might cause issues with link-time optimization turned on:
until very recently, none of the usual linkers was reporting functions
named in -Wl,-fini as used to the LTO plugin, so it might be dropped.
If the function is dropped, -Wl,-fini=__kmp_internal_end_fini doesn't
do what we want: with ld.bfd and lld it drops the FINI attribute from
.dynamic and with gold we get FINI = 0x0, which leads to a crash on
cleanup. This can be reproduced by building with
-DLLVM_ENABLE_PROJECTS="clang;openmp" \
-DLLVM_ENABLE_LTO=Thin \
-DLLVM_USE_LINKER=gold
The issue in lld has been fixed in f95273f75a, but gold remains without
fix so far.
Fixes PR43927.
Reviewers: JonChesterfield, jdoerfert, AndreyChurbanov
Reviewed By: AndreyChurbanov
Differential Revision: https://reviews.llvm.org/D69927
Summary:
[libomptarget][nfc] Move some source into common from nvptx
Moves some source that compiles cleanly under amdgcn into a common subdirectory
Includes some non-trivial files and some headers. Keeps the cuda file extension.
The build systems for different architectures seem unlikely to have much in
common. The idea is therefore to set include paths such that files under
common/src compile as if they were under arch/src as the mechanism for sharing.
In particular, files under common/src need to be able to include target_impl.h.
The corresponding -Icommon is left out in favour of explicit includes on the
basis that the it makes it clearer which files under common are used by a given
architecture.
Reviewers: jdoerfert, ABataev, grokos
Reviewed By: ABataev
Subscribers: jfb, mgorny, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D70328
The tool provides TSAN annotations for OpenMP synchronization. The tool
is activated if no other OMPT tool is loaded.
The tool detects whether the application was built with TSan and rejects
activation according to the OMPT protocol if there is no TSan-rt.
Differential Revision: https://reviews.llvm.org/D45890
Summary:
[libomptarget][nfc] Use cuda variable wrappers from support.h
Reimplementation of D69693, after the revert of D69885
Use the wrappers in support.h for cuda builtin variables at all call sites.
Localises use of cuda and removes WARPSIZE==32 assumption in debug.h.
Reviewers: ABataev, jdoerfert, grokos
Reviewed By: jdoerfert
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D70186
Summary:
[libomptarget] Move supporti.h to support.cu
Reimplementation of D69652, without the unity build and refactors.
Will need a clean build of libomptarget as the cmakelists changed.
Reviewers: ABataev, jdoerfert
Reviewed By: jdoerfert
Subscribers: mgorny, jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D70131