The basic design is to create an outermost parallel team, the hidden helper team. It is not a regular team: it is created only when the first hidden helper task is encountered, and it is responsible only for executing hidden helper tasks. We first use `pthread_create` to create a new thread, which serves as both the initial thread and the main thread of the hidden helper team. This initial thread then initializes a new root, just as the RTL does during initialization, and directly calls `__kmpc_fork_call`, as if it had encountered a parallel region. In the wrapped function for this team, the main thread (the initial thread created via `pthread_create` on Linux) waits on a condition variable that is signaled only when the RTL is being destroyed; the other worker threads do nothing. The main thread has to wait there because, in the current implementation, once the main thread finishes the wrapped function of a team it starts to free the team, which is not what we want.
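A minimal sketch of the "park the main thread until shutdown" idea, using `std::thread` and `std::condition_variable` rather than the runtime's own primitives. The class name `HelperMainThreadSketch` and its members are hypothetical and are not part of libomp; the sketch only shows why the wrapped function does not return before teardown.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the main thread of the hidden helper team.
class HelperMainThreadSketch {
public:
  ~HelperMainThreadSketch() { shutdown(); }

  // Spawn the thread (the runtime itself uses pthread_create on Linux).
  void start() {
    assert(!worker_.joinable() && "already started");
    worker_ = std::thread([this] { wrapped_function(); });
  }

  // Called while the runtime is being destroyed: signal, then join.
  void shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shutting_down_ = true;
    }
    cv_.notify_one();
    if (worker_.joinable())
      worker_.join();
  }

  bool finished() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return finished_;
  }

private:
  // The main thread blocks here so that the team is not freed early.
  void wrapped_function() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return shutting_down_; });
    finished_ = true;
  }

  mutable std::mutex mutex_;
  std::condition_variable cv_;
  bool shutting_down_ = false;
  bool finished_ = false;
  std::thread worker_;
};
```

The predicate form of `wait` guards against spurious wakeups and against a signal that arrives before the thread starts waiting.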
Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also provided to configure the number of threads and to enable or disable this feature. By default, the number of hidden helper threads is 8.
Here are some open issues to be discussed:
1. The main thread goes to sleep once initialization is finished. As Andrey mentioned, we might need to wake it from time to time to do some work. What kind of update/check should be put here?
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D77609
[libomptarget][cuda] Gracefully handle missing cuda library
If using dynamic cuda, and it failed to load, it is not safe to call
cuGetErrorString.
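A minimal sketch of the guard, assuming the dynamically loaded driver is represented by a struct of function pointers. `DriverShimSketch`, `describe_error`, and the fake lookup function are hypothetical names used only for illustration; they are not the plugin's actual types.

```cpp
#include <cassert>
#include <string>

// Hypothetical shape of one dynamically resolved entry point.
using GetErrorStringFn = int (*)(int code, const char **out);

// Hypothetical shim state; a real shim would fill this via dlopen/dlsym.
struct DriverShimSketch {
  bool loaded = false;
  GetErrorStringFn get_error_string = nullptr;
};

// Only call through the pointer when the library actually loaded.
std::string describe_error(const DriverShimSketch &shim, int code) {
  if (!shim.loaded || shim.get_error_string == nullptr)
    return "driver library not loaded (error " + std::to_string(code) + ")";
  const char *msg = nullptr;
  if (shim.get_error_string(code, &msg) != 0 || msg == nullptr)
    return "unknown error " + std::to_string(code);
  return msg;
}

// Fake lookup used only to exercise the sketch.
inline int fake_get_error_string(int code, const char **out) {
  assert(out != nullptr);
  *out = (code == 1) ? "invalid value" : nullptr;
  return *out != nullptr ? 0 : 1;
}
```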
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95412
`omp_is_initial_device` in device code was implemented as a builtin
function in D38968 for better performance. Therefore calls to this function
never reach `deviceRTLs`. As we're moving to building `deviceRTLs` with an
OpenMP compiler, the definition there can lead to a compilation error. This
patch simply removes it.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D95397
This patch prepares for dropping CUDA when compiling `deviceRTLs`.
CUDA intrinsics are replaced by NVVM intrinsics, following the code in
`__clang_cuda_intrinsics.h`. We don't want to include that header directly
because in the near future we're going to switch to OpenMP, and by then the
header cannot be used anymore.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D95327
D95161 removed the option `--libomptarget-nvptx-path`, which is used in
the tests for `libomptarget-nvptx`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95293
[libomptarget][nvptx] Replace cuda atomic primitives with clang intrinsics
Tested by diff of IR generated for target_impl.cu before and after. NFC. Part
of removing deviceRTL build time dependency on cuda SDK.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D95294
[libomptarget][cuda] Call v2 functions explicitly
rtl.cpp calls functions like cuMemFree that are replaced by a macro
in cuda.h with cuMemFree_v2. This patch changes the source to use
the v2 names consistently.
See also D95104, D95155 for the idea. Alternatives are to use a mixture,
e.g. call the macro names and explicitly dlopen the _v2 names, or to keep
the current status where the symbols are replaced by macros in both files.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95274
[libomptarget] Build cuda plugin without cuda installed locally
Compiles a new file, `plugins/cuda/dynamic_cuda/cuda.cpp`, to an object file that exposes the same symbols that the plugin presently uses from libcuda. The object file contains dlopen of libcuda and cached dlsym calls. Also provides a cuda.h containing the subset that is used.
This lets the cmake file choose between the system cuda and a dlopen shim, with no changes to rtl.cpp.
The corresponding change to amdgpu is postponed until after a refactor of the plugin to reduce the size of the hsa.h stub required.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95155
The buckets are initialized in __kmp_dephash_create but when they are extended
the memory is allocated but not NULL'd, potentially leaving some buckets
uninitialized after all entries have been copied into the new allocation.
This commit makes sure the buckets are properly initialized with NULL before
copying the entries.
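The fix can be sketched on a toy chained hash table. The types and function names below (`EntrySketch`, `DepHashSketch`, `extend_hash`) are hypothetical and much simpler than the runtime's dependence hash; the point is only that the new bucket array is set to null before any entry is relinked into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct EntrySketch {
  std::size_t key;
  EntrySketch *next;
};

struct DepHashSketch {
  EntrySketch **buckets;
  std::size_t size;
};

// Allocate a bucket array and explicitly null every slot.
static EntrySketch **alloc_null_buckets(std::size_t n) {
  assert(n > 0);
  EntrySketch **b =
      static_cast<EntrySketch **>(std::malloc(n * sizeof(EntrySketch *)));
  assert(b != nullptr);
  for (std::size_t i = 0; i < n; ++i)
    b[i] = nullptr;
  return b;
}

DepHashSketch create_hash(std::size_t size) {
  DepHashSketch h;
  h.buckets = alloc_null_buckets(size);
  h.size = size;
  return h;
}

void insert_entry(DepHashSketch &h, EntrySketch *e) {
  std::size_t b = e->key % h.size;
  e->next = h.buckets[b];
  h.buckets[b] = e;
}

// Grow the table: null the new buckets first, then relink old entries.
void extend_hash(DepHashSketch &h, std::size_t new_size) {
  EntrySketch **fresh = alloc_null_buckets(new_size);
  for (std::size_t i = 0; i < h.size; ++i) {
    EntrySketch *e = h.buckets[i];
    while (e != nullptr) {
      EntrySketch *next = e->next;
      std::size_t b = e->key % new_size;
      e->next = fresh[b];
      fresh[b] = e;
      e = next;
    }
  }
  std::free(h.buckets);
  h.buckets = fresh;
  h.size = new_size;
}
```

Without the nulling loop, any bucket that receives no entry during the copy would keep whatever bytes `malloc` returned, and a later lookup would follow a garbage pointer.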
Differential Revision: https://reviews.llvm.org/D95167
[libomptarget][devicertl] Drop templated atomic functions
The five __kmpc_atomic templates are instantiated a total of seven times.
This change replaces the templates with explicitly typed functions, which
have the same prototype for amdgcn and nvptx, and implements them with
the same code presently in use.
Rolls in the accepted but not yet landed D95085.
The unsigned long long type can be replaced with uint64_t when replacing
the cuda function. Until then, clang warns on casting a pointer to one to
a pointer to the other.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D95093
Summary:
Prior to D91261 the information output checked the OMP_MAP_TARGET_PARAM flag; change this check, as it has been removed. The INFO macro was also changed to accept a flag as input to make conditionally printing information easier.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95133
Profiling has been recently implemented in libomptarget (D93055). This patch enables time profiling support for libomptarget in libomp, to support profiling of multi-threaded execution of offloaded regions.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94855
Similar to D95058, this patch adds forward declarations for CUDA atomic
functions. We already have definitions with the right mangled names in the
internal CUDA headers, so the forward declarations here work properly.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D95085
Summary:
The custom mapper API did not previously support the mapping names that were added earlier. This means they were not present if a user requested debugging information while using the mapper functions. This adds basic support for passing the mapped names to the runtime library.
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D94806
Once we switch to building deviceRTLs with OpenMP, primitives and CUDA
intrinsics cannot be used directly anymore because `__device__` is not recognized
by the OpenMP compiler. To avoid pulling in all the internal CUDA headers we have
in `clang`, we forward declare these functions. Eventually they will be
transformed into the right LLVM intrinsics.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D95058
[libomptarget][devicertl][nfc] Simplify target_atomic abstraction
Atomic functions were implemented as a shim around cuda's atomics, with
amdgcn implementing those symbols as a shim around gcc style intrinsics.
This patch folds target_atomic.h into target_impl.h and folds amdgcn.
Further work is likely to be useful here, either changing to openmp's atomic
interface or instantiating the templates on the few used types in order to
move them into a cuda/c++ implementation file. This change is mostly to
group the remaining uses of the cuda api under nvptx' target_impl abstraction.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95062
[libomptarget][devicertl][nfc] Remove some cuda intrinsics, simplify
Replace __popc, __ffs with clang intrinsics. Move kmpc_impl_min to the only file
that uses it and replace the template with an explicitly typed function.
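A sketch of what such replacements can look like, assuming the GCC/Clang builtins `__builtin_popcount` and `__builtin_ffs` are the chosen intrinsics (the commit message does not name them). The wrapper names and the `uint32_t` type chosen for the min helper are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Population count without the CUDA __popc intrinsic.
inline int popc_sketch(std::uint32_t x) { return __builtin_popcount(x); }

// Find-first-set: 1-based index of the lowest set bit, 0 if x == 0.
inline int ffs_sketch(std::int32_t x) { return __builtin_ffs(x); }

// Explicitly typed min in place of a template; the type is an assumption.
inline std::uint32_t min_u32_sketch(std::uint32_t a, std::uint32_t b) {
  return a < b ? a : b;
}

// Small self-check helper showing the intended relationships.
inline void check_bit_helpers_sketch() {
  assert(popc_sketch(0u) == 0);
  assert(ffs_sketch(0) == 0);
}
```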
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95060
Replaced CUDA builtin vars with LLVM intrinsics such that we don't need
definitions of those intrinsics.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D95013
[libomptarget][devicertl] Wrap source in declare target pragmas
Factored out of D93135 / D94745. C++ and cuda ignore unknown pragmas
so this is a NFC for the current implementation language. Removes noise
from patches for building deviceRTL as openmp.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D95048
[libomptarget][nvptx] Reduce calls to cuda header
Remove use of clock_t in favour of a builtin. Drop a preprocessor branch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94731
[libomptarget][nvptx][nfc] Move target_impl functions out of header
This removes most of the differences between the two target_impl.h.
Also change name mangling from C to C++ for __kmpc_impl_*_lock.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D94728
`omptarget-nvptx` is still a dependency of `check-libomptarget-nvptx`
although it has been removed by D94573.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D94725
The comment said that CUDA 9 header files use the `nv_weak` attribute, which
`clang` is not yet prepared to handle. That was three years ago and things have
changed since. Based on my test, removing the definition causes no problems on
my machine with CUDA 11.1 installed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94700
For NVPTX target, OpenMP provides a static library `libomptarget-nvptx`
built by NVCC, and another bitcode `libomptarget-nvptx-sm_{$sm}.bc` generated by
Clang. When compiling an OpenMP program, the `.bc` file will be fed to `clang`
in the second run on the program that compiles the target part. Then the generated
PTX file will be fed to `ptxas` to generate the object file, and finally the driver
invokes `nvlink` to generate the binary, with the static library appended
to the `nvlink` inputs.
One question is, why do we need two libraries? The only difference is, the static
library contains `omp_data.cu` and the bitcode library doesn't. It's unclear why
they were implemented in this way, but per D94565, there is no issue if we also
include the file into the bitcode library. Therefore, we can safely drop the
static library.
This patch is about the change in OpenMP. The driver will be updated as well if
this patch is accepted.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D94573
Restore control of kernel launch tracing to be >= 1 as it was before
export LIBOMPTARGET_KERNEL_TRACE=1
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D94695
Hierarchical barrier is an experimental barrier algorithm that uses aspects
of machine hierarchy to define the barrier tree structure. This patch fixes
offset calculation in hierarchical barrier. The offset is used to store info
on a flag about sleeping threads waiting on a location stored in the flag.
This commit also fixes a potential deadlock in hierarchical barrier when
using infinite blocktime by adjusting the offset value of leaf kids so that
it matches the value of leaf state. It also adds testing of default barriers
with infinite blocktime, and also tests hierarchical barrier algorithm with
both default and infinite blocktime.
Patch by Terry Wilmarth and Nawrin Sultana.
Differential Revision: https://reviews.llvm.org/D94241
Add extra information to the runtime page describing the error messages and add information to the release notes for clang 12.0
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94562
This change enables volatile use of persistent memory for omp_large_cap_mem*
on supported systems. It depends on libmemkind's support for persistent memory,
and requirements/details can be found at the following url.
https://pmem.io/2020/01/20/memkind-dax-kmem.html
Differential Revision: https://reviews.llvm.org/D94353
In C++17, a constant static data member can be defined in the class without
a separate definition after the class. Clang accepts this even without the
C++17 flag, but unfortunately GCC does not.
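A sketch of the portable pre-C++17 form, assuming the member in question is an integral constant; `LimitsSketch` and `clamp_entries_sketch` are hypothetical names. Binding the member to a `const int &` parameter, as `std::min` does, odr-uses it, which is where a missing out-of-class definition shows up as a link error on compilers that do not treat the member as implicitly inline.

```cpp
#include <algorithm>
#include <cassert>

struct LimitsSketch {
  // In-class declaration with an initializer.
  static const int MaxEntries = 16;
};

// Out-of-class definition (no initializer). Needed before C++17 whenever the
// member is odr-used; it belongs in exactly one source file.
const int LimitsSketch::MaxEntries;

// std::min takes its arguments by const reference, so this odr-uses MaxEntries.
int clamp_entries_sketch(int requested) {
  assert(requested >= 0);
  return std::min(requested, LimitsSketch::MaxEntries);
}
```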
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D94541
[libomptarget][amdgpu][nfc] Fix build on centos
rtl.cpp replaced 224 with a #define from elf.h, but that
doesn't work on a centos 7 build machine with an old elf.h
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D94528
Some LLVM headers are generated by CMake. Before installation,
LLVM's headers are spread over several locations: some are in
`${LLVM_SRC_ROOT}/llvm/include/llvm`, and some are in
`${LLVM_BINARY_ROOT}/include/llvm`. After installation, they're all in
`${LLVM_INSTALLATION_ROOT}/include/llvm`.
OpenMP now depends on LLVM headers. Some headers depend on headers generated
by CMake. When building OpenMP along with LLVM, i.e. via `LLVM_ENABLE_RUNTIMES`,
we need to tell OpenMP where it can find those headers, especially those that
have not yet been copied/installed.
Reviewed By: jdoerfert, jhuber6
Differential Revision: https://reviews.llvm.org/D94534
The lifetime of `libomptarget` and its opened plugins are not aligned
and it's hard for `libomptarget` to determine when the plugins are destroyed.
As a result, some issues (see D94256 for details) occur on some platforms.
Actually, if we take target memory as target resources, same as other resources,
such as CUDA streams, in each plugin, then the memory manager should also be in
the plugin. Also, considering that some platforms may want to opt out of the feature, it
makes sense to move the memory manager to the plugin, make it a common interface, and
let plugin developers determine whether they need it. This is what this patch does.
CUDA plugin is taken as example to show how to integrate it. In this way, we can
also get a bonus that different thresholds can be set for different platforms.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D94379
For now `elf_common.c` is taken as a common part included into
different plugin implementations directly via
`#include "../../common/elf_common.c"`, which is not good practice. Since it
is simple enough that we don't need to create a real library for it, we just
make it an interface library so that other targets can link it directly. Another
advantage of this method is that we don't need to add the folder to the header
search path, which could pollute the search path.
VE and AMD platforms have not been tested because I don't have target machines.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D94443
For now, `*_STANDALONE_BUILD` is set to ON even if they're built along
with LLVM because of issues mentioned in the comments. This can cause some issues.
For example, if we build OpenMP along with LLVM, we'd like to copy those OpenMP
headers to `<prefix>/lib/clang/<version>/include` such that `clang` can find
those headers without using `-I <prefix>/include` because those headers will be
copied to `<prefix>/include` if it is built standalone.
In this patch, we fixed the dependence issue in OpenMP such that it can be built
correctly even with `OPENMP_STANDALONE_BUILD=OFF`. The issue is in the call to
`add_lit_testsuite`, where `clang` and `clang-resource-headers` are passed as
`DEPENDS`. Since we're building OpenMP along with LLVM, `clang` is set by CMake
to be the C/C++ compiler, therefore these two dependences, which caused the
dependence issue, are no longer needed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93738
Multiple `RTLInfoTy` objects are stored in a list `AllRTLs`. Since
`RTLInfoTy` contains a `std::mutex`, it is by default not a copyable object.
In order to support `AllRTLs.push_back(...)` which is currently used, a customized
copy constructor is provided. Every time we add a new data member to
`RTLInfoTy`, we must remember to add the corresponding assignment
to the copy constructor. In fact, the only use of the copy constructor is to push
the object into the list. We can instead first emplace a new object at the
back, and then use a reference to the last element. In this way we
don't need the copy constructor anymore. If the element is invalid, we just need
to pop it, and that's what this patch does.
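The emplace-then-pop pattern can be sketched as follows. `RTLInfoSketch` and `try_register_sketch` are hypothetical simplifications; the validity check stands in for whatever loading and symbol lookup the real code performs.

```cpp
#include <cassert>
#include <list>
#include <mutex>
#include <string>

// Simplified stand-in: the std::mutex makes the type non-copyable.
struct RTLInfoSketch {
  std::string name;
  bool valid = false;
  std::mutex mtx;
};

// Construct the element in place, fill it through a reference, and drop it
// again if it turns out to be unusable. No copy constructor is required.
bool try_register_sketch(std::list<RTLInfoSketch> &rtls,
                         const std::string &name) {
  rtls.emplace_back();
  assert(!rtls.empty());
  RTLInfoSketch &rtl = rtls.back();
  rtl.name = name;
  rtl.valid = !name.empty(); // placeholder for the real validity checks
  if (!rtl.valid) {
    rtls.pop_back();
    return false;
  }
  return true;
}
```

`std::list` never relocates its elements, so the reference returned by `back()` stays valid while later elements are added.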
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D94361
The Fugaku supercomputer is built with the Fujitsu A64FX microprocessor, whose cache line size is 256 bytes. In current libomp, we only have a cache line size of 128 for PPC64 and 64 otherwise. This patch adds support for a cache line size of 256 for A64FX. It's worth noting that although A64FX is a variant of AArch64, this property is not shared. As a result, following the UCX source code (392443ab92/src/ucs/arch/aarch64/cpu.c (L17)), we can only determine it by checking whether the CPU is FUJITSU A64FX.
Reviewed By: jdoerfert, Hahnfeld
Differential Revision: https://reviews.llvm.org/D93169
Summary:
Currently error messages from the CUDA plugins are only printed to the user if they have debugging enabled. Change this behaviour to always print the messages that result in offloading failure. This improves the error messages by indicating what happened when the error occurs in the plugin library, such as a segmentation fault on the device.
Reviewed by: jdoerfert
Differential Revision: https://reviews.llvm.org/D94263
Add an example to the OpenMP Documentation on the LIBOMPTARGET_INFO environment variable
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94246
The wrong LLVM headers might be included if we don't set `include_directories`
to the right place. This will cause a compilation error if LLVM is installed in
system directories.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93737
Currently all built libraries in OpenMP are scattered across the build tree if
building along with LLVM. It is not an issue if we don't execute any test.
However, almost all tests for `libomptarget` fail because in the lit
configuration, we only add `<build_dir>/libomptarget` to `LD_LIBRARY_PATH` and
`LIBRARY_PATH`. Since those libraries are scattered, `clang` can no longer find
`libomptarget.so` or the deviceRTLs.
In this patch, we set a unified path for all built libraries, no matter whether
it is built along with LLVM or not. In this way, our lit configuration can work
properly.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93736
Summary:
This patch adds more fine-grained support over which information is output from the libomptarget runtime when run with the environment variable LIBOMPTARGET_INFO set. An extensible set of flags can be used to pick and choose which information the user is interested in.
Reviewers: jdoerfert JonChesterfield grokos
Differential Revision: https://reviews.llvm.org/D93727
[libomptarget][amdgpu] Call into deviceRTL instead of ockl
Amdgpu codegen presently emits a call into ockl. The same functionality
is already present in the deviceRTL. Adds an amdgpu specific entry point
to avoid the dependency. This lets simple openmp code (specifically, that
which doesn't use libm) run without rocm device libraries installed.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D93356
This patch partially prepares the runtime source code to be built with
-Wconversion, which should trigger warnings if any implicit conversions
can possibly change a value. For builds done with icc or gcc, all such
warnings are handled in this patch. clang gives a much longer list of
warnings, particularly for sign conversions, which the other compilers
don't report. The -Wconversion flag is commented into cmake files, but
I'm not going to turn it on. If someone thinks it is important, and wants
to fix all the clang warnings, they are welcome to.
Types of changes made here involve either improving the consistency of types
used so that no conversion is needed, or else performing careful explicit
conversions, when we're sure a problem won't arise.
Patch is a combination of changes by Terry Wilmarth and Johnny Peyton.
Differential Revision: https://reviews.llvm.org/D92942
Add support to the OpenMP web pages for environment variables supported
by Libomptarget and their usage.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93723
When setting `LLVM_ENABLE_RUNTIMES`, lower-case names should be used;
otherwise, it can cause a CMake error that a specific path is not found.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D93719
After some issues about building runtimes along with LLVM were fixed,
building an OpenMP offloading capable compiler is pretty simple. This patch updates
the FAQ part in the doc.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93671
If a GPU function is externally reachable we give up trying to find the
(unique) kernel it is called from. This can hinder optimizations. Emit a
remark and explain mitigation strategies.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D93439
This patch adds CMake variables to add subdirectories and include
directories for libomptarget and explicitly gives the location of source
files.
Differential Revision: https://reviews.llvm.org/D93290
[libomptarget][nfc] Replace static const with enum
Semantically identical. Replaces 0xff... with ~0 to spare counting the f.
Has the advantage that the compiler doesn't need to prove the 4/8 byte
value dead before discarding it, and sidesteps the compilation question
associated with what static means for a single source language.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93328
Introduce new kmp_safe_raii_file_t class with RAII semantics for file
open/close. It is essentially a wrapper around the C-style FILE* object.
This also unifies the way we error report if a file can't be opened.
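A minimal sketch of an RAII wrapper around `FILE*`. The class name `SafeFileSketch` and its interface are hypothetical and do not reproduce `kmp_safe_raii_file_t`; in particular, the warning printed on failure is only a placeholder for the unified error reporting mentioned above.

```cpp
#include <cassert>
#include <cstdio>

class SafeFileSketch {
public:
  SafeFileSketch(const char *path, const char *mode) : file_(nullptr) {
    assert(path != nullptr && mode != nullptr);
    file_ = std::fopen(path, mode);
    if (file_ == nullptr)
      std::fprintf(stderr, "warning: cannot open %s\n", path);
  }

  // The file is closed automatically when the wrapper goes out of scope.
  ~SafeFileSketch() {
    if (file_ != nullptr)
      std::fclose(file_);
  }

  // Copying would lead to a double fclose, so it is disabled.
  SafeFileSketch(const SafeFileSketch &) = delete;
  SafeFileSketch &operator=(const SafeFileSketch &) = delete;

  bool is_open() const { return file_ != nullptr; }
  std::FILE *get() const { return file_; }

private:
  std::FILE *file_;
};
```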
Differential Revision: https://reviews.llvm.org/D92604
This patch enables serial initialization in the forked child process
to fix unstable runtime behavior when used with Python-based AI tools.
Differential Revision: https://reviews.llvm.org/D93230
[libomptarget][nfc] Remove data_sharing type aliasing
Libomptarget previously used __kmpc_data_sharing_slot to access values of type
__kmpc_data_sharing_{worker,master}_slot_static. This aliasing violation was
benign in practice. The master type has since been removed, so a single type
can be used instead.
This is particularly helpful for the transition to an openmp deviceRTL, as the
c++/openmp compiler for amdgcn currently rejects the flexible array member for
being an incomplete type. Serves the same purpose as abandoned D86324.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93075
This patch introduces a new RTM lock type based on spin lock which is
used for OMP lock with speculative hint on supported architecture.
Differential Revision: https://reviews.llvm.org/D92615
This patch adds a new API, __kmpc_taskloop_5, to accommodate the strict
modifier (introduced in OpenMP 5.1) in the num_tasks and grainsize
clauses.
Differential Revision: https://reviews.llvm.org/D92352
KMP_AFFINITY=norespect was triggering an error because the underlying
process affinity mask was not updated to include the entire machine.
The Windows documentation states that the thread affinities must be
subsets of the process affinity. This patch also moves the printing
(for KMP_AFFINITY=verbose) of whether the initial mask was respected
out of each topology detection function and to one location where the
initial affinity mask is read.
Differential Revision: https://reviews.llvm.org/D92587
Check pointer returned by strchr, as it can be NULL in case of broken
format of input string. Introduced new function __kmp_str_loc_numbers
for fast parsing of numbers only in the location string.
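A sketch of number-only parsing with every `strchr` result checked. The function name and the assumed string layout `;file;function;line;column;;` are illustrative assumptions and do not reproduce `__kmp_str_loc_numbers`.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Parses line and column from a string assumed to look like
// ";file;function;line;column;;". Returns false on malformed input instead
// of dereferencing a NULL pointer returned by strchr.
bool parse_loc_numbers_sketch(const char *psource, int *line, int *col) {
  assert(line != nullptr && col != nullptr);
  *line = 0;
  *col = 0;
  if (psource == nullptr)
    return false;
  const char *p = std::strchr(psource, ';'); // leading separator
  if (p == nullptr)
    return false;
  p = std::strchr(p + 1, ';'); // end of file name
  if (p == nullptr)
    return false;
  p = std::strchr(p + 1, ';'); // end of function name
  if (p == nullptr)
    return false;
  *line = std::atoi(p + 1);
  p = std::strchr(p + 1, ';'); // end of line number
  if (p == nullptr)
    return false;
  *col = std::atoi(p + 1);
  return true;
}
```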
Also made some cleanup of __kmp_str_loc_init declaration and usage:
- changed type of init_fname parameter to bool;
- changed input from true to false in places where fname is not used.
Differential Revision: https://reviews.llvm.org/D90962
D91692 missed various locations in kmp_gsupport, where the scope for
OMPT_STORE_RETURN_ADDRESS is too narrow, i.e. the scope ends before the OMPT
callback is called in some nested function.
This patch fixes the scoping issue, so that all OMPT tests pass, when the
tests are built with gcc.
Differential Revision: https://reviews.llvm.org/D92121
[libomptarget][amdgpu] Address compiler warnings, drive by fixes
Initialize some variables, remove unused ones.
Changes the debug printing condition to align with the aomp test suite.
Differential Revision: https://reviews.llvm.org/D92559
These changes add support for Intel's umonitor/umwait usage in wait
code, for architectures that support those intrinsic functions. Usage of
umonitor/umwait is off by default, but can be turned on by setting the
KMP_USER_LEVEL_MWAIT environment variable.
Differential Revision: https://reviews.llvm.org/D91189
Added UNLIKELY hint to one-time or rarely executed branches.
This improves performance of the library on some tasking benchmarks.
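A sketch of such a hint built on `__builtin_expect`, which GCC and Clang provide. The macro name `SKETCH_UNLIKELY` and the lazy-initialization example are hypothetical; the runtime's own macro and call sites may differ.

```cpp
#include <cassert>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_UNLIKELY(x) (x)
#endif

struct LazyTableSketch {
  bool initialized = false;
  int init_count = 0;
  int value = 0;
};

// The initialization branch runs once, so it is marked as unlikely and the
// compiler can lay out the common path as the fall-through.
int get_value_sketch(LazyTableSketch &t) {
  if (SKETCH_UNLIKELY(!t.initialized)) {
    t.value = 42;
    t.initialized = true;
    ++t.init_count;
  }
  assert(t.init_count == 1);
  return t.value;
}
```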
Differential Revision: https://reviews.llvm.org/D92322
With the change to using shared memory, there were a few problems that need to be fixed.
- The previous filename that was used for SHM only used the process id. Given that the process id is
usually a 16-bit number, this was causing some conflicts on machines. Thus we add the
UID to the name to prevent this.
- It appears that under some conditions (SIGTERM, etc.) the shared memory files were not getting
cleaned up. Added a call to clean up the shm files under those conditions. For this, the user
needs to set the environment variable KMP_HANDLE_SIGNALS to true.
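The naming change in the first item can be sketched as below. The prefix `/omp_sketch_lib_` and the function name are hypothetical; only the idea of combining the process id with the UID comes from the description above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build a shared-memory object name that is unique per (pid, uid) pair, so
// two users whose processes reuse the same pid do not collide.
std::string shm_name_sketch(long pid, long uid) {
  assert(pid >= 0 && uid >= 0);
  char buf[64];
  std::snprintf(buf, sizeof(buf), "/omp_sketch_lib_%ld_%ld", pid, uid);
  return std::string(buf);
}
```

In a real program the two arguments would come from `getpid()` and `getuid()`.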
Patch by Erdner, Todd <todd.erdner@intel.com>
Differential Revision: https://reviews.llvm.org/D91869
Since __kmp_task_finish is not executed for proxy tasks,
move mutexinoutset dependency code to __kmp_release_deps
which is executed for all task kinds.
Differential Revision: https://reviews.llvm.org/D92326
[libomptarget][cuda] Detect missing symbols in plugin at build time
Passes -z,defs to the linker. Error on unresolved symbol references.
Otherwise, those unresolved symbols present as target code running on the host
as the plugin fails to load. This is significantly harder to debug than a link
time error. Flag matches that passed by amdgcn and ve plugins.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D92143
This is an alternative approach to address inconsistencies pointed out in: D90078
This patch makes sure that the return address is reset, when leaving the scope.
In some cases, I had to move the macro out of an if-statement to have it in the
right scope, in some cases I added an additional block to restrict the scope.
This patch does not handle inconsistencies, which might occur if the return
address is still set when we call into the application.
Test case (repeated_calls.c) provided by @hbae
Differential Revision: https://reviews.llvm.org/D91692
OpenMP 5.1 introduces the new env variable
OMP_TOOL_VERBOSE_INIT=(disabled|stdout|stderr|<filename>) to enable verbose
loading and initialization of OMPT tools.
This env variable helps to understand the cause when loading of a tool fails
(e.g., undefined symbols or dependency not in LD_LIBRARY_PATH)
Output of OMP_TOOL_VERBOSE_INIT is added for OMP_DISPLAY_ENV
Tests for this patch are integrated into the different existing tool loading
tests, making these tests more verbose. An Archer specific verbose test is
integrated into an existing Archer test.
Patch prepared by: Isabel Thärigen
Differential Revision: https://reviews.llvm.org/D91464
Adjusted external reference for Darwin/AARCH64 link compatibility.
Made size directive conditional only if __ELF__ defined.
Patch by Michael_Pique <mpique@icloud.com>
Differential Revision: https://reviews.llvm.org/D88252
This patch is the runtime support for https://reviews.llvm.org/D84192.
In order not to modify the tgt_target_data_update information but still be
able to pass the extra information for a non-contiguous map item (offset,
count, and stride for each dimension), this patch overloads arg when
the maptype is set to OMP_TGT_MAPTYPE_DESCRIPTOR. The original arg is for
passing the pointer information; the overloaded arg, however, is an
array of descriptor_dim:
```
struct descriptor_dim {
int64_t offset;
int64_t count;
int64_t stride;
};
```
and the array size is the dimension size. In addition, since we
have count and stride information in descriptor_dim, we can replace/overload the
arg_size parameter by using dimension size.
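A sketch of how such a descriptor array could be walked, one dimension per recursion level. The semantics used here are an assumption made for illustration: each dimension contributes `(offset + i) * stride` for `i` in `[0, count)`, with all values measured in elements of a flattened array. The helper `collect_offsets_sketch` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct descriptor_dim {
  std::int64_t offset;
  std::int64_t count;
  std::int64_t stride;
};

// Recursively enumerate the flattened offsets selected by the descriptors.
void collect_offsets_sketch(const std::vector<descriptor_dim> &dims,
                            std::size_t dim, std::int64_t base,
                            std::vector<std::int64_t> &out) {
  if (dim == dims.size()) {
    out.push_back(base);
    return;
  }
  const descriptor_dim &d = dims[dim];
  assert(d.count >= 0);
  for (std::int64_t i = 0; i < d.count; ++i)
    collect_offsets_sketch(dims, dim + 1, base + (d.offset + i) * d.stride,
                           out);
}
```

For a row-major 4x5 array, the descriptors `{1, 2, 5}` and `{1, 3, 1}` select rows 1 to 2 and columns 1 to 3 under these assumptions.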
Reviewed By: grokos, tianshilei1992
Differential Revision: https://reviews.llvm.org/D82245
Summary:
Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values.
Reviewers: jdoerfert grokos
Differential Revision: https://reviews.llvm.org/D87946
Summary:
This patch adds basic support for printing the source location and names for the mapped variables. This patch does not support names for custom mappers. This is based on D89802.
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D90172
Summary:
This patch adds support for passing the original declaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped; otherwise it takes the name of the variable declaration as a fallback. The information is passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;"
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D89802
This introduces the new `ARCHER_OPTIONS` flag `ignore_serial=0|1` to disable
analysis and logging of memory accesses in the sequential part of the OpenMP
application.
In the sequential part of an OpenMP program no data race is possible, unless
there is non-OpenMP concurrency (such as pthreads, MPI, ...). For the latter
reason, this is not active by default.
Besides reducing the runtime overhead for the sequential part of the program,
this reduces the memory overhead for sequential initialization. In combination
with `flush_shadow=1` this can allow analysis of applications, which run close
to the limit of available memory, but only access smaller parts of shared
memory during each OpenMP parallel region.
A problem for this approach is that Archer only becomes active when the OpenMP
runtime is initialized, which might be after the serial initialization of the
application. In that case, it helps to call, for example, `omp_get_max_threads()`
at the beginning of main.
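A minimal sketch of the workaround mentioned above, assuming the application is written in C. The helper name `init_openmp_runtime_early` is hypothetical; the `_OPENMP` guard only exists so the sketch still compiles without OpenMP support.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Touches the OpenMP runtime so that it (and any tool attached to it) is
   initialized right away. Returns the maximum thread count, or 1 when the
   file is compiled without OpenMP. */
static int init_openmp_runtime_early(void) {
#ifdef _OPENMP
  int n = omp_get_max_threads();
#else
  int n = 1;
#endif
  assert(n >= 1);
  return n;
}
```

Calling this helper as the first statement of main, before any serial initialization, should let a run with `ignore_serial=1` in `ARCHER_OPTIONS` cover the whole sequential part.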
Differential Revision: https://reviews.llvm.org/D90473
OpenMP 5.1 adds an extra enum entry for ompt_scope_t, which makes the related
switch statement incomplete.
Also adding cases for newly added barrier variants.
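To illustrate why an added enum entry makes a switch incomplete, here is a small C sketch. The enum `scope_endpoint_t` is a local stand-in, not the type from the OMPT header, and its names and values are assumptions made for this example.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in enum (assumption); the real type comes from the OMPT
   header. */
typedef enum {
  SCOPE_BEGIN = 1,
  SCOPE_END = 2,
  SCOPE_BEGINEND = 3 /* stands for the entry added with OpenMP 5.1 */
} scope_endpoint_t;

/* Without the SCOPE_BEGINEND case, compilers that check switch coverage
   report the switch as not handling every enumerator. */
static const char *scope_name(scope_endpoint_t s) {
  switch (s) {
  case SCOPE_BEGIN:
    return "begin";
  case SCOPE_END:
    return "end";
  case SCOPE_BEGINEND:
    return "beginend";
  }
  assert(0 && "unhandled scope endpoint");
  return "unknown";
}
```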
Differential Revision: https://reviews.llvm.org/D90758
Currently the affinity format string has an initial value. When users set
the format via OMP_AFFINITY_FORMAT, it overwrites the format string. However,
the trailing null is not copied along with the format. As a result, if the user
format string is shorter than the default value, the remaining part of the default
value still takes effect. This bug was not exposed because the test case doesn't
check the end of the string. It only checks whether the given output "contains" the
check string.
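The effect described above can be reproduced with a few lines of C. This is only a sketch of the failure mode, not the runtime's code; the function names and the placeholder default string are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Buggy variant: copies the characters of src but not the terminator, so
   whatever followed in dst stays visible. */
static void set_format_buggy(char *dst, const char *src) {
  memcpy(dst, src, strlen(src));
}

/* Fixed variant: truncates to the buffer capacity and always writes the
   trailing null. */
static void set_format_fixed(char *dst, size_t cap, const char *src) {
  assert(cap > 0);
  size_t len = strlen(src);
  if (len >= cap)
    len = cap - 1;
  memcpy(dst, src, len);
  dst[len] = '\0';
}
```

With a buffer that initially holds "DEFAULT-FORMAT", the buggy copy of "abc" leaves "abcAULT-FORMAT" in the buffer, while the fixed copy leaves "abc". A test that only checks whether the output contains "abc" passes in both cases.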
Reviewed By: AndreyChurbanov
Differential Revision: https://reviews.llvm.org/D91309
Summary:
This patch begins to add support for a set of scripts that can be used to get information from OpenMP programs to better describe problems and eventually show the data to the user in formatted output. Right now the only support is for formatting the register and memory usage reports from ptxas and nvlink. This is simply done as a wrapper around clang and clang++.
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D91085
The deadlock/race happens when the primary thread gets the initz lock and tries to join
the worker thread, which waits for the same lock in the TLS key destructor.
The patch removes the lock and the code that set the TLS value, which needed
the lock. It also removes setting TLS from __kmp_unregister_root_current_thread.
Differential Revision: https://reviews.llvm.org/D90647
This patch allows the OpenMP runtime tests to pass after configuring with
`cmake . -DOPENMP_TEST_FLAGS:STRING="-Werror"`.
The warnings for OMPT tests are addressed in D90752.
Differential Revision: https://reviews.llvm.org/D91280
This doesn't add functionality, but just adds the new types and renames the
master callback to masked callback.
Differential Revision: https://reviews.llvm.org/D90752
Modern Fortran compilers support Fortran 90, so we do not need to use
the source code for Fortran compilers that do not support Fortran 90.
Differential Revision: https://reviews.llvm.org/D90077
This patch is the runtime support for https://reviews.llvm.org/D84192.
In order not to modify the tgt_target_data_update information but still be
able to pass the extra information for non-contiguous map items (offset,
count, and stride for each dimension), this patch overloads arg when
the maptype is set as OMP_TGT_MAPTYPE_DESCRIPTOR. The original arg is for
passing the pointer information; however, the overloaded arg is an
array of descriptor_dim:
```
struct descriptor_dim {
int64_t offset;
int64_t count;
int64_t stride;
};
```
and the array size is the dimension size. In addition, since we
have count and stride information in descriptor_dim, we can replace/overload the
arg_size parameter by using dimension size.
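To make the descriptor layout concrete, here is a sketch of how an array of descriptor_dim could be walked to enumerate the selected elements. It assumes that offset and count are given in elements of the respective dimension and stride in bytes; the message does not state the units, so treat that as an assumption. The function `collect_offsets` is hypothetical and not part of libomptarget.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct descriptor_dim {
  int64_t offset;
  int64_t count;
  int64_t stride;
};

/* Recursively visits every element selected by dims[dim..ndims-1] and stores
   its byte offset from the base pointer in out[]. n is the number of offsets
   already stored; the updated number is returned.
   Assumption: offset/count are in elements, stride is in bytes. */
static size_t collect_offsets(const struct descriptor_dim *dims, size_t ndims,
                              size_t dim, int64_t base, int64_t *out,
                              size_t n, size_t cap) {
  if (dim == ndims) {
    assert(n < cap);
    out[n] = base;
    return n + 1;
  }
  for (int64_t i = 0; i < dims[dim].count; ++i) {
    int64_t off = (dims[dim].offset + i) * dims[dim].stride;
    n = collect_offsets(dims, ndims, dim + 1, base + off, out, n, cap);
  }
  return n;
}
```

For a 4x5 array of 4-byte integers and the section [1:2][1:3], the descriptors would be {1, 2, 20} and {1, 3, 4} under these assumptions, and the walk yields the byte offsets 24, 28, 32, 44, 48, 52.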
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D82245
This patch fixes potential division by 0 in case hwloc does not
recognize cores (or architecture has no cores).
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D90954
This patch adds the mask and ident_t function to get the
OpenMP version. It also adds logic to force monotonic:dynamic
behavior when the OpenMP version is less than 5.0.
The OpenMP version is stored in the format:
major*10+minor e.g., OpenMP 5.0 = 50
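The encoding above is plain integer arithmetic; a minimal sketch follows. All function names are hypothetical, and the sketch does not show how the value is packed into ident_t or which mask is used.

```c
#include <assert.h>

/* Encodes a version as major*10+minor, e.g. 5.0 -> 50. */
static int omp_version_encode(int major, int minor) {
  assert(major >= 0 && minor >= 0 && minor < 10);
  return major * 10 + minor;
}

static int omp_version_major(int v) { return v / 10; }
static int omp_version_minor(int v) { return v % 10; }

/* Hypothetical predicate for the rule in the message: versions below 5.0
   get monotonic:dynamic behavior. */
static int forces_monotonic_dynamic(int v) {
  return v < omp_version_encode(5, 0);
}
```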
Differential Revision: https://reviews.llvm.org/D90632
The macros are used in several places with an if(macro) pattern. This results
in several warnings about extraneous parentheses in equality comparisons.
Having the constant on the lhs of the comparison avoids this warning.
Differential Revision: https://reviews.llvm.org/D90756
There is a non-conforming use of variable-sized array in the test case `parallel_offloading_map.c`. This patch fixed it.
Reviewed By: protze.joachim
Differential Revision: https://reviews.llvm.org/D90642
As reported by @ronlieb, the test shows intermittent failures.
The test failed if the dependent task had already finished when the depending
task was about to be created. We have other tests to check for the dependences pair.
Since detached tasks are supported by clang and the OpenMP runtime, Archer
must expect to receive the corresponding callbacks.
This patch adds support to interpret the synchronization semantics of
omp_fulfill_event and cleans up the handling of task switches.
Presently, there are a number of global variables in libomptarget (devices,
RTLs, tables, mutexes, etc.) that are not placed within a struct. This
patch places them into a struct `PluginManager`. All of the functions
that act on this data remain free.
Differential Revision: https://reviews.llvm.org/D90519
This adds some initial content as well as structure to the new OpenMP
Sphinx documentation hosted at http://openmp.llvm.org/docs/ .
The content contains some useful links but most pages are still empty.
This uses a "custom" theme which is a copy of the default "agogo" one
with minor modifications to get a nicer table of contents in the sidebar.
This way we can also adjust the theme as we go.
Reviewed By: jhuber6, JonChesterfield
Differential Revision: https://reviews.llvm.org/D90256
[AMDGPU] Add __builtin_amdgcn_grid_size
Similar to D76772, loads the data from the dispatch pointer. Marked invariant.
Patch also updates the openmp devicertl to use this builtin.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D90251