OPENMP_INSTALL_LIBDIR is set to the installation path of the shared and static
libompd. This should avoid mixing 32-bit and 64-bit libraries in the same path
in a multi-lib set-up.
Reviewed By: @mceier
Differential Revision: https://reviews.llvm.org/D109352
We perform runtime folding, but do not currently emit remarks when it is
performed. This is because it comes from the runtime library and is
beyond the user's control. However, people may still wish to view this
and similar information easily, so we can enable this behaviour using a
special flag to enable verbose remarks.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D109627
The definitions of OFFLOAD_SUCCESS and OFFLOAD_FAIL used in plugin APIs and libomptarget public APIs are not consistent.
Create __tgt_target_return_t for libomptarget public APIs.
Differential Revision: https://reviews.llvm.org/D109304
This should have happened a long time ago. Now that openmp.llvm.org
redirects to openmp.llvm.org/docs, we have completely switched over to the
sphinx documentation page instead.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D108588
The hsa library must be initialized before any calls into it and
destructed after the last call into it. There have been a number of bugs in
this area related to member variables which would like to use raii to manage
resources acquired from hsa.
This patch moves the init/shutdown of hsa into a class, such that when used as
the first member variable (could be a base), the lifetimes of other member
variables are reliably scoped within it. This will allow other classes to use
raii reliably when used as member variables within the global.
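The ordering guarantee relied on here is that C++ constructs members in declaration order and destroys them in reverse. A minimal sketch of the pattern, assuming hypothetical stand-ins (`fakeLibraryInit`, `fakeLibraryShutdown`, `LibraryLifetime`, `Resource`, `GlobalState`) rather than the real hsa calls or the plugin's classes:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the library's init/shutdown calls; they only
// record the order in which they run.
static std::vector<std::string> &eventLog() {
  static std::vector<std::string> Log;
  return Log;
}
static void fakeLibraryInit() { eventLog().push_back("init"); }
static void fakeLibraryShutdown() { eventLog().push_back("shutdown"); }

// Owns the library lifetime: initialize on construction, shut down on
// destruction. Non-copyable so init and shutdown stay paired one-to-one.
struct LibraryLifetime {
  LibraryLifetime() { fakeLibraryInit(); }
  ~LibraryLifetime() { fakeLibraryShutdown(); }
  LibraryLifetime(const LibraryLifetime &) = delete;
  LibraryLifetime &operator=(const LibraryLifetime &) = delete;
};

// A resource that may only be acquired and released while the library is live.
struct Resource {
  Resource() { eventLog().push_back("acquire"); }
  ~Resource() { eventLog().push_back("release"); }
};

// Members are constructed in declaration order and destroyed in reverse, so
// declaring the lifetime object first scopes every later member inside it.
struct GlobalState {
  LibraryLifetime Lifetime; // must stay the first member
  Resource Queue;
};

// Builds and destroys a GlobalState, returning the recorded event order.
static std::vector<std::string> runLifetimeDemo() {
  eventLog().clear();
  { GlobalState State; }
  return eventLog();
}
```

Calling `runLifetimeDemo()` records init, acquire, release, shutdown in that order, which is the scoping the patch describes.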
Reviewed By: pdhaliwal
Differential Revision: https://reviews.llvm.org/D109512
Given D109057, change test runner to use the libomptarget-x-bc-path
argument instead of the LIBRARY_PATH environment variable to find the device
library.
Also drop the use of LIBRARY_PATH environment variable as it is far
too easy to pull in the device library from an unrelated toolchain by accident
with the current setup. No loss in flexibility to developers as the clang
commandline used here is still available.
Reviewed By: jdoerfert, tianshilei1992
Differential Revision: https://reviews.llvm.org/D109061
New omp_all_memory task dependence type is implemented.
Library recognizes the new type via either
(dependence_address == NULL && dependence_flag == 0x80)
or
(dependence_address == SIZE_MAX).
A task with new dependence type depends on each preceding task
with any dependence type (kind of a dependence barrier).
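The two encodings above can be written as a single predicate. This is a minimal sketch only: `DepInfo`, `AllMemoryFlag` and `isAllMemoryDep` are hypothetical names, and the runtime's real dependence record has a different layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified dependence record; the field names mirror the
// wording of the commit message, not the runtime's actual structure.
struct DepInfo {
  std::uintptr_t Address; // dependence_address
  unsigned char Flag;     // dependence_flag
};

constexpr unsigned char AllMemoryFlag = 0x80;

// True when the record uses either encoding of omp_all_memory quoted above:
// a null address with flag 0x80, or an address equal to SIZE_MAX.
constexpr bool isAllMemoryDep(const DepInfo &Dep) {
  return (Dep.Address == 0 && Dep.Flag == AllMemoryFlag) ||
         Dep.Address == static_cast<std::uintptr_t>(SIZE_MAX);
}
```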
Differential Revision: https://reviews.llvm.org/D108574
The new interface only marks begin/end of a scope construct for
corresponding OMPT events, and we can use existing interfaces for
reduction operations.
Differential Revision: https://reviews.llvm.org/D108062
This patch changes the default monotonicity of dynamic schedule from
monotonic to non-monotonic when no modifier is specified.
Differential Revision: https://reviews.llvm.org/D109026
Using std::vector<DeviceTy> requires implementing a copy constructor and copy assignment operator for DeviceTy.
In fact, DeviceTy should never be copied. After changing to std::vector<std::unique_ptr<DeviceTy>>,
all the unsafe copy constructor and copy assignment operator implementations can be removed.
Compilers mark them deleted due to the mutex or underlying objects, and this is the desired behavior.
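A minimal sketch of the idea, assuming a hypothetical `Device` type in place of DeviceTy (the member names are illustrative only):

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for DeviceTy: the mutex member makes the implicit
// copy constructor and copy assignment operator deleted.
struct Device {
  explicit Device(int ID) : DeviceID(ID) {}
  int DeviceID;
  std::mutex DataMapMtx;
};

static_assert(!std::is_copy_constructible<Device>::value,
              "the mutex member deletes the copy constructor");
static_assert(!std::is_copy_assignable<Device>::value,
              "the mutex member deletes copy assignment");

// The vector owns pointers, so growing it moves the pointers and never
// copies or moves a Device object.
static std::vector<std::unique_ptr<Device>> makeDevices(int Count) {
  std::vector<std::unique_ptr<Device>> Devices;
  for (int I = 0; I < Count; ++I)
    Devices.push_back(std::unique_ptr<Device>(new Device(I)));
  return Devices;
}
```

A side effect of this layout is that each `Device` keeps its address when the vector reallocates.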
Differential Revision: https://reviews.llvm.org/D109276
Use the same debug print as the rest of libomptarget plugins with
the same environment control. Also drop the max queue size debugging hook, as
I don't believe it is still in use; it can be brought back near the rest of the
env handling in rtl.cpp if someone objects.
That makes most of rt.h and all of utils.cpp unused. Clean that up and simplify
control flow in a couple of places.
Behaviour change is that debug prints that used to use the old environment
variable now use the new one and print in slightly different format, and the
removal of the max queue size variable.
Reviewed By: pdhaliwal
Differential Revision: https://reviews.llvm.org/D108784
Use unique_ptr to achieve the effect of mutable.
Remove the mutable keyword from DynRefCount and HoldRefCount.
Remove std::shared_ptr from UpdateMtx.
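The technique works because constness of a unique_ptr does not propagate to the object it points to. A minimal sketch, assuming the entries live in a container that only hands out const access (std::set here); `Entry`, `State` and `bumpThroughSet` are hypothetical names, not the actual libomptarget types:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>

// Hypothetical, simplified map entry. Elements of a std::set are const, so
// anything that must change after insertion lives behind the unique_ptr.
struct Entry {
  struct State {
    std::uint64_t DynRefCount = 0;
    std::uint64_t HoldRefCount = 0;
  };

  explicit Entry(std::uintptr_t Begin)
      : HstPtrBegin(Begin), States(new State()) {}

  // const member functions can still update the counts through States,
  // without declaring any member mutable.
  void incDynRefCount() const { ++States->DynRefCount; }
  std::uint64_t getDynRefCount() const { return States->DynRefCount; }

  // Ordering uses only the immutable key.
  bool operator<(const Entry &Other) const {
    return HstPtrBegin < Other.HstPtrBegin;
  }

  const std::uintptr_t HstPtrBegin;

private:
  std::unique_ptr<State> States;
};

// Inserts one entry and bumps its count twice through a const iterator.
static std::uint64_t bumpThroughSet() {
  std::set<Entry> Entries;
  auto It = Entries.emplace(0x1000).first; // *It is a const Entry
  It->incDynRefCount();
  It->incDynRefCount();
  return It->getDynRefCount();
}
```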
Reviewed By: tianshilei1992, grokos
Differential Revision: https://reviews.llvm.org/D109007
As started in D107925, this patch replaces the remaining occurrences
of `UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin` in
`omptarget.cpp` with `IsHostPtr`. The former condition is broken in
the rare case that the device and host happen to use the same address
for their mapped allocations. I don't know how to write a test that's
likely to reveal this case.
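The difference between the two conditions can be illustrated with a minimal sketch. `LookupResult`, `inferredHostPtr` and `reportedHostPtr` are hypothetical names, and the real lookup in libomptarget returns more state than this:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified lookup result.
struct LookupResult {
  std::uintptr_t TgtPtrBegin;
  bool IsHostPtr; // set by the lookup when the data is in shared memory
};

// Former condition: infers "shared memory" from matching addresses.
constexpr bool inferredHostPtr(bool UnifiedSharedMemory, std::uintptr_t Tgt,
                               std::uintptr_t Hst) {
  return UnifiedSharedMemory && Tgt == Hst;
}

// Replacement condition: trusts what the lookup reported.
constexpr bool reportedHostPtr(const LookupResult &Result) {
  return Result.IsHostPtr;
}
```

For a discrete device allocation whose address happens to equal the host address, `inferredHostPtr(true, 0x2000, 0x2000)` is true while `reportedHostPtr(LookupResult{0x2000, false})` is false, which is the rare case described above.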
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D107928
As discussed in D105990, without this patch, `targetDataBegin`
determines whether to transfer data (as opposed to assuming it's in
shared memory) using the condition `!UseUSM || HasCloseModifier`.
However, this condition is broken if use of discrete memory was forced
by `omp_target_associate_ptr`. This patch extends
`unified_shared_memory/associate_ptr.c` to reveal this case, and it
fixes it using `!IsHostPtr` in `DeviceTy::getTargetPointer` to replace
this condition.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D107927
This patch is based on comments in D105990. It is NFC according to
the following observations:
1. `CopyMember` is computed as `!IsHostPtr && IsLast`.
2. `DelEntry` is true only if `IsLast` is true.
We apply those observations in order:
```
if ((DelEntry || Always || CopyMember) && !IsHostPtr)
if ((DelEntry || Always || IsLast) && !IsHostPtr)
if ((Always || IsLast) && !IsHostPtr)
```
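The equivalence of the first and last conditions under the two observations can be checked exhaustively. The following is a minimal sketch, not code from the patch; the function names are hypothetical:

```cpp
#include <cassert>

// First condition in the rewrite above.
constexpr bool originalCond(bool DelEntry, bool Always, bool CopyMember,
                            bool IsHostPtr) {
  return (DelEntry || Always || CopyMember) && !IsHostPtr;
}

// Last condition in the rewrite above.
constexpr bool simplifiedCond(bool Always, bool IsLast, bool IsHostPtr) {
  return (Always || IsLast) && !IsHostPtr;
}

// Enumerates every input that satisfies the two observations and counts
// the inputs on which the two conditions disagree.
static int countMismatches() {
  int Mismatches = 0;
  for (int Bits = 0; Bits < 16; ++Bits) {
    bool DelEntry = (Bits & 1) != 0;
    bool Always = (Bits & 2) != 0;
    bool IsLast = (Bits & 4) != 0;
    bool IsHostPtr = (Bits & 8) != 0;
    if (DelEntry && !IsLast)
      continue; // observation 2: DelEntry implies IsLast
    bool CopyMember = !IsHostPtr && IsLast; // observation 1
    if (originalCond(DelEntry, Always, CopyMember, IsHostPtr) !=
        simplifiedCond(Always, IsLast, IsHostPtr))
      ++Mismatches;
  }
  return Mismatches;
}
```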
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D107926
As discussed in D105990, without this patch, `targetDataEnd`
determines whether to transfer data or delete a device mapping (as
opposed to assuming it's in shared memory) using two different
conditions, each of which is broken for some cases:
1. `!(UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin)`: The
   broken case is rare: the device and host might happen to use the
   same address for their mapped allocations. I don't know how to
   write a test that's likely to reveal this case, but this patch does
   fix it, as discussed below.
2. `!UNIFIED_SHARED_MEMORY || HasCloseModifier`: There are at least
   two broken cases:
   1. The `close` modifier might have been specified on an `omp
      target enter data` but not the corresponding `omp target exit
      data`, which thus might falsely assume a mapping is in shared
      memory. The test `unified_shared_memory/close_enter_exit.c`
      already has a missing deletion as a result, and this patch adds
      a check for that. This patch also adds the new test
      `close_member.c` to reveal a missing transfer and deletion.
   2. Use of discrete memory might have been forced by
      `omp_target_associate_ptr`, as in the test
      `unified_shared_memory/api.c`. In the current `targetDataEnd`
      implementation, this condition turns out not to be used for this
      case: because the reference count is infinite, a transfer is
      possible only with an `always` modifier, and this condition is
      never used in that case. To ensure it's never used for that
      case in the future, this patch adds the test
      `unified_shared_memory/associate_ptr.c`.
Fortunately, `DeviceTy::getTgtPtrBegin` already has a solution: it
reports whether the allocation was found in shared memory via the
variable `IsHostPtr`.
After this patch, `HasCloseModifier` is no longer used in
`targetDataEnd`, and I wonder if the `close` modifier is ever useful
on an `omp target data end`.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D107925
Given D109057, change test runner to use the libomptarget-x-bc-path
argument instead of the LIBRARY_PATH environment variable to find the device
library.
Also drop the use of LIBRARY_PATH environment variable as it is far
too easy to pull in the device library from an unrelated toolchain by accident
with the current setup. No loss in flexibility to developers as the clang
commandline used here is still available.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D109061
Using rpath instead of LD_LIBRARY_PATH to find libomp.so and
libomptarget.so lets one rerun the already built test executables without
setting environment variables and removes the risk of the test runner picking
up different libraries to the developer debugging the failure.
rpath usually means runpath, which is not transitive, so set runpath on
libomptarget itself so that it can find the plugins located next to it,
spelled $ORIGIN. This provides sufficient functionality to drop D102043.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D109071
This is a continuation of the review: https://reviews.llvm.org/D100181
Creates a new directory "libompd" under openmp.
"TargetValue" provides operational access to the OpenMP runtime memory
for OMPD APIs.
With TargetValue, a user can take a "pointer" and perform multiple operations
on it, from casting and dereferencing to accessing an element of a structure.
The member functions are designed to concatenate the operations that
are needed to access values from structures.
For instance, _a[6]->_b._c would read as:
TValue(ctx, "_a").cast("A",2)
.getArrayElement(6).access("_b").cast("B").access("_c")
For example:
If you have a pointer "ThreadHandle" of a running program then you can
access/retrieve "threadID" from the memory using TargetValue as below.
TValue(context, thread_handle->th) /*__kmp_threads[t]->th*/
.cast("kmp_base_info_t")
.access("th_info") /*__kmp_threads[t]->th.th_info*/
.cast("kmp_desc_t")
.access("ds") /*__kmp_threads[t]->th.th_info.ds*/
.cast("kmp_desc_base_t")
.access("ds_thread") /*__kmp_threads[t]->th.th_info.ds.ds_thread*/
.cast("kmp_thread_t")
.getRawValue(thread_id, 1);
Reviewed By: @hbae
Differential Revision: https://reviews.llvm.org/D100182
This patch implements OpenMP runtime support for an original OpenMP
extension we have developed to support OpenACC: the `ompx_hold` map
type modifier. The previous patch in this series, D106509, implements
Clang support and documents the new functionality in detail.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D106510
This patch implements Clang support for an original OpenMP extension
we have developed to support OpenACC: the `ompx_hold` map type
modifier. The next patch in this series, D106510, implements OpenMP
runtime support.
Consider the following example:
```
#pragma omp target data map(ompx_hold, tofrom: x) // holds onto mapping of x
{
  foo(); // might have map(delete: x)
  #pragma omp target map(present, alloc: x) // x is guaranteed to be present
  printf("%d\n", x);
}
```
The `ompx_hold` map type modifier above specifies that the `target
data` directive holds onto the mapping for `x` throughout the
associated region regardless of any `target exit data` directives
executed during the call to `foo`. Thus, the presence assertion for
`x` at the enclosed `target` construct cannot fail. (As usual, the
standard OpenMP reference count for `x` must also reach zero before
the data is unmapped.)
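The interaction of the two reference counts can be pictured with a toy model. This is a sketch under stated assumptions, not the runtime's data structure: `Mapping` and its members are hypothetical names, and the model assumes `map(delete: x)` resets only the standard (dynamic) count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model of one mapping entry with two reference counts.
struct Mapping {
  std::uint64_t DynRefCount = 0;  // standard OpenMP reference count
  std::uint64_t HoldRefCount = 0; // count owned by ompx_hold regions

  // Entering a region increments the count selected by Hold.
  void enter(bool Hold) { ++count(Hold); }

  // Leaving a region decrements the selected count, never below zero.
  void exit(bool Hold) {
    if (count(Hold) > 0)
      --count(Hold);
  }

  // Assumption of this model: delete zeroes only the dynamic count.
  void deleteDyn() { DynRefCount = 0; }

  // The data stays mapped until both counts have reached zero.
  bool isPresent() const { return DynRefCount + HoldRefCount > 0; }

private:
  std::uint64_t &count(bool Hold) { return Hold ? HoldRefCount : DynRefCount; }
};

// Mirrors the example above: a delete inside the hold region does not
// unmap x, so the later presence check succeeds.
static bool presentAfterDeleteInsideHold() {
  Mapping X;
  X.enter(/*Hold=*/true); // target data map(ompx_hold, tofrom: x)
  X.deleteDyn();          // foo() executes map(delete: x)
  return X.isPresent();   // checked by map(present, alloc: x)
}
```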
Justification for inclusion in Clang and LLVM's OpenMP runtime:
* The `ompx_hold` modifier supports OpenACC functionality (structured
reference count) that cannot be achieved in standard OpenMP, as of
5.1.
* The runtime implementation for `ompx_hold` (next patch) will thus be
used by Flang's OpenACC support.
* The Clang implementation for `ompx_hold` (this patch) as well as the
runtime implementation are required for the Clang OpenACC support
being developed as part of the ECP Clacc project, which translates
OpenACC to OpenMP at the directive AST level. These patches are the
first step in upstreaming OpenACC functionality from Clacc.
* The Clang implementation for `ompx_hold` is also used by the tests
in the runtime implementation. That syntactic support makes the
tests more readable than low-level runtime calls can. Moreover,
upstream Flang and Clang do not yet support OpenACC syntax
sufficiently for writing the tests.
* More generally, the Clang implementation enables a clean separation
of concerns between OpenACC and OpenMP development in LLVM. That
is, LLVM's OpenMP developers can discuss, modify, and debug LLVM's
extended OpenMP implementation and test suite without directly
considering OpenACC's language and execution model, which can be
handled by LLVM's OpenACC developers.
* OpenMP users might find the `ompx_hold` modifier useful, as in the
above example.
See new documentation introduced by this patch in `openmp/docs` for
more detail on the functionality of this extension and its
relationship with OpenACC. For example, it explains how the runtime
must support two reference counts, as specified by OpenACC.
Clang recognizes `ompx_hold` unless `-fno-openmp-extensions`, a new
command-line option introduced by this patch, is specified.
Reviewed By: ABataev, jdoerfert, protze.joachim, grokos
Differential Revision: https://reviews.llvm.org/D106509
As discussed in D107121, task wait doesn't work when a regular task T depends on
a detached task or a hidden helper task T' in a serialized team. The root cause is,
since the team is serialized, the last task will not be tracked by
`td_incomplete_child_tasks`. When T' is finished, it first releases its
dependences, and then decrements its parent counter. So far so good. For the thread
that is running task wait, if at the moment it is still spinning and trying to
execute tasks, it is fine because it can detect the new task and execute it.
However, if it happens to finish the function `flag.execute_tasks(...)`, it will
be broken because `td_incomplete_child_tasks` is 0 now.
In this patch, we slightly update the rule for tracking child tasks. If the
task team encounters a proxy task or a hidden helper task, all following tasks
will be tracked.
Reviewed By: AndreyChurbanov
Differential Revision: https://reviews.llvm.org/D107496
In some build configurations, the target we depend on is not available for declaring the build dependency.
We only need to declare the build dependency if the build target is available in the same build.
Fixes the issue raised in https://reviews.llvm.org/D107156#2969862
This patch should go into release/13 together with D108404
Differential Revision: https://reviews.llvm.org/D108868
`CU_EVENT_DEFAULT` is defined in the CUDA header. It should be added to
`openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h` for CUDA-free builds.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D108878
This patch adds support for event-related interfaces, which will be used
later to fix a data race. See D104418 for more details.
Reviewed By: jdoerfert, ye-luo
Differential Revision: https://reviews.llvm.org/D108528
Lets wavefront size be 32 for amdgpu openmp, as well as 64.
Fixes up as little as possible to pass that through the libraries. This change
is end to end, as opposed to updating clang/devicertl/plugin separately. It can
be broken up for review/commit if preferred. Posting as-is so that others with
a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are
probably bugs remaining as well as the todo: for letting grid values vary more.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D108708
Lets the amdgpu plugin write to omptarget_device_environment
to enable debugging. Intend to use in the near future to record the
wavesize that a given deviceRTL was compiled with for running on hardware
that supports 32 or 64.
Patch sets all the attributes that are useful. Notably .data means the variable
is set by writing to host memory before copying to the GPU instead of launching
a kernel to update the image. Can simplify the plugin slightly to drop the
code for patching after load if this is used consistently.
NFC on nvptx, cuda plugin seems to work fine without any annotations.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108698
Move most debug printing in rtl.cpp behind DP() macro
Adjust the print output for gpu arch mismatch when the architectures match
Convert an assert into graceful failure
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108562
The use of `$<TARGET_FILE:clang>` was adapted too broadly from D101265.
Fixes llvm.org/PR51579
Also see discussion in D108534.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D108640
With uses of g_atl_machine gone, a significant portion of dead
code has been removed.
This patch depends on D104691 and D104695.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D104696
Compile nvptx devicertl as freestanding, breaking the
dependency on host glibc and gcc-multilibs. Thus build it by default.
Comes at the cost of #defining out printf. Tried mapping it onto
__builtin_printf but that gets transformed back to printf instead
of hitting the cuda/openmp lowering transform.
Printf could be preserved by one of:
- dropping all the standard headers and -ffreestanding
- providing a header only printf implementation
- changing the compiler handling of printf
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D108349
Add include path to the cmakefiles and set the target_impl enums
from the llvm constants instead of copying the values.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108391