This patch fixes an issue where target memory could be deallocated while
it was still in use, or before it was used.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D84996
This is to address the issue reported at:
https://bugs.llvm.org/show_bug.cgi?id=46863
Since the weak attribute is meaningless for an interface function of a shared library, this patch
disables the attribute when the OpenMP library is built as a shared library.
ompt_start_tool is not an interface function, but an internally called function
possibly implemented by an OMPT tool.
This function needs to be weak, if possible, to allow overriding ompt_start_tool
with an implementation built into the application.
Differential Revision: https://reviews.llvm.org/D84871
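A minimal sketch of the idea follows. It assumes a build macro, named SKETCH_SHARED_LIB here purely for illustration (the macro the runtime actually uses may differ), which is defined only for shared-library builds. The result type is an opaque stand-in for the one declared in omp-tools.h. With a statically linked runtime, a strong ompt_start_tool defined in the application takes precedence over this weak default.

```c
#include <stddef.h>

/* Hypothetical build macro: defined only when the runtime is built as a
   shared library, where the weak attribute buys nothing. */
#if defined(SKETCH_SHARED_LIB)
#define SKETCH_WEAK
#else
#define SKETCH_WEAK __attribute__((weak))
#endif

/* Opaque stand-in for the OMPT result type declared in omp-tools.h. */
typedef struct ompt_start_tool_result_t ompt_start_tool_result_t;

/* Default provided by the runtime: "no tool". When the runtime is linked
   statically, a strong definition in the application overrides this one. */
SKETCH_WEAK ompt_start_tool_result_t *
ompt_start_tool(unsigned int omp_version, const char *runtime_version) {
  (void)omp_version;
  (void)runtime_version;
  return NULL;
}
```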
Refactored the function `targetDataEnd` in preparation for fixing
the issue of ahead-of-time target memory deallocation. This patch only
renames `targetDataEnd`-related variables and functions to conform
to the LLVM coding standards.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D84991
Refactored the function `target` in preparation for fixing the
issue of ahead-of-time device memory deallocation.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D84816
The base pointer needs to be mapped for all directives, not only for
target data-based ones.
The base pointer is mapped for array sections, array subscripts, array
shaping and other array-like constructs with a base pointer. Also,
codegen for the use_device_ptr clause was modified to correctly handle
the combination of array-like constructs and the use_device_ptr clause.
The data for the use_device_ptr clause is emitted as the last records in the
data mapping array.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D84767
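The following is an illustrative user-level construct of the kind this codegen change affects: an array section of a global pointer combined with use_device_ptr. It does not show the mapping records the compiler emits. The names gp and increment_all are hypothetical. Without offloading, or when compiled without OpenMP, the region simply runs on the host.

```c
#define N 8

/* A global pointer whose pointee is mapped as an array section. */
int *gp;

/* Maps gp[0:N], obtains the corresponding device address through
   use_device_ptr, and passes it to a target region via is_device_ptr.
   On the host fallback the "device" address equals the host address. */
void increment_all(void) {
#pragma omp target data map(tofrom : gp[0:N]) use_device_ptr(gp)
  {
#pragma omp target is_device_ptr(gp)
    for (int i = 0; i < N; ++i)
      gp[i] += 1;
  }
}
```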
The base pointer needs to be mapped for all directives, not only for
target data-based ones.
The base pointer is mapped for array sections, array subscripts, array
shaping and other array-like constructs with a base pointer. Also,
codegen for the use_device_ptr clause was modified to correctly handle
the combination of array-like constructs and the use_device_ptr clause.
The data for the use_device_ptr clause is emitted as the last records in the
data mapping array.
This applies only to global pointers.
Differential Revision: https://reviews.llvm.org/D84767
This patch implements OpenMP runtime support for the OpenMP TR8
`present` motion modifier for `omp target update` directives. The
previous patch in this series implements Clang front end support.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D84712
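A hypothetical host-side model of the `present` motion modifier semantics, assuming the OpenMP rule that a `target update` list item which is not mapped is normally skipped, while `present` turns that case into an error. The table, types and function names are illustrative only and are not libomptarget code. The model returns an error code where the real program would terminate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device data environment: each entry associates a mapped
   host range with its "device" copy. */
typedef struct {
  const void *host_begin;
  size_t size;
  void *device_copy;
} sketch_entry;

typedef enum { UPDATE_OK, UPDATE_SKIPPED, UPDATE_ERROR } update_result;

/* Finds the entry whose host range fully contains [host, host + size). */
static sketch_entry *sketch_lookup(sketch_entry *table, size_t n,
                                   const void *host, size_t size) {
  uintptr_t p = (uintptr_t)host;
  for (size_t i = 0; i < n; ++i) {
    uintptr_t b = (uintptr_t)table[i].host_begin;
    if (p >= b && p + size <= b + table[i].size)
      return &table[i];
  }
  return NULL;
}

/* Models `target update to([present:] x)` for one list item: copies host
   data to the device copy, skips unmapped items, or reports an error for
   unmapped items when `present` is given. */
update_result sketch_update_to(sketch_entry *table, size_t n, const void *host,
                               size_t size, bool present) {
  sketch_entry *e = sketch_lookup(table, n, host, size);
  if (e == NULL)
    return present ? UPDATE_ERROR : UPDATE_SKIPPED;
  size_t offset = (size_t)((uintptr_t)host - (uintptr_t)e->host_begin);
  memcpy((char *)e->device_copy + offset, host, size);
  return UPDATE_OK;
}
```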
On runtime failures, D83963 causes the runtime to abort instead of
merely exiting with a non-zero value, but many tests in the
libomptarget test suite still expect the old behavior. This patch
updates the test suite; the change was discussed in post-commit comments on
D83963 and D84557.
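Why the tests need updating: a test harness can tell an abort apart from a non-zero exit by inspecting the wait status (in lit tests this is typically the difference between `not` and `not --crash`). Below is a minimal POSIX sketch, assuming a Unix-like system. It is not how lit is implemented, and the function names are hypothetical.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef enum { RAN_OK, EXITED_NONZERO, ABORTED, OTHER } outcome;

/* Example bodies: the old failure mode, the new one, and a success. */
void fails_with_exit(void) { exit(EXIT_FAILURE); }
void fails_with_abort(void) { abort(); }
void succeeds(void) {}

/* Runs `body` in a child process and classifies how the child ended. */
outcome run_and_classify(void (*body)(void)) {
  pid_t pid = fork();
  if (pid < 0)
    return OTHER; /* could not start the child */
  if (pid == 0) {
    body();
    _exit(0);
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid)
    return OTHER;
  if (WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT)
    return ABORTED; /* what the runtime now does on failure */
  if (WIFEXITED(status))
    return WEXITSTATUS(status) == 0 ? RAN_OK : EXITED_NONZERO;
  return OTHER;
}
```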
Summary:
1. Add DeviceTy::data_alloc, DeviceTy::data_delete, and DeviceTy::synchronize pass-through functions to avoid accessing Device.RTL directly.
2. Fix the type of the first argument of synchronize_ty in rtl.h: the device ID is int32_t, which is consistent with the other functions.
Reviewers: tianshilei1992, jdoerfert
Reviewed By: tianshilei1992
Subscribers: yaxunl, guansong, sstefan1, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D84487
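A simplified C sketch of the pass-through pattern, assuming a plugin table of function pointers. The real DeviceTy and RTLInfoTy are C++ types with more members and different signatures. Every name below is hypothetical. Routing calls through the helpers keeps the plugin-local device ID, an int32_t throughout, in one place.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified plugin interface. */
typedef struct {
  void *(*data_alloc)(int32_t device_id, int64_t size);
  int32_t (*data_delete)(int32_t device_id, void *ptr);
  int32_t (*synchronize)(int32_t device_id);
} sketch_rtl;

/* Hypothetical device handle: which plugin, and the plugin-local device ID. */
typedef struct {
  const sketch_rtl *rtl;
  int32_t rtl_device_id;
} sketch_device;

/* Pass-through helpers: callers use these instead of touching device->rtl. */
void *device_data_alloc(const sketch_device *d, int64_t size) {
  return d->rtl->data_alloc(d->rtl_device_id, size);
}

int32_t device_data_delete(const sketch_device *d, void *ptr) {
  return d->rtl->data_delete(d->rtl_device_id, ptr);
}

int32_t device_synchronize(const sketch_device *d) {
  return d->rtl->synchronize(d->rtl_device_id);
}

/* A trivial host-memory "plugin" for exercising the helpers. */
static void *host_alloc(int32_t id, int64_t size) {
  (void)id;
  return malloc((size_t)size);
}
static int32_t host_delete(int32_t id, void *ptr) {
  (void)id;
  free(ptr);
  return 0;
}
static int32_t host_sync(int32_t id) {
  (void)id;
  return 0;
}
const sketch_rtl host_rtl = {host_alloc, host_delete, host_sync};
```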
See PR46515 for the rationale, but generally, we want to *really* abort,
not shut down gracefully.
Reviewed By: grokos, ABataev
Differential Revision: https://reviews.llvm.org/D83963
Additionally, fix the copy, if enabled, on multi-config targets.
Summary:
This changes the copy command for libomp.so to use the output of the target as
the source of the copy, rather than trying to find it based on
${LIBOMP_LIBRARY_DIR}, which appears to be incorrect in multi-config generator
builds.
Reviewers: jdoerfert
Subscribers: mgorny, yaxunl, guansong, sstefan1, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D84148
Summary:
In the function `target`, memory deallocation and `target_data_end` are called
immediately after the kernel launch returns. This can cause a race condition in
which the corresponding memory is still being used by the kernel, and a potential
issue where, by the time the kernel starts to execute, its required data has
already been deallocated, especially when multiple kernels run concurrently.
Since we block the thread issuing the target offloading at the end of the target
region anyway, we simply move the synchronization slightly earlier to ensure
correctness.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: yaxunl, guansong, sstefan1, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D84381
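A host-only model of the ordering described above, assuming a POSIX threads environment. The "kernel" is a thread that reads a buffer allocated for it, and pthread_join plays the role of device synchronization. The structure and names are hypothetical, not libomptarget code. Freeing the buffer before the join would reproduce the race the patch fixes.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Arguments for the asynchronous "kernel". */
typedef struct {
  const int *data;
  size_t n;
  long sum;
} kernel_args;

/* The "kernel": reads the whole buffer and stores the sum. */
static void *kernel(void *p) {
  kernel_args *a = p;
  long s = 0;
  for (size_t i = 0; i < a->n; ++i)
    s += a->data[i];
  a->sum = s;
  return NULL;
}

/* Launches the kernel, waits for it, and only then releases its buffer.
   Returns the kernel's result, or -1 on failure. */
long run_target_region(size_t n) {
  int *buf = malloc(n * sizeof *buf);
  if (buf == NULL)
    return -1;
  for (size_t i = 0; i < n; ++i)
    buf[i] = (int)i;

  kernel_args args = {buf, n, 0};
  pthread_t stream;
  if (pthread_create(&stream, NULL, kernel, &args) != 0) {
    free(buf);
    return -1;
  }
  pthread_join(stream, NULL); /* synchronize before any deallocation */
  free(buf);
  return args.sum;
}
```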
This implements OpenMP runtime support for the OpenMP TR8 `present`
map type modifier. The previous patch in this series implements Clang
front end support. See that patch summary for behaviors that are not
yet supported.
Reviewed By: grokos, jdoerfert
Differential Revision: https://reviews.llvm.org/D83062
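A hypothetical model of the `present` map type modifier, assuming the OpenMP rule that a list item with `present` must already be mapped when the construct is entered. The reference-count representation and the names are illustrative only. The model returns an error code where the real runtime reports an error.

```c
#include <stdbool.h>

/* Hypothetical per-variable mapping state: ref_count == 0 means the
   variable is not present in the device data environment. */
typedef struct {
  int ref_count;
} sketch_mapping;

typedef enum { MAP_OK, MAP_NOT_PRESENT } map_result;

/* Models entering `map([present,] to: x)`. Without `present`, an absent
   variable would be allocated and copied; with `present`, it is an error. */
map_result sketch_map_enter(sketch_mapping *m, bool present) {
  if (m->ref_count == 0 && present)
    return MAP_NOT_PRESENT;
  /* Allocation and data transfer would happen here when ref_count == 0. */
  m->ref_count++;
  return MAP_OK;
}

/* Models leaving the construct: drop one reference. */
void sketch_map_exit(sketch_mapping *m) {
  if (m->ref_count > 0)
    m->ref_count--;
}
```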
The following tests were disabled for clang-11 after upgrading to
version 5.0 in D82963:
1. openmp/runtime/test/env/kmp_set_dispatch_buf.c
2. openmp/runtime/test/worksharing/for/kmp_set_dispatch_buf.c
They also fail with clang-12, so they are temporarily disabled
until they are fixed.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D84241
Add a check of frm to prevent out-of-bounds array access;
add a check of new_nproc to prevent access to the unallocated hot_teams array;
add a check of the location info pointer to prevent a NULL dereference;
add a check of the d_tn pointer to prevent a NULL dereference in release builds.
These checks make static analyzers happier.
This is the second part of the patch from https://reviews.llvm.org/D84062.
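A minimal sketch of the kind of NULL check described for the location info pointer. The struct is a hypothetical stand-in for the runtime's source-location record, and the placeholder string is illustrative. A debug-only assertion would be compiled out of release builds, so the check is written as ordinary control flow.

```c
#include <stddef.h>

/* Hypothetical stand-in for a source-location record. */
typedef struct {
  const char *psource;
} sketch_ident;

/* Returns the location string, or a placeholder when the location pointer
   or its string is missing, instead of dereferencing NULL. */
const char *sketch_location_string(const sketch_ident *loc) {
  if (loc == NULL || loc->psource == NULL)
    return "<unknown>";
  return loc->psource;
}
```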
Add a check for a negative gtid before indexing __kmp_threads.
This makes static analyzers happier.
This is the first part of the patch split in two parts.
Differential Revision: https://reviews.llvm.org/D84062
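A minimal sketch of a range check before indexing a per-thread table. It assumes, as the added check implies, that a global thread ID can be negative (for example, a sentinel meaning "no such thread"). The table and names are hypothetical stand-ins for __kmp_threads.

```c
#include <stddef.h>

/* Hypothetical per-thread descriptor and table indexed by global thread ID. */
typedef struct {
  int gtid;
} sketch_thread;

#define SKETCH_MAX_THREADS 4
sketch_thread *sketch_threads[SKETCH_MAX_THREADS];

/* Returns the descriptor for gtid, or NULL when gtid is negative or past
   the end of the table. */
sketch_thread *sketch_thread_for(int gtid) {
  if (gtid < 0 || gtid >= SKETCH_MAX_THREADS)
    return NULL;
  return sketch_threads[gtid];
}
```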
The hwloc documentation guarantees that the only object always present
in the topology is the PU. This patch checks for the presence of other
objects in the topology, just in case.
Differential Revision: https://reviews.llvm.org/D84065
Add barrier/region notification for a parallel region inside a teams construct
when the number of teams is 1, as VTune only shows outer-level regions for
simplicity.
Differential Revision: https://reviews.llvm.org/D84024
Libomptarget patch adding runtime support for "declare mapper".
Patch co-developed by Lingda Li and George Rokos.
Differential revision: https://reviews.llvm.org/D68100
There are various runtime calls in the device runtime with unused, or
always fixed, arguments. This is bad for all sorts of reasons. Clean up
two of them now, as we match them in OpenMPOpt.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83268
Summary:
Retaining the per-device primary context is preferred to creating a context owned by the plugin.
From the CUDA documentation:
1. "Note that the use of multiple CUcontexts per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used." from https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html
2. Right under cuCtxCreate: "In most cases it is recommended to use cuDevicePrimaryCtxRetain." https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf
3. "The primary context is unique per device and shared with the CUDA runtime API. These functions allow integration with other libraries using CUDA." https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX
Two issues are addressed by this patch:
1. Not using the primary context caused interoperability issues with libraries such as cuBLAS and cuSOLVER (CUBLAS_STATUS_EXECUTION_FAILED and cudaErrorInvalidResourceHandle).
2. On OLCF Summit: "Error returned from cuCtxCreate" and "CUDA error is: invalid device ordinal".
Regarding the flags of the primary context: if it is inactive, we set CU_CTX_SCHED_BLOCKING_SYNC; if it is already active, we respect the current flags.
Reviewers: grokos, ABataev, jdoerfert, protze.joachim, AndreyChurbanov, Hahnfeld
Reviewed By: jdoerfert
Subscribers: openmp-commits, yaxunl, guansong, sstefan1, tianshilei1992
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D82718
This function uses __builtin_amdgcn_atomic_inc32():
uint32_t atomicInc(uint32_t *address, uint32_t max);
These functions use __builtin_amdgcn_fence():
__kmpc_impl_threadfence()
__kmpc_impl_threadfence_block()
__kmpc_impl_threadfence_system()
They replace the current mechanism of calling IR functions directly.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83132
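For reference, a portable host-side model of the wrap-around increment that atomicInc performs. It assumes CUDA-style semantics (old = *address; *address = (old >= max) ? 0 : old + 1; return old), which is how the AMDGCN builtin is understood to behave. This is a C11 compare-and-swap sketch, not the builtin itself.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically increments *address, wrapping to 0 once the old value reaches
   max, and returns the old value. */
uint32_t atomic_inc_model(_Atomic uint32_t *address, uint32_t max) {
  uint32_t old = atomic_load(address);
  uint32_t next;
  do {
    next = (old >= max) ? 0u : old + 1u;
    /* On failure, `old` is refreshed with the current value and we retry. */
  } while (!atomic_compare_exchange_weak(address, &old, next));
  return old;
}
```

With max = 2, successive calls starting from 0 return 0, 1, 2 and leave the counter back at 0.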
This patch adds the missing GOMP_5.0 loop entry points, which incorporate
the new nonmonotonic default into the entry point name. Since monotonic
schedules are a subset of nonmonotonic ones, it is acceptable to use
monotonic as the implementation. This patch simply implements the nonmonotonic
(and possibly nonmonotonic) versions of the loop entry points as
wrappers around the monotonic ones.
Differential Revision: https://reviews.llvm.org/D73922
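A minimal sketch of the wrapper approach. The function names are hypothetical and only mirror the general shape of a GOMP loop-start entry point (iteration bounds in, first chunk out, false when there is no work). The monotonic "implementation" here hands out the whole range as one chunk purely for illustration. A real runtime would follow the run-time schedule across several calls.

```c
#include <stdbool.h>

/* Hypothetical monotonic loop start: hands the whole iteration space
   [start, end) to the caller as a single chunk. */
static bool sketch_loop_runtime_start(long start, long end, long incr,
                                      long *istart, long *iend) {
  if ((incr > 0 && start >= end) || (incr < 0 && start <= end))
    return false; /* no iterations to execute */
  *istart = start;
  *iend = end;
  return true;
}

/* Nonmonotonic and possibly-nonmonotonic entry points forward to the
   monotonic implementation: a monotonic schedule also satisfies a
   nonmonotonic request. */
bool sketch_loop_nonmonotonic_runtime_start(long start, long end, long incr,
                                            long *istart, long *iend) {
  return sketch_loop_runtime_start(start, end, incr, istart, iend);
}

bool sketch_loop_maybe_nonmonotonic_runtime_start(long start, long end,
                                                  long incr, long *istart,
                                                  long *iend) {
  return sketch_loop_runtime_start(start, end, incr, istart, iend);
}
```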
The following tests fail after upgrading to version 5.0 but pass
with version 4.5:
1. openmp/runtime/test/env/kmp_set_dispatch_buf.c
2. openmp/runtime/test/worksharing/for/kmp_set_dispatch_buf.c
They will be re-enabled as soon as they are fixed.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D82963
If the compilation fails, the test is marked as unsupported.
-> This will never change for a specific version of gcc
If the linking fails, the test is marked as expected to fail.
-> This might change as LLVM/OpenMP implements the missing GOMP interface functions
Reviewed by: Hahnfeld
Differential Revision: https://reviews.llvm.org/D83077