This is done at call-site and does not need to be handled in
__kmp_invoke_microtask. It was already absent from the x86
and x86_64 assembly, this patch removes it from the generic
implementation in z_Linux_util.cpp and adds documentation for
AArch64 and PPC64 that it's actually not needed. I can't test
on these architectures, so I don't want to change the code just
because it looks right :)
While at it, rename some variables for consistency and add a
check in test/ompt/parallel/normal.c that the pointer was reset
before entering the barrier.
Differential Revision: https://reviews.llvm.org/D64442
llvm-svn: 366721
Remove loopTripCnt from threaded device stack after consuming it.
Added a libomptarget DP message to aid in future debugging and to
validate the added testcase, which only runs in Debug build.
Differential Revision: https://reviews.llvm.org/D64808
llvm-svn: 366349
This is a follow up patch to D64534 (r365963) which removed all OMP
spec versioning within the OpenMP runtime codebase. This patch removes
REQUIRES: openmp-x.y lines from lit tests.
llvm-svn: 366341
This leads to problems when compiling C++ code with libc++ for Nvidia GPUs
because Clang now uses wrappers for math functions that might include
C++ templates not allowed in 'extern "C"'.
Differentiel Revision: https://reviews.llvm.org/D64625
llvm-svn: 366229
Summary:
We used CUDART_VERSION macro to check for the installed cuda version
but this macro is defined in cuda_runtime_api.h, which is not used by
project. Better to use CUDA_VERSION macro, which is defined in cuda.h.
Also, added the check if this macro is defined. If macro is undefined,
there is something wrong with the cuda configuration and we should not
continue the compilation.
This also fixes problems with runtime building in cuda 10+.
Reviewers: grokos
Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D64648
llvm-svn: 366224
Summary:
We used to call __kmpc_omp_taskwait function with global threadid set to
0. It may crash the application at the runtime if the thread executing
target region is not a master thread.
Reviewers: grokos, kkwli0
Subscribers: guansong, jdoerfert, caomhin, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D64571
llvm-svn: 366220
Remove all older OMP spec versioning from the runtime and build system.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D64534
llvm-svn: 365963
These entry points are never called by Clang trunk nor clang-ykt. If
XL doesn't use them either, they can finally go away.
Differential Revision: https://reviews.llvm.org/D52700
llvm-svn: 365817
Summary:
__kmpc_push_tripcount function is not thread safe and may lead to data
race when the target regions are executed in parallel threads. The patch
makes loopTripCnt counter thread aware and stores the tripcount value
per thread in the map. Access to map is guarded by mutex to prevent
data race in the map itself.
Test is for NVPTX target because it does not work correctly on the
host. Seems to me, there is a problem in libomp with target regions in
the parallel threads.
Reviewers: grokos
Subscribers: guansong, jfb, jdoerfert, openmp-commits, kkwli0, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D64080
llvm-svn: 365332
Summary:
According to the OpenMP standard, flush makes a thread’s temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied.
According to the Cuda toolkit documentation (https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html#memory-fence-functions), __threadfence() functions provides required functionality.
__threadfence_system() also provides required functionality, but it also
includes some extra functionality, like synchronization of page-locked
host memory, synchronization for the host, etc. It is not required per
the standard and we can use more relaxed version of memory fence
operation.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D62397
llvm-svn: 364572
Bug reported in https://bugs.llvm.org/show_bug.cgi?id=42269.
Freeing of the contention group (CG) stucture by master thread looks wrong,
because workers can leave the CG later on. Intead the freeing
is now done by the last thread leaving the CG.
Differential Revision: https://reviews.llvm.org/D63599
llvm-svn: 364456
Summary:
This patch adds support for handling variables under the:
```
#pragma omp declare target to()
```
clause when the
```
#pragma omp requires unified_shared_memory
```
is used.
The address of the host variable is copied into the device pointer just like for the declare target link case.
Reviewers: ABataev, caomhin, grokos, AlexEichenberger
Reviewed By: grokos
Subscribers: jcownie, guansong, jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D63106
llvm-svn: 363825
Summary:
The problems with __syncthreads() were fixed in clang >= 9.0 and the
original __syncthreads() can be used instead of the ptx instruction.
Reviewers: grokos
Subscribers: guansong, jdoerfert, openmp-commits, kkwli0, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D63515
llvm-svn: 363807
Summary: This patch enables the usage of a host variable on the device for declare target link variables when unified memory is available.
Reviewers: ABataev, caomhin, grokos
Reviewed By: grokos
Subscribers: Hahnfeld, guansong, jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60884
llvm-svn: 362505
Made type of depth of hwloc object to correapond with
change from unsigned in hwloc 1,x to int in hwloc 2.x.
This eliminates the warning on signed-unsigned comparison.
Differential Revision: https://reviews.llvm.org/D62332
llvm-svn: 362401
Current parsing allows trailing string after the permitted value,
MANDATORY|DISABLED|DEFAULT -- e.g., "mandatorynot" is also recognized
as "MANDATORY". Such cases should be recognized as incorrect/unknown
value.
Differential Revision: https://reviews.llvm.org/D62431
llvm-svn: 362125
This change adds checks before dereferencing a pointer returned from a
function.
Differential Revision: https://reviews.llvm.org/D62224
llvm-svn: 362111
The omp_taskloop_num_tasks and omp_taskwait have deadlooped
on the NetBSD buildbot previously, practically hanging the host running
it. Disable them until we can find a good solution, or make the kernel
less fragile.
llvm-svn: 361825
Summary:
Parallel level counter should be volatile to prevent some dangerous
optimiations by the ptxas. Otherwise, ptxas optimizations lead to
undefined behaviour in some cases.
Also, use __threadfence() for #pragma omp flush and if the barrier
should not be used (we have only one thread in the team), still perform
flush operation since the standard requires implicit flush when
executing barriers.
Reviewers: gtbercea, kkwli0, grokos
Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D62199
llvm-svn: 361421
This change adds implementation to ompt_finalize_tool() and
ompt_get_task_memory().
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D61657
llvm-svn: 361309
Summary:
Target link variables are currently implemented by creating a copy of the variables on the device side and unified memory never gets exploited.
When the prgram uses the:
```
#pragma omp requires unified_shared_memory
```
directive in conjunction with a declare target link, the linked variable is no longer allocated on the device and the host version is used instead.
This behavior is overridden by performing an explicit mapping.
A Clang side patch is required.
Reviewers: ABataev, AlexEichenberger, grokos, Hahnfeld
Reviewed By: AlexEichenberger, grokos, Hahnfeld
Subscribers: Hahnfeld, jfb, guansong, jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60223
llvm-svn: 361294
https://reviews.llvm.org/D58454 did not fix the problem for a typical use
case of building LLVM with gcc or icc and then testing with the newly built
clang compiler.
The compilers do not agree on how to extend a 32-bit pointer to uint64, so
make the pointer unsigned first, before adjusting the size.
Patch by Joachim Protze
Differential Revision: https://reviews.llvm.org/D58506
llvm-svn: 361158
OpenMP 5.0 says that the callback for the events initial-task-begin and
initial-task-end has to be ompt_callback_implicit_task.
Patch by Tim Cramer
Differential Revision: https://reviews.llvm.org/D58776
llvm-svn: 361157
Added synchronization for possible concurrent initialization of mutexes
by multiple threads. The need of synchronization caused by commit r357927
which added the use of mutexes at threads movement to/from common pool
(earlier the mutexes were used only at suspend/resume).
Patch by Johnny Peyton.
Differential Revision: https://reviews.llvm.org/D61995
llvm-svn: 360919
Currently cores within package that share the same L2 cache are grouped together.
The current logic behind this assumes that the L2 cache is always at deeper
(or the same) level than the package itself. In case when L2 cache is common
for all packages (and the packages are at deeper level than L2 cache) the whole of
the further topology discovery fails to find any computational units resulting in
following assertion:
Assertion failure at kmp_affinity.cpp(715): nActiveThreads == __kmp_avail_proc.
OMP: Error #13: Assertion failure at kmp_affinity.cpp(715).
This patch adds a bit of a logic that prevents such situation from occurring.
Differential Revision: https://reviews.llvm.org/D61796
llvm-svn: 360890
Removed unconditional and unsafe decrement of counter
of active threads in pool at shutdown time.
Differential Revision: https://reviews.llvm.org/D61944
llvm-svn: 360784
The implementation should be done by compiler, user can only declare
objects of this type and use them in OpenMP directives.
Differential Revision: https://reviews.llvm.org/D61860
llvm-svn: 360774
The code is currently using the ambiguous instruction
"sub sp, sp, w9, lsl #4". The ARM reference manual says this isn't
valid, and it's not clear whether it's supposed to mean uxtw or uxtx.
It doesn't matter which instruction we use here, since the high
bits of the operand are zero anyway, so I arbitrarily choose uxtw, to
preserve the register name.
See https://reviews.llvm.org/D60840 for the LLVM patch.
Differential Revision: https://reviews.llvm.org/D61770
llvm-svn: 360711
Changed file extension of the destination of the copy of libomp.lib
(it was mistakely .dll, now it is .lib) in installation on Windows.
Differential Revision: https://reviews.llvm.org/D61673
llvm-svn: 360595
Summary:
Patch improves performance of the full runtime mode by moving
threads limit counter to the shared memory. It also allows to save
global memory.
Reviewers: grokos, kkwli0, gtbercea
Subscribers: guansong, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61801
llvm-svn: 360584
Summary:
Patch improves performance of the full runtime mode by moving
number-of-threads counter to the shared memory. It also allows to save
global memory.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61785
llvm-svn: 360457
This patch provides workaround to allow gfortran to compile the
OpenMP Fortran modules.
From the gfortran manual:
https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gfortran/BOZ-literal-constants.html
"Note that initializing an INTEGER variable with a statement such as
DATA i/Z'FFFFFFFF'/ will give an integer overflow error rather than the desired
result of -1 when i is a 32-bit integer on a system that supports 64-bit
integers. The -fno-range-check option can be used as a workaround for legacy
code that initializes integers in this manner."
Bug filed: https://bugs.llvm.org/show_bug.cgi?id=41755
Differential Revision: https://reviews.llvm.org/D61603
llvm-svn: 360299
Summary:
To be able to successfully build OpenMP on FreeBSD/i386, which still
uses i486 as its default processor, I had to provide wrappers for the
`__kmp_load_mxcsr` and `__kmp_store_mxcsr` functions.
If the compiler signals that SSE is not available, loading and storing
mxcsr does not make sense anway, so in that case the inline functions
are empty. This gives the minimum amount of code churn.
See also https://svnweb.freebsd.org/changeset/base/345283
Reviewers: emaste, jlpeyton, Hahnfeld
Reviewed By: jlpeyton
Subscribers: hfinkel, krytarowski, jdoerfert, openmp-commits, llvm-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60916
llvm-svn: 360062
Summary:
Patch improves performance of the full runtime mode by moving
thread-limit counter to the shared memory. It also allows to save
global memory.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jdoerfert, caomhin, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61526
llvm-svn: 359922
Summary:
Used parallelLevel[] counter to simplify and improve implementation of
the existing standard OpenMP functions. Functions are tested already in
several tests, the patch is NFC.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jdoerfert, caomhin, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61459
llvm-svn: 359892
Summary:
Previously for the different purposes we need to get the active/common
parallel level and with full runtime we iterated over all the records to
calculate this level. Instead, we can used the warp-based parallel level
counters used in no-runtime mode.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, jdoerfert, caomhin, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61395
llvm-svn: 359822