Summary:
According to the OpenMP standard, flush makes a thread’s temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied.
According to the Cuda toolkit documentation (https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html#memory-fence-functions), __threadfence() functions provides required functionality.
__threadfence_system() also provides required functionality, but it also
includes some extra functionality, like synchronization of page-locked
host memory, synchronization for the host, etc. It is not required per
the standard and we can use more relaxed version of memory fence
operation.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D62397
llvm-svn: 364572
Bug reported in https://bugs.llvm.org/show_bug.cgi?id=42269.
Freeing of the contention group (CG) stucture by master thread looks wrong,
because workers can leave the CG later on. Intead the freeing
is now done by the last thread leaving the CG.
Differential Revision: https://reviews.llvm.org/D63599
llvm-svn: 364456
Summary:
This patch adds support for handling variables under the:
```
#pragma omp declare target to()
```
clause when the
```
#pragma omp requires unified_shared_memory
```
is used.
The address of the host variable is copied into the device pointer just like for the declare target link case.
Reviewers: ABataev, caomhin, grokos, AlexEichenberger
Reviewed By: grokos
Subscribers: jcownie, guansong, jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D63106
llvm-svn: 363825
Summary:
The problems with __syncthreads() were fixed in clang >= 9.0 and the
original __syncthreads() can be used instead of the ptx instruction.
Reviewers: grokos
Subscribers: guansong, jdoerfert, openmp-commits, kkwli0, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D63515
llvm-svn: 363807
Summary: This patch enables the usage of a host variable on the device for declare target link variables when unified memory is available.
Reviewers: ABataev, caomhin, grokos
Reviewed By: grokos
Subscribers: Hahnfeld, guansong, jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60884
llvm-svn: 362505
Made type of depth of hwloc object to correapond with
change from unsigned in hwloc 1,x to int in hwloc 2.x.
This eliminates the warning on signed-unsigned comparison.
Differential Revision: https://reviews.llvm.org/D62332
llvm-svn: 362401
Current parsing allows trailing string after the permitted value,
MANDATORY|DISABLED|DEFAULT -- e.g., "mandatorynot" is also recognized
as "MANDATORY". Such cases should be recognized as incorrect/unknown
value.
Differential Revision: https://reviews.llvm.org/D62431
llvm-svn: 362125
This change adds checks before dereferencing a pointer returned from a
function.
Differential Revision: https://reviews.llvm.org/D62224
llvm-svn: 362111
The omp_taskloop_num_tasks and omp_taskwait have deadlooped
on the NetBSD buildbot previously, practically hanging the host running
it. Disable them until we can find a good solution, or make the kernel
less fragile.
llvm-svn: 361825
Summary:
Parallel level counter should be volatile to prevent some dangerous
optimiations by the ptxas. Otherwise, ptxas optimizations lead to
undefined behaviour in some cases.
Also, use __threadfence() for #pragma omp flush and if the barrier
should not be used (we have only one thread in the team), still perform
flush operation since the standard requires implicit flush when
executing barriers.
Reviewers: gtbercea, kkwli0, grokos
Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D62199
llvm-svn: 361421
This change adds implementation to ompt_finalize_tool() and
ompt_get_task_memory().
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D61657
llvm-svn: 361309
Summary:
Target link variables are currently implemented by creating a copy of the variables on the device side and unified memory never gets exploited.
When the prgram uses the:
```
#pragma omp requires unified_shared_memory
```
directive in conjunction with a declare target link, the linked variable is no longer allocated on the device and the host version is used instead.
This behavior is overridden by performing an explicit mapping.
A Clang side patch is required.
Reviewers: ABataev, AlexEichenberger, grokos, Hahnfeld
Reviewed By: AlexEichenberger, grokos, Hahnfeld
Subscribers: Hahnfeld, jfb, guansong, jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60223
llvm-svn: 361294
https://reviews.llvm.org/D58454 did not fix the problem for a typical use
case of building LLVM with gcc or icc and then testing with the newly built
clang compiler.
The compilers do not agree on how to extend a 32-bit pointer to uint64, so
make the pointer unsigned first, before adjusting the size.
Patch by Joachim Protze
Differential Revision: https://reviews.llvm.org/D58506
llvm-svn: 361158
OpenMP 5.0 says that the callback for the events initial-task-begin and
initial-task-end has to be ompt_callback_implicit_task.
Patch by Tim Cramer
Differential Revision: https://reviews.llvm.org/D58776
llvm-svn: 361157
Added synchronization for possible concurrent initialization of mutexes
by multiple threads. The need of synchronization caused by commit r357927
which added the use of mutexes at threads movement to/from common pool
(earlier the mutexes were used only at suspend/resume).
Patch by Johnny Peyton.
Differential Revision: https://reviews.llvm.org/D61995
llvm-svn: 360919
Currently cores within package that share the same L2 cache are grouped together.
The current logic behind this assumes that the L2 cache is always at deeper
(or the same) level than the package itself. In case when L2 cache is common
for all packages (and the packages are at deeper level than L2 cache) the whole of
the further topology discovery fails to find any computational units resulting in
following assertion:
Assertion failure at kmp_affinity.cpp(715): nActiveThreads == __kmp_avail_proc.
OMP: Error #13: Assertion failure at kmp_affinity.cpp(715).
This patch adds a bit of a logic that prevents such situation from occurring.
Differential Revision: https://reviews.llvm.org/D61796
llvm-svn: 360890
Removed unconditional and unsafe decrement of counter
of active threads in pool at shutdown time.
Differential Revision: https://reviews.llvm.org/D61944
llvm-svn: 360784
The implementation should be done by compiler, user can only declare
objects of this type and use them in OpenMP directives.
Differential Revision: https://reviews.llvm.org/D61860
llvm-svn: 360774
The code is currently using the ambiguous instruction
"sub sp, sp, w9, lsl #4". The ARM reference manual says this isn't
valid, and it's not clear whether it's supposed to mean uxtw or uxtx.
It doesn't matter which instruction we use here, since the high
bits of the operand are zero anyway, so I arbitrarily choose uxtw, to
preserve the register name.
See https://reviews.llvm.org/D60840 for the LLVM patch.
Differential Revision: https://reviews.llvm.org/D61770
llvm-svn: 360711
Changed file extension of the destination of the copy of libomp.lib
(it was mistakely .dll, now it is .lib) in installation on Windows.
Differential Revision: https://reviews.llvm.org/D61673
llvm-svn: 360595
Summary:
Patch improves performance of the full runtime mode by moving
threads limit counter to the shared memory. It also allows to save
global memory.
Reviewers: grokos, kkwli0, gtbercea
Subscribers: guansong, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61801
llvm-svn: 360584
Summary:
Patch improves performance of the full runtime mode by moving
number-of-threads counter to the shared memory. It also allows to save
global memory.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61785
llvm-svn: 360457
This patch provides workaround to allow gfortran to compile the
OpenMP Fortran modules.
From the gfortran manual:
https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gfortran/BOZ-literal-constants.html
"Note that initializing an INTEGER variable with a statement such as
DATA i/Z'FFFFFFFF'/ will give an integer overflow error rather than the desired
result of -1 when i is a 32-bit integer on a system that supports 64-bit
integers. The -fno-range-check option can be used as a workaround for legacy
code that initializes integers in this manner."
Bug filed: https://bugs.llvm.org/show_bug.cgi?id=41755
Differential Revision: https://reviews.llvm.org/D61603
llvm-svn: 360299
Summary:
To be able to successfully build OpenMP on FreeBSD/i386, which still
uses i486 as its default processor, I had to provide wrappers for the
`__kmp_load_mxcsr` and `__kmp_store_mxcsr` functions.
If the compiler signals that SSE is not available, loading and storing
mxcsr does not make sense anway, so in that case the inline functions
are empty. This gives the minimum amount of code churn.
See also https://svnweb.freebsd.org/changeset/base/345283
Reviewers: emaste, jlpeyton, Hahnfeld
Reviewed By: jlpeyton
Subscribers: hfinkel, krytarowski, jdoerfert, openmp-commits, llvm-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60916
llvm-svn: 360062
Summary:
Patch improves performance of the full runtime mode by moving
thread-limit counter to the shared memory. It also allows to save
global memory.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jdoerfert, caomhin, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61526
llvm-svn: 359922
Summary:
Used parallelLevel[] counter to simplify and improve implementation of
the existing standard OpenMP functions. Functions are tested already in
several tests, the patch is NFC.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jdoerfert, caomhin, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61459
llvm-svn: 359892
Summary:
Previously for the different purposes we need to get the active/common
parallel level and with full runtime we iterated over all the records to
calculate this level. Instead, we can used the warp-based parallel level
counters used in no-runtime mode.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, jdoerfert, caomhin, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61395
llvm-svn: 359822
Summary:
Function omp_get_thread_limit() in SPMD mode can return the maximum
available number of threads as a result.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61378
llvm-svn: 359790
Summary:
To be able to successfully build OpenMP on 32-bit FreeBSD, such as
FreeBSD/i386, I first had to provide a few wrappers (see D60916), and
then add `KMP_OS_FREEBSD` to the list of defines checked for 32-bit
architectures in `kmp_runtime.cpp`.
I have successfully built libomp.so and ran a bunch of test programs on
FreeBSD/i386 with this.
See also https://svnweb.freebsd.org/changeset/base/345283
Reviewers: emaste, jlpeyton, Hahnfeld
Reviewed By: jlpeyton
Subscribers: krytarowski, guansong, jdoerfert, openmp-commits, llvm-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60917
llvm-svn: 359716
Implemented task modifier in two versions - one without taking into account
omp_orig variable (the omp_orig still can be processed by compiler without help
of the library, but each reduction object will need separate initializer with
global access to omp_orig), another with omp_orig variable included into
interface (single initializer can be used for multiple reduction objects of
the same type). Second version can be used when the omp_orig is not globally
accessible, or to optimize code in case of multiple reduction objects
of the same type.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D60976
llvm-svn: 359710
This patch adds:
* New omp_sched_monotonic flag to omp_sched_t which is handled within the runtime
* Parsing of monotonic/nonmonotonic in OMP_SCHEDULE
* Tests for the monotonic flag and envirable parsing
* Logic to force monotonic when hierarchical scheduling is used
Differential Revision: https://reviews.llvm.org/D60979
llvm-svn: 359601
Summary:
The parallelLevel counter must be on per-thread basis to fully support
L2+ parallelism, otherwise we may end up with undefined behavior.
Introduce the parallelLevel on per-warp basis using shared memory. It
allows to avoid the problems with the synchronization and allows fully
support L2+ parallelism in SPMD mode with no runtime.
Reviewers: gtbercea, grokos
Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60918
llvm-svn: 359341
Summary:
I ran into some issues after rOMP355687, where __atomic_fetch_add was
being used incorrectly on x86, and this turns out to be caused by the
following added conditionals:
```
#if defined(KMP_ARCH_MIPS)
```
The problem is, these macros are always defined, and are either 0 or 1
depending on the architecture. E.g. the correct way to test for MIPS
is:
```
#if KMP_ARCH_MIPS
```
Reviewers: petarj, jlpeyton, Hahnfeld, AndreyChurbanov
Reviewed By: petarj, AndreyChurbanov
Subscribers: AndreyChurbanov, sdardis, arichardson, atanasyan, jfb, jdoerfert, openmp-commits, llvm-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60938
llvm-svn: 358911
Fix the test to run it really in SPMD mode without runtime. Previously
it was run in SPMD + full runtime mode and does not allow to cehck the
functionality correctly.
llvm-svn: 358902
Summary:
If the kernel is executed in SPMD mode and the L2+ parallel for region
with the dynamic scheduling is executed, dynamic scheduling functions
are called. They expect full runtime support, but SPMD kernels may be
executed without the full runtime. It leads to the runtime crash of the
compiled program. Patch fixes this problem + fixes handling of the
parallelism level in SPMD mode, which is required as part of this patch.
Reviewers: gtbercea, kkwli0, grokos
Subscribers: guansong, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60578
llvm-svn: 358442
This change replaces some of the assembly functions in z_Linux_asm.S
for inline asm in kmp.h. This allows better interaction with compiler
tools and sanitizers.
Differential Revision: https://reviews.llvm.org/D60423
llvm-svn: 358438
* Replace HBWMALLOC API with more general MEMKIND API, new functions
and variables added.
* Have libmemkind.so loaded when accessible.
* Redirect memspaces to default one except for high bandwidth which
is processed separately.
* Ignore some allocator traits e.g., sync_hint, access, pinned, while
others are processed normally e.g., alignment, pool_size, fallback,
fb_data, partition.
* Add tests for memory management
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D59783
llvm-svn: 357929
This patch cleans up the bookkeeping code for the load balancing dynamic mode.
When a thread is moved to or from the thread pool, the th_active_in_pool flag
and the __kmp_thread_pool_active_nth global counter are both updated. This
removes the need for the corrective code in the main wait loop. Another global
counter, __kmp_thread_pool_nth, was removed completely, as it was only used for
debugging, but was not under KMP_DEBUG.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D59508
llvm-svn: 357927