Commit Graph

1211 Commits

Author SHA1 Message Date
Ye Luo 262289c103 [OpenMP] mark target task untied
OpenMP specification Tasking Terminology
target task :A mergeable and untied task that ...

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D107686
2021-08-07 12:31:20 -04:00
Shilei Tian 680c71b127 [OpenMP] Clean up for hidden helper task
This patch makes some clean up for code of hidden helper task.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D107008
2021-08-04 12:36:44 -04:00
Shilei Tian 9f5d6ea52e [OpenMP] Fix performance regression reported in bug #51235
This patch fixes the "performance regression" reported in https://bugs.llvm.org/show_bug.cgi?id=51235. In fact it has nothing to do with performance. The root cause is, the stolen task is not allowed to execute by another thread because by default it is tied task. Since hidden helper task will always be executed by hidden helper threads, it should be untied.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D107121
2021-08-04 12:34:49 -04:00
Lechen Yu 3bc8ce5dd7 [openmp] Add OMPT initialization in libomptarget
When loading libomptarget, the init function in libomptarget/src/rtl.cpp
will search for the libomptarget_start_tool function using libdl.
libomptarget_start_tool will pass those OMPT callbacks related to target
constructs to libomptarget

Differential Revision: https://reviews.llvm.org/D99803
2021-08-04 18:00:11 +02:00
AndreyChurbanov 8e29b4b323 [OpenMP] libomp: taskwait depend implementation fixed.
Fix for https://bugs.llvm.org/show_bug.cgi?id=49723.
Eliminated references from task dependency hash to node allocated on stack,
thus eliminated accesses to stale memory. So the node now never freed.
Uncommented assertion which triggered when stale memory accessed.
Removed unneeded ref count increment for stack allocated node.

Differential Revision: https://reviews.llvm.org/D106705
2021-08-03 15:45:20 +03:00
AndreyChurbanov 8b81524c6d [OpenMP][NFC] libomp: silence warnings on unused variables.
Put declarations/definitions of unused variables under corresponding macros
to silence clang build warnings.

Differential Revision: https://reviews.llvm.org/D106608
2021-07-30 17:04:42 +03:00
Terry Wilmarth d8e4cb9121 [OpenMP] libomp: Add new experimental barrier: two-level distributed barrier
Two-level distributed barrier is a new experimental barrier designed
for Intel hardware that has better performance in some cases than the
default hyper barrier.

This barrier is designed to handle fine granularity parallelism where
barriers are used frequently with little compute and memory access
between barriers. There is no need to use it for codes with few
barriers and large granularity compute, or memory intensive
applications, as little difference will be seen between this barrier
and the default hyper barrier. This barrier is designed to work
optimally with a fixed number of threads, and has a significant setup
time, so should NOT be used in situations where the number of threads
in a team is varied frequently.

The two-level distributed barrier is off by default -- hyper barrier
is used by default. To use this barrier, you must set all barrier
patterns to use this type, because it will not work with other barrier
patterns. Thus, to turn it on, the following settings are required:

KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
KMP_PLAIN_BARRIER_PATTERN=dist,dist
KMP_REDUCTION_BARRIER_PATTERN=dist,dist

Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER,
and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed
barrier.

Patch fixed for ITTNotify disabled builds and non-x86 builds

Co-authored-by: Jonathan Peyton <jonathan.l.peyton@intel.com>
Co-authored-by: Vladislav Vinogradov <vlad.vinogradov@intel.com>

Differential Revision: https://reviews.llvm.org/D103121
2021-07-29 14:09:26 -05:00
Joachim Protze e32e1dae61 [OpenMP][Tests] Fix test compatibility
gcc and clang disagree in how the event handle needs to be handled.
According to OpenMP LC, gcc is right. Will open clang bug report
2021-07-28 00:08:32 +02:00
Joachim Protze 3c76e99291 [OpenMP] Fix deadlock for detachable task with child tasks
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=49066.

For detachable tasks, the assumption breaks that the proxy task cannot have
remaining child tasks when the proxy completes.
In stead of increment/decrement the incomplete task count, a high-order bit
is flipped to mark and wait for the incomplete proxy task.

Differential Revision: https://reviews.llvm.org/D101082
2021-07-28 00:01:35 +02:00
Vignesh Balasubramanian 23eced9ead Convert the error to warning for enabling OMPD in non-Linux platform
OMPD is enabled by default on Linux machines and disabled on others.
However, if explicitly enabled it throws an error and exit while configuring.

It is mentioned in Bug: https://bugs.llvm.org/show_bug.cgi?id=51121

This patch, instead of throwing error, disables OMPD support with a warning message,
so configuration can continue.

Reviewed By: @protze.joachim
Differential Revision: https://reviews.llvm.org/D106682
2021-07-27 17:25:27 +05:30
Joachim Protze c46ccb8538 [OpenMP][tests][NFC] Update test status for gcc 11 and 12
gcc 11 introduced support for depend clause, but the gomp interface of libomp
does not yet handle the information.
Also remove -fopenmp-version=50, which is no longer needed for clang, but not
supported by gcc.
2021-07-25 18:56:36 +02:00
Shilei Tian c2c43132f6 [OpenMP] Fix bug 50022
Bug 50022 [0] reports target nowait fails in certain case, which is added in this
patch. The root cause of the failure is, when the second task is created, its
parent's `td_incomplete_child_tasks` will not be incremented because there is no
parallel region here thus its team is serialized. Therefore, when the initial
thread is waiting for its unfinished children tasks, it thought there is only
one, the first task, because it is hidden helper task, so it is tracked. The
second task will only be pushed to the queue when the first task is finished.
However, when the first task finishes, it first decrements the counter of its
parent, and then release dependences. Once the counter is decremented, the thread
will move on because its counter is reset, but actually, the second task has not
been executed at all. As a result, since in this case, the main function finishes,
then `libomp` starts to destroy. When the second task is pushed somewhere, all
some of the structures might already have already been destroyed, then anything
could happen.

This patch simply moves `__kmp_release_deps` ahead of decrement of the counter.
In this way, we can make sure that the initial thread is aware of the existence
of another task(s) so it will not move on. In addition, in order to tackle
dependence chain starting with hidden helper thread, when hidden helper task is
encountered, we force the task to release dependences.

Reference:
[0] https://bugs.llvm.org/show_bug.cgi?id=50022

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D106519
2021-07-23 16:54:11 -04:00
Shilei Tian ea452353c0 [OpenMP] Refined the logic to give a regular task from a hidden helper task
In current implementation, if a regular task depends on a hidden helper task,
and when the hidden helper task is releasing its dependences, it directly calls
`__kmp_omp_task`. This could cause a problem that if `__kmp_push_task` returns
`TASK_NOT_PUSHED`, the task will be executed immediately. However, the hidden
helper threads are assumed to only execute hidden helper tasks. This could cause
problems because when calling `__kmp_omp_task`, the encountering gtid, which is
not the real one of the thread, is passed.

This patch uses `__kmp_give_task`, but because it is a static function, a new
wrapper `__kmpc_give_task` is added.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D106572
2021-07-22 19:21:29 -04:00
Shilei Tian 996baa58a4 [OpenMP] Fixed a segmentation fault when using taskloop and target nowait
The synchronization of task loop misses hidden helper tasks, causing segmentation
fault reported in https://bugs.llvm.org/show_bug.cgi?id=50002.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D106220
2021-07-19 21:09:05 -04:00
Peyton, Jonathan L 424f14f0d2 [OpenMP] Fix one sign-compare warning from GCC 2021-07-13 12:36:12 -05:00
Peyton, Jonathan L 405eefe464 [OpenMP][NFC] Change comment style to eliminate warnings from GCC
Standalone build for OpenMP runtime using GCC is giving -Wcomment
warnings where a backslash newline is encountered in the // style
comment. This switches the // style for /* style to silence the
warnings.
2021-07-13 12:27:08 -05:00
Hansang Bae db635a28e6 [OpenMP] Minor improvement in task allocation
This patch includes a few changes to improve task allocation
performance slightly. These changes are enough to restore performance
drop observed after introducing hidden helper.

Differential Revision: https://reviews.llvm.org/D105715
2021-07-13 09:07:14 -05:00
Roman Lebedev 4709d9d5be
[libomp] ompd_init(): fix heap-buffer-overflow when constructing libompd.so path
There is no guarantee that the space allocated in `libname`
is enough to accomodate the whole `dl_info.dli_fname`,
because it could e.g. have an suffix  - `.5`,
and that highlights another problem - what it should do about suffxies,
and should it do anything to resolve the symlinks before changing the filename?

```
$ LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"  ./src/utilities/rstest/rstest -c /tmp/f49137920.NEF
dl_info.dli_fname "/usr/local/lib/libomp.so.5"
strlen(dl_info.dli_fname) 26
lib_path_length 14
lib_path_length + 12 26
=================================================================
==30949==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000002a at pc 0x000000548648 bp 0x7ffdfa0aa780 sp 0x7ffdfa0a9f40
WRITE of size 27 at 0x60300000002a thread T0
    #0 0x548647 in strcpy (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x548647)
    #1 0x7fb9e3e3d234 in ompd_init() /repositories/llvm-project/openmp/runtime/src/ompd-specific.cpp:102:5
    #2 0x7fb9e3dcb446 in __kmp_do_serial_initialize() /repositories/llvm-project/openmp/runtime/src/kmp_runtime.cpp:6742:3
    #3 0x7fb9e3dcb40b in __kmp_get_global_thread_id_reg /repositories/llvm-project/openmp/runtime/src/kmp_runtime.cpp:251:7
    #4 0x59e035 in main /home/lebedevri/rawspeed/build-Clang-SANITIZE/../src/utilities/rstest/rstest.cpp:491
    #5 0x7fb9e3762d09 in __libc_start_main csu/../csu/libc-start.c:308:16
    #6 0x4df449 in _start (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x4df449)

0x60300000002a is located 0 bytes to the right of 26-byte region [0x603000000010,0x60300000002a)
allocated by thread T0 here:
    #0 0x55cc5d in malloc (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x55cc5d)
    #1 0x7fb9e3e3d224 in ompd_init() /repositories/llvm-project/openmp/runtime/src/ompd-specific.cpp:101:17
    #2 0x7fb9e3762d09 in __libc_start_main csu/../csu/libc-start.c:308:16

SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x548647) in strcpy
Shadow bytes around the buggy address:
  0x0c067fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c067fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c067fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c067fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c067fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c067fff8000: fa fa 00 00 00[02]fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==30949==ABORTING
Aborted
```
2021-07-13 15:36:46 +03:00
Joachim Protze 681055ea69 [OpenMP] Remove TSAN annotations from libomp
The annotations in libomp were never built by default. The annotations are
also superseded by the annotations which the OMPT tool libarcher.so provides.
With respect to libarcher, libomp behaves as if libarcher would be the last
element of OMP_TOOL_LIBARARIES. I.e., if no other OMPT tool gets active,
libarcher will check if an OpenMP application is built with TSan.

Since libarcher gets loaded by default, enabling LIBOMP_TSAN_SUPPORT would
result in redundant annotations for TSan, which slightly differ in details
and coverage (e.g. task dependencies are not handled well by the annotations
in libomp).

This patch removes all TSan annotations from the OpenMP runtime code.

Differential Revision: https://reviews.llvm.org/D103767
2021-07-12 18:49:11 +02:00
Michał Górny 2b0d95fb58 [openmp] [test] Add missing <limits> include to capacity_nthreads
Differential Revision: https://reviews.llvm.org/D105474
2021-07-06 20:39:53 +02:00
Hansang Bae f1b9ce2736 [OpenMP] Fix a few issues with hidden helper task
This patch includes the following changes to address a few issues when
using hidden helper task.

- Assertion is triggered when there are inadvertent calls to hidden
  helper functions on non-Linux OS
- Added deinit code in __kmp_internal_end_library function to fix random
  shutdown crashes
- Moved task data access into the lock-guarded region in __kmp_push_task

Differential Revision: https://reviews.llvm.org/D105308
2021-07-01 17:10:32 -05:00
Johannes Doerfert 4eb90e893f Revert "[OpenMP] Add Two-level Distributed Barrier"
This reverts commit 25073a4ecf.

This breaks non-x86 OpenMP builds for a while now. Until a solution is
ready to be upstreamed we revert the feature and unblock those builds.
See:
  https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821
and
  https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821

The currently proposed fix (D104788) seems not to be ready yet:
  https://reviews.llvm.org/D104788#2841928
2021-06-29 09:38:27 -05:00
Johannes Doerfert bc8bb3df35 Revert "[omp] Fix build without ITT after D103121 changes"
This reverts commit eab1fd389b.

This commit fixed a problem with 25073a4ecf (D103121) which is the one
we actually need to revert to unblock non-X86 builds of OpenMP. Can be
reapplied, or merged into, D103121 as it goes in again.
2021-06-29 09:38:27 -05:00
AndreyChurbanov b2787945f9 [OpenMP][NFC] libomp: fix wrong debug assertion.
Normalized bounds of chunk of iterations to steal from are inclusive,
so upper bound should not be decremented in expression to check.
Problem was in attempt to steal iterations 0:0, that caused assertion after
wrong decrement. Reported in comment to https://reviews.llvm.org/D103648.

Differential Revision: https://reviews.llvm.org/D104880
2021-06-25 02:02:14 +03:00
AndreyChurbanov 5dd4d0d46f [OpenMP] libomp: fix dynamic loop dispatcher
Restructured dynamic loop dispatcher code.
Fixed use of dispatch buffers for nonmonotonic dynamic (static_steal) schedule:
- eliminated possibility of stealing iterations of the wrong loop when victim
  thread changed its buffer to work on another loop;
- fixed race when victim thread changed its buffer to work in nested parallel;
- eliminated "static" property of the schedule, that is now a single thread can
  execute whole loop.

Differential Revision: https://reviews.llvm.org/D103648
2021-06-22 16:29:01 +03:00
Vladislav Vinogradov eab1fd389b [omp] Fix build without ITT after D103121 changes
Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D104638
2021-06-21 18:17:52 +03:00
Terry Wilmarth 25073a4ecf [OpenMP] Add Two-level Distributed Barrier
Two-level distributed barrier is a new experimental barrier designed
for Intel hardware that has better performance in some cases than the
default hyper barrier.

This barrier is designed to handle fine granularity parallelism where
barriers are used frequently with little compute and memory access
between barriers.  There is no need to use it for codes with few
barriers and large granularity compute, or memory intensive
applications, as little difference will be seen between this barrier
and the default hyper barrier. This barrier is designed to work
optimally with a fixed number of threads, and has a significant setup
time, so should NOT be used in situations where the number of threads
in a team is varied frequently.

The two-level distributed barrier is off by default -- hyper barrier
is used by default. To use this barrier, you must set all barrier
patterns to use this type, because it will not work with other barrier
patterns.  Thus, to turn it on, the following settings are required:

KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
KMP_PLAIN_BARRIER_PATTERN=dist,dist
KMP_REDUCTION_BARRIER_PATTERN=dist,dist

Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER,
and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed
barrier.

Differential Revision: https://reviews.llvm.org/D103121
2021-06-16 15:34:55 -05:00
AndreyChurbanov 610fea65e2 [OpenMP] libomp: fixed implementation of OMP 5.1 inoutset task dependence type
Refactored code of dependence processing and added new inoutset dependence type.
Compiler can set dependence flag to 0x8 when call __kmpc_omp_task_with_deps.
All dependence flags library gets so far and corresponding dependence types:
1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET.

Differential Revision: https://reviews.llvm.org/D97085
2021-06-16 14:47:29 +03:00
Joachim Protze d2a7871b5e [OpenMP][NFC] Add back suppression of warning
Commit cff215565e did not fix all unused variables in different builds,
so adding back the suppression for now.
2021-06-16 10:14:59 +02:00
Joachim Protze cff215565e [OpenMP] Remove unused variables from libomp code
Several variables were left unused as a result of different patches removing
their use.

Two variables have some use:
`poll_count` is used by the KMP_BLOCKING macro only under certain conditions.
Adding (void) to tell the compiler to ignore the unused variable.

`padding` is a dummy stack allocation with no intent to be used. Also adding
(void) to make the compiler ignore the unused variable.

Differential Revision: https://reviews.llvm.org/D104303
2021-06-16 09:33:46 +02:00
Peyton, Jonathan L 56da28240f [OpenMP] Add GOMP 5.0 version symbols to API
* Add GOMP versioned pause functions
* Add GOMP versioned affinity format functions

To do the affinity format functions, only attach versioned symbols
to the APPEND Fortran entries (e.g., omp_set_affinity_format_) since
GOMP only exports two symbols (one for Fortran, one for C). Our
affinity format functions have three symbols.
e.g., with omp_set_affinity_format:
1) omp_set_affinity_format (Fortran interface)
2) omp_set_affinity_format_ (Fortran interface)
3) ompc_set_affinity_format (C interface)

Have the GOMP version of the C symbol alias the ompc_* 3) version
instead of the Fortran unappended version 1).

Differential Revision: https://reviews.llvm.org/D103647
2021-06-15 16:25:00 -05:00
Peyton, Jonathan L 92baf414db [OpenMP] Fix affinity determine capable algorithm on Linux
Remove strange checks for syscall() arguments where mask is NULL.
Valgrind reports these as error usages for the syscall.
Instead, just check if CACHE_LINE bytes is long enough. If not, then
search for the size. Also, by limiting the first size detection
attempt to CACHE_LINE bytes, instead of 1MB, we don't use more than one
cache line for the mask size. Before this patch, sometimes the returned
mask size was 640 bytes (10 cache lines) because the initial call to
getaffinity() was limited only by the internal kernel mask size
which can be very large.

Differential Revision: https://reviews.llvm.org/D103637
2021-06-15 16:21:30 -05:00
Peyton, Jonathan L 0ddde4d865 [OpenMP] Lazily assign root affinity
Lazily set affinity for root threads. Previously, the root thread
executing middle initialization would attempt to assign affinity
to other existing root threads. This was not working properly as the
set_system_affinity() function wasn't setting the affinity for the
target thread. Instead, the middle init thread was resetting the
its own affinity using the target thread's affinity mask.

Differential Revision: https://reviews.llvm.org/D103625
2021-06-15 16:21:06 -05:00
AndreyChurbanov 9ce2e5e700 Revert "[OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type"
This reverts commit a1f550e052.

Revert in order to fix backwards compatibility breakage
caused by type size change for task dependence flag.
2021-06-09 17:38:38 +03:00
Vignesh Balasubramanian f61602b0d3 [OpenMP][OMPD] Implementation of OMPD debugging library - libompd.
This is the first of seven patches that implements OMPD, a debugging interface to support debugging of OpenMP programs.
It contains support code required in "openmp/runtime" for OMPD implementation.

Reviewed By: @hbae
Differential Revision: https://reviews.llvm.org/D100181
2021-06-08 16:44:22 +05:30
Peyton, Jonathan L d70e1f1276 [OpenMP][runtime] add .clang-tidy file
Use same checks as compiler-rt which removes checks for readability-*
and llvm-header style.

Differential Revision: https://reviews.llvm.org/D103711
2021-06-07 13:56:39 -05:00
AndreyChurbanov a1f550e052 [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type
Refactored code of dependence processing and added new inoutset dependence type.
Compiler can set dependence flag to 0x8 when call __kmpc_omp_task_with_deps.
Size of type of the dependence flag changed from 1 to 4 bytes in clang.
All dependence flags library gets so far and corresponding dependence types:
1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET.

Differential Revision: https://reviews.llvm.org/D97085
2021-06-07 21:42:51 +03:00
Bryan Chan 54f059c900 [OpenMP] Check loc for NULL before dereferencing it
The ident_t * argument in __kmp_get_monotonicity was being used without
a customary NULL check, causing the function to crash in a Debug build.
Release builds were not affected thanks to dead store elimination.
2021-06-07 10:45:48 -04:00
Terry Wilmarth 8ec9aa236e [OpenMP] Add experimental nesting mode feature
Nesting mode is a new experimental feature in the OpenMP
runtime. It allows a user to set up nesting for an application in a
way that corresponds to the hardware topology levels on the machine an
application is being run on.  For example, if a machine has 2 sockets,
each with 12 cores, then use of nesting mode could set up an outer
level of nesting that uses 2 threads per parallel region, and an inner
level of nesting that uses 12 threads per parallel region.

Nesting mode is controlled with the KMP_NESTING_MODE environment
variable as follows:

1) KMP_NESTING_MODE = 0: Nesting mode is off (default); max-active-levels-var
is set to 1 (the default -- nesting is off, nested parallel regions
are serialized).

2) KMP_NESTING_MODE = 1: Nesting mode is on, and a number of threads
will be assigned for each level discovered in the machine topology;
max-active-levels-var is set to the number of levels discovered.

3) KMP_NESTING_MODE = n, n>1: [Note: this option is experimental and may change
or be removed in the future.] Nesting mode is on, and a number of
threads will be assigned for each topology level discovered on the
machine, up to k<=n levels (since there may be fewer than n levels
discovered in the topology), and beyond the kth level, nested parallel
regions will be serialized; NOTE: max-active-levels-var is 1 (the default --
nesting is off, and nested parallel regions are serialized until the
user changes max-active-levels-var.

If the user sets OMP_NUM_THREADS or OMP_MAX_ACTIVE_LEVELS, they will
override KMP_NESTING_MODE settings for the associated environment
variables. The detected topology may be limited by an affinity mask
setting on the initial thread, or if the user sets KMP_HW_SUBSET. See
also: KMP_HOT_TEAMS_MAX_LEVEL for controlling use of hot teams for
nested parallel regions. Note that this feature only sets numbers of
threads used at nesting levels.  The user should make use of
OMP_PLACES and OMP_PROC_BIND or KMP_AFFINITY for affinitizing those
threads, if desired.

Differential Revision: https://reviews.llvm.org/D102188
2021-06-04 16:01:11 -05:00
Peyton, Jonathan L 56dd158c32 [OpenMP] fix spelling error in message-converter.pl 2021-06-04 11:20:32 -05:00
Peyton, Jonathan L f7655f3df3 [OpenMP] Fix improper printf format specifier 2021-06-02 11:04:48 -05:00
Hansang Bae 7ba4e96ede [OpenMP] Use new task type/flag for taskwait depend events.
Differential Revision: https://reviews.llvm.org/D103464
2021-06-02 10:16:38 -05:00
Peyton, Jonathan L 2020c981fa [OpenMP] Add L2-Tile equivalence for KNL
When on KNL and L2 or Tile layer is detected, manually add
the corresponding layer which is equivalent.

Differential Revision: https://reviews.llvm.org/D102865
2021-06-01 14:17:13 -05:00
Hansang Bae cf5c94ef08 [OpenMP] Define named constants for interop's foreign runtime ID
Also added missing Fortran definitions for interop support.

Differential Revision: https://reviews.llvm.org/D102883
2021-06-01 13:06:59 -05:00
Hansang Bae 95cefacfe1 [OpenMP] Fix crashing critical section with hint clause
Runtime was using the default lock type without using the hint.

Differential Revision: https://reviews.llvm.org/D102955
2021-05-24 17:25:01 -05:00
AndreyChurbanov aa6e7e8da8 [OpenMP] libomp: move warnings to after library initialization
Warnings on deprecated api cannot be suppressed if the library is not initialized.
With this change it is possible to set KMP_WARNINGS=false to suppress the warnings.

Differential Revision: https://reviews.llvm.org/D102676
2021-05-21 23:47:23 +03:00
Shilei Tian af6511d730 [OpenMP] Fixed Bug 49356
Bug 49356 (https://bugs.llvm.org/show_bug.cgi?id=49356) reports crash in
the test case `tasking/bug_taskwait_detach.cpp`, which is caused by the wrong
function declaration. `gtid` in `__kmpc_omp_task` should be `kmp_int32`.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D102584
2021-05-17 12:14:54 -04:00
Christopher Pulido 4fb0aaf033 [OpenMP] Changes to enable MSVC ARM64 build of libomp
This is the first in a series of changes to the OpenMP runtime
that have been done internally by Microsoft. This patch makes
the necessary changes to enable libomp.dll to build with
the MSVC compiler targeting ARM64.

Differential Revision: https://reviews.llvm.org/D101173
2021-05-11 23:03:12 +03:00
Peyton, Jonathan L c765d140fe [OpenMP] Fix hidden helper + affinity
When KMP_AFFINITY is set, each worker thread's gtid value is used as an
index into the place list to determine the thread's placement. With hidden
helpers enabled, this gtid value is shifted down leading to unexpected
shifted thread placement. This patch restores the previous behavior by
adjusting the mask index to take the number of hidden helper threads
into account.

Hidden helper threads are given the full initial mask and do not
participate in any of the other affinity mechanisms (place partitioning,
balanced affinity). Their affinity is only printed for debug builds.

Differential Revision: https://reviews.llvm.org/D101882
2021-05-11 08:54:22 -05:00
Peyton, Jonathan L 9982f33e2c [OpenMP] Refactor/Rework topology discovery code
This patch does the following:

1) Introduce kmp_topology_t as the runtime-friendly structure (the
corresponding global variable is __kmp_topology) to determine the
exact machine topology which can vary widely among current and future
architectures. The current design is not easy to expand beyond the assumed
three layer topology: sockets, cores, and threads so a rework capable of
using the existing KMP_AFFINITY mechanisms is required.

This new topology structure has:
* The depth and types of the topology
* Ratio count for each consecutive level (e.g., number of cores per
   socket, number of threads per core)
* Absolute count for each level (e.g., 2 sockets, 16 cores, 32 threads)
* Equivalent topology layer map (e.g., Numa domain is equivalent to
   socket, L1/L2 cache equivalent to core)
* Whether it is uniform or not

The hardware threads are represented with the kmp_hw_thread_t
structure. This structure contains the ids (e.g., socket 0, core 1,
thread 0) and other information grabbed from the previous Address
structure. The kmp_topology_t structure contains an array of these.

2) Generalize the KMP_HW_SUBSET envirable for the new
kmp_topology_t structure. The algorithm doesn't assume any order with
tiles,numa domains,sockets,cores,threads. Instead it just parses the
envirable, makes sure it is consistent with the detected topology
(including taking into account equivalent layers) and then trims away
the unneeded subset of hardware threads. To enable this, a new
kmp_hw_subset_t structure is introduced which contains a vector of
items (hardware type, number user wants, offset). Any keyword within
__kmp_hw_get_keyword() can be used as a name and can be shortened as
well. e.g.,
KMP_HW_SUBSET=1s,2numa,4tile,2c,3t can be used on the KNL SNC-4 machine.

3) Simplify topology detection functions so they only do the singular
task of detecting the machine's topology. Printing, and all
canonicalizing functionality is now done afterwards. So many lines of
duplicated code are eliminated.

4) Add new ll_caches and numa_domains to OMP_PLACES, and
consequently, KMP_AFFINITY's granularity setting. All the names within
__kmp_hw_get_keyword() are available for use in OMP_PLACES or
KMP_AFFINITY's granularity setting.

5) Simplify and future-proof code where explicit lists of allowed
affinity settings keywords inside if() conditions.

6) Add x86 CPUID leaf 4 cache detection to existing x2apic id method
so equivalent caches could be detected (in particular for the ll_caches
place).

Differential Revision: https://reviews.llvm.org/D100997
2021-05-03 18:00:24 -05:00
Martin Storsjö 01d27fc408 [OpenMP] Fix warnings due to redundant semicolons. NFC. 2021-05-02 21:51:06 +03:00
Kevin Athey bc9120047b Correct tiny misspelling (readlef -> readelf).
Getting my feet wet here as a new committer.

Correct misspelling in check-depends.pl.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D101552
2021-04-30 17:20:35 -07:00
Peyton, Jonathan L 4457565757 [OpenMP] Implement GOMP task reductions
Implement the remaining GOMP_* functions to support task reductions
in taskgroup, parallel, loop, and taskloop constructs.  The unused mem
argument to many of the work-sharing constructs has to do with the
scan() directive/ inscan() modifier.  If mem is set, each function
will call KMP_FATAL() and tell the user scan/inscan is unsupported.  The
GOMP reduction implementation is kept separate from our implementation
because of how GOMP presents reduction data and computes the reductions.
GOMP expects the privatized copies to be present even after a #pragma
omp parallel reduction(task:...) region has ended so the data is stored
inside GOMP's uintptr_t* data pseudo-structure.  This style is tightly
coupled with GCC compiler codegen.  There also isn't any init(),
combiner(), fini() functions in GOMP's codegen so the two
implementations were to disparate to try to wrap GOMP's around our own.

Differential Revision: https://reviews.llvm.org/D98806
2021-04-16 16:36:31 -05:00
Peyton, Jonathan L 5ebbb366c4 [OpenMP] Allow affinity to re-detect for child processes
Current atfork() handler for child processes does not reset
the affinity masks array which prevents users from setting their own
affinity in child processes.

Differential Revision: https://reviews.llvm.org/D99218
2021-04-16 16:34:02 -05:00
Hansang Bae 9b98497b44 [OpenMP] Add omp_target_is_accessible() to header files
-- Added omp_target_is_accessible to the header files
-- Added missing const qualifier to device memory routines

Differential Revision: https://reviews.llvm.org/D100420
2021-04-16 07:54:15 -05:00
Hansang Bae 77dc7b4653 [OpenMP] Fix printing routine for OMP_TOOL_VERBOSE_INIT
Also fixed typo in the verbose message.

Differential Revision: https://reviews.llvm.org/D100414
2021-04-14 07:55:26 -05:00
Hansang Bae 3da61ddae7 [OpenMP] Define omp_is_initial_device() variants in omp.h
omp_is_initial_device() is marked as a built-in function in the current
compiler, and user code guarded by this call may be optimized away,
resulting in undesired behavior in some cases. This patch provides a
possible fix for such cases by defining the routine as a variant
function and removing it from builtin list.

Differential Revision: https://reviews.llvm.org/D99447
2021-04-06 16:58:01 -05:00
Peyton, Jonathan L 2aebb7cb3c [OpenMP] Fix incorrect KMP_STRLEN() macro
The second argument to the strnlen_s(str, size) function should be
sizeof(str) when str is a true array of characters with known size
(instead of just a char*). Use type traits to determine if first
parameter is a character array and use the correct size based on that
trait.

Differential Revision: https://reviews.llvm.org/D98209
2021-04-05 09:03:09 -05:00
Hansang Bae 467f39249d [OpenMP] Misc. changes that add or remove pointer/bound checks
-- Added or moved checks to appropriate places.
-- Removed ineffective null check where the pointer is already being
   dereferenced around the code.
-- Initialized variables that can be used without definitions.
-- Added call to dlclose/FreeLibrary in OMPT tool activation.
-- Added a new build compiler definition.

Differential Revision: https://reviews.llvm.org/D98584
2021-03-23 18:55:08 -05:00
Shilei Tian 2df65f87c1 [OpenMP] Fixed a crash in hidden helper thread
It is reported that after enabling hidden helper thread, the program
can hit the assertion `new_gtid < __kmp_threads_capacity` sometimes. The root
cause is explained as follows. Let's say the default `__kmp_threads_capacity` is
`N`. If hidden helper thread is enabled, `__kmp_threads_capacity` will be offset
to `N+8` by default. If the number of threads we need exceeds `N+8`, e.g. via
`num_threads` clause, we need to expand `__kmp_threads`. In
`__kmp_expand_threads`, the expansion starts from `__kmp_threads_capacity`, and
repeatedly doubling it until the new capacity meets the requirement. Let's
assume the new requirement is `Y`.  If `Y` happens to meet the constraint
`(N+8)*2^X=Y` where `X` is the number of iterations, the new capacity is not
enough because we have 8 slots for hidden helper threads.

Here is an example.
```
#include <vector>

int main(int argc, char *argv[]) {
  constexpr const size_t N = 1344;
  std::vector<int> data(N);

#pragma omp parallel for
  for (unsigned i = 0; i < N; ++i) {
    data[i] = i;
  }

#pragma omp parallel for num_threads(N)
  for (unsigned i = 0; i < N; ++i) {
    data[i] += i;
  }

  return 0;
}
```
My CPU is 20C40T, then `__kmp_threads_capacity` is 160. After offset,
`__kmp_threads_capacity` becomes 168. `1344 = (160+8)*2^3`, then the assertions
hit.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D98838
2021-03-18 18:25:36 -04:00
Hansang Bae a6f9cb6adc [OpenMP] Add runtime interface for OpenMP 5.1 error directive
The proposed new interface is for supporting `at(execution)` clause in the
error directive.

Differential Revision: https://reviews.llvm.org/D98448
2021-03-16 08:55:25 -05:00
Peyton, Jonathan L 7085f04573 [OpenMP] Remove unused cpu_stackoffset member 2021-03-15 16:52:04 -05:00
AndreyChurbanov aaf16b80dd [OpenMP] libomp: eliminate pause from atomic CAS loops
For clang this change is NFC cleanup, because clang
never calls atomic functions from runtime library.

Basically, pause is good in spin-loops waiting for something.
Atomic CAS loops do not wait for anything,
each CAS failure means some other thread progressed.

Performance experiments show that the pause only causes unnecessary slowdown
on CPUs with slow pause instruction, no difference on CPUs with fast pause
instruction, removal of the pause gives lesser binary size which is good.

Differential Revision: https://reviews.llvm.org/D97079
2021-03-09 18:30:08 +03:00
AndreyChurbanov e4492b6f31 [OpenMP] NFC: temporarily disable assertion until the bug with dependences is fixed 2021-03-08 22:18:30 +03:00
Peyton, Jonathan L e2738b3758 [OpenMP] Fix potential integer overflow in dynamic schedule code
Restrict the chunk_size * chunk_num to only occur for valid
chunk_nums and reimplement calculating the limit to avoid overflow.

Differential Revision: https://reviews.llvm.org/D96747
2021-03-08 09:43:05 -06:00
tlwilmar 97d000cfc6 Added API for "masked" construct via two entrypoints: __kmpc_masked,
and __kmpc_end_masked. The "master" construct is deprecated. Changed
proc-bind keyword from "master" to "primary". Use of both master
construct and master as proc-bind keyword is still allowed, but
deprecated.

Remove references to "master" in comments and strings, and replace
with "primary" or "primary thread". Function names and variables were
not touched, nor were references to deprecated master construct. These
can be updated over time. No new code should refer to master.
2021-03-05 09:29:57 -06:00
Hansang Bae b6c2f538b2 [OpenMP] Add allocator support for target memory
This is a preview of allocator support for target memory that depends on the
offload runtime API which allocates memory as described below.

llvm_omp_target_alloc_host(size_t size, int device_num);
-- Returns non-migratable memory owned by host.
-- Memory is accessible by host and device(s).

llvm_omp_target_alloc_shared(size_t size, int device_num);
-- Returns migratable memory owned by host and device.
-- Memory is accessible by host and device.

llvm_omp_target_alloc_device(size_t size, int device_num);
-- Returns memory owned by device.
-- Memory is only accessible by device.

New memory space and predefined allocator names are
-- llvm_omp_target_host_mem_space
-- llvm_omp_target_shared_mem_space
-- llvm_omp_target_device_mem_space
-- llvm_omp_target_host_mem_alloc
-- llvm_omp_target_shared_mem_alloc
-- llvm_omp_target_device_mem_alloc

Differential Revision: https://reviews.llvm.org/D96669
2021-03-02 16:45:12 -06:00
Peyton, Jonathan L e83380fccc [OpenMP] Fix clang-cl build error regarding TSX intrinsics
Fix for https://bugs.llvm.org/show_bug.cgi?id=49339

The CMake check for the RTM intrinsics needs the -mrtm flag to be set
during the test. This way clang-cl correctly detects it has the
_xbegin() intrinsic. Otherwise, the CMake check fails.

Differential Revision: https://reviews.llvm.org/D97413
2021-03-02 07:47:42 -06:00
AndreyChurbanov 1df6e58e55 [OpenMP] libomp minor cleanup
Cleanup changes:
- check value read from file;
- remove dead code;
- make unsigned variable to read hexadecimal number to;
- add debug assertion to check ref count.

Differential Revision: https://reviews.llvm.org/D96893
2021-02-26 00:44:51 +03:00
AndreyChurbanov 4932101177 [OpenMP] libomp: fix ittnotify stack stitching for teams construct
Stitching id could be overridden causing reference of destroyed object
when number of teams is 1. The patch separates stitching id store
location for teams and parallel nested in teams.

Differential Revision: https://reviews.llvm.org/D96562
2021-02-26 00:23:24 +03:00
Peyton, Jonathan L d12ae7db99 [OpenMP] Fix accidental addition of use omp_lib_kinds
Fortran header accidentally had use omp_lib_kinds added inside a
subroutine and function. This patch removes the lines.
2021-02-25 12:49:56 -06:00
Harmen Stoppels a54f160b3a Prefer /usr/bin/env xxx over /usr/bin/xxx where xxx = perl, python, awk
Allow users to use a non-system version of perl, python and awk, which is useful
in certain package managers.

Reviewed By: JDevlieghere, MaskRay

Differential Revision: https://reviews.llvm.org/D95119
2021-02-25 11:32:27 +01:00
Shilei Tian e5da63d5a9 [OpenMP] Fixed a crash when offloading to x86_64 with target nowait
PR#49334 reports a crash when offloading to x86_64 with `target nowait`,
which is caused by referencing a nullptr. The root cause of the issue is, when
pushing a hidden helper task in `__kmp_push_task`, it also maps the gtid to its
shadow gtid, which is wrong.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97329
2021-02-24 12:37:30 -05:00
Joachim Protze 35ab6d6390 [OpenMP][Tests][NFC] rename macro to avoid naming clash
When including <ostream>, the register_callback macro of the OMPT callback.h
clashes with a function defined in ostream. This patch renames the macro
and includes ompt into the macro name.
2021-02-24 18:03:54 +01:00
Peyton, Jonathan L 56223b1e91 [OpenMP] Help static loop code avoid over/underflow
This code alleviates some pathological loop parameters (lower,
upper, stride) within calculations involved in the static loop code.  It
bounds the chunk size to the trip count if it is greater than the trip
count and also minimizes problematic code for when trip count < nth.

Differential Revision: https://reviews.llvm.org/D96426
2021-02-22 13:22:01 -06:00
Peyton, Jonathan L 1b968467c0 [OpenMP] Remove shutdown attempt on Windows process detach
Only attempt shutdown if lpReserved is NULL. The Windows documentation
states:
When handling DLL_PROCESS_DETACH, a DLL should free resources such as
heap memory only if the DLL is being unloaded dynamically (the
lpReserved parameter is NULL). If the process is terminating (the
lpReserved parameter is non-NULL), all threads in the process except the
current thread either have exited already or have been explicitly
terminated by a call to the ExitProcess function, which might leave some
process resources such as heaps in an inconsistent state. In this case,
it is not safe for the DLL to clean up the resources. Instead, the DLL
should allow the operating system to reclaim the memory.

Differential Revision: https://reviews.llvm.org/D96750
2021-02-22 13:15:06 -06:00
Peyton, Jonathan L 8c73be9d86 [OpenMP] Limit number of dispatch buffers
This patch limits the number of dispatch buffers (used for
loop worksharing construct) to between 1 and 4096.

Differential Revision: https://reviews.llvm.org/D96749
2021-02-22 13:14:28 -06:00
Peyton, Jonathan L 55dff8b2e4 [OpenMP] Update HWLOC code for die level detection
Differential Revision: https://reviews.llvm.org/D96748
2021-02-22 13:05:55 -06:00
AndreyChurbanov 1611e5473c [OpenMP] libomp: cleanup some resource leaks
Close mutexattr and condattr local objects to eliminate resource leaks.

Differential Revision: https://reviews.llvm.org/D96892
2021-02-20 23:27:37 +03:00
Shilei Tian 309b00a42e [OpenMP][NFC] clang-format the whole openmp project
Same script as D95318. Test files are excluded.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D97088
2021-02-20 12:46:32 -05:00
AndreyChurbanov dab5d6c2eb [OpenMP] fix race condition in test 2021-02-18 02:27:49 +03:00
AndreyChurbanov cf1ddae7e3 [OpenMP][NFC] replaced 'dependencies' with 'dependences' in comments and debug prints 2021-02-18 00:38:18 +03:00
AndreyChurbanov 5631842d18 [OpenMP] NFC: fix test removing the target construct 2021-02-13 04:49:52 +03:00
AndreyChurbanov 091e8daa24 [OpenMP] fix test adding mapping of shared variables 2021-02-13 04:13:54 +03:00
Martin Storsjö 496ca4127e [OpenMP] Silence more warning flags
This silences warnings like these, in mingw builds with clang:

runtime/src/kmp_atomic.h:1021:13: warning: '__kmpc_atomic_cmplx8_rd' has C-linkage specified, but returns user-defined type 'kmp_cmplx64' (aka '__kmp_cmplx64_t') which is incompatible with C [-Wreturn-type-c-linkage]

runtime/src/z_Windows_NT_util.cpp:479:17: warning: cast from 'volatile void *' to 'type-parameter-0-0 *' drops volatile qualifier [-Wcast-qual]
    flag = (C *)th->th.th_sleep_loc;

runtime/src/z_Windows_NT_util.cpp:1321:14: warning: cast to 'void *' from smaller integer type 'DWORD' (aka 'unsigned long') [-Wint-to-void-pointer-cast]
  } else if ((void *)exit_val != (void *)th) {

Differential Revision: https://reviews.llvm.org/D96585
2021-02-12 21:55:32 +02:00
Martin Storsjö 16428a8d91 [OpenMP] Avoid warnings about unused static functions on windows
Add ifdefs around one function that only is used in unix build
configurations.

Add a void cast for a windows specific function that currently is
unused but may be intended to be used at some point.

Differential Revision: https://reviews.llvm.org/D96584
2021-02-12 21:55:31 +02:00
Martin Storsjö b388c84c09 [OpenMP] Remove two entirely unused variables
Differential Revision: https://reviews.llvm.org/D96583
2021-02-12 21:55:31 +02:00
Martin Storsjö b3d84790fa [OpenMP] Add void casts to silence unused variable warnings
These variables are used only in certain build configurations,
or marked with a todo comment indicating that they should be
used/checked/reported.

Differential Revision: https://reviews.llvm.org/D96582
2021-02-12 21:55:31 +02:00
Martin Storsjö 3f9519b768 [OpenMP] Only use #pragma comment(lib, ...) in MSVC build configurations
MinGW build configurations don't support this pragma (unless
compiling with clang, with -fms-extensions, and linking with
lld), and at least clang warns about it.

This library does end up linked by the cmake files anyway (as
long as the check works properly).

Differential Revision: https://reviews.llvm.org/D96581
2021-02-12 21:55:31 +02:00
Martin Storsjö 77632422bc [OpenMP] Fix the check for libpsapi for i386
check_library_exists fails for stdcall functions, because that
check doesn't include the necessary headers (and thus fails with
an undefined reference to _EnumProcessModules, when the import
library symbol actually is called _EnumProcessModules@16).

Merge the two previous checks check_include_files and
check_library_exists into one with check_c_source_compiles, and
merge the variables that indicate whether it succeeded.

Differential Revision: https://reviews.llvm.org/D96580
2021-02-12 21:55:30 +02:00
AndreyChurbanov 838dcdb5fc [OpenMP] libomp: minor changes to improve library performance
Three minor changes in this patch:
- added UNLIKELY hint to few rarely executed branches;
- replaced couple of run time checks with debug assertions;
- moved check of presence of ittnotify tool from inside the function call.

Differential Revision: https://reviews.llvm.org/D95816
2021-02-12 00:43:13 +03:00
Hansang Bae ffb21e7f05 [OpenMP] Enable omp_get_num_devices() on Windows
This patch enables omp_get_num_devices() and omp_get_initial_device() on
Windows by providing an alternative to dlsym on Windows, and proposes to
add a new libomptarget entry, __tgt_get_num_devices().

Differential Revision: https://reviews.llvm.org/D96182
2021-02-11 14:53:48 -06:00
Nawrin Sultana 4692bb4a8a [OpenMP] Add lower and upper bound in num_teams clause
This patch adds lower-bound and upper-bound to num_teams clause
according to OpenMP 5.1 specification. The initial number of teams
created is implementation defined, but it will be greater than or
equal to lower-bound and less than or equal to upper-bound. If
num_teams clause is not specified, the number of teams created is
implementation defined, but it will be greater or equal to 1.

Differential Revision: https://reviews.llvm.org/D95820
2021-02-10 13:58:50 -06:00
Shilei Tian 3c31b78455 [OpenMP] Fixed an issue that taskwait doesn't work on detachable task
D77609 mistakenly changed the bebavior of task waiting on detachable task that a detachable task is not waited, based on https://lists.llvm.org/pipermail/openmp-dev/2021-February/003836.html. This patch fixed it. Thank Raúl for the report.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95798
2021-02-03 13:12:43 -05:00
Peyton, Jonathan L ffca74b8b8 [OpenMP] Fix sign comparison warnings from GCC
New affinity patch introduced legitimate sign-compare warnings that
clang doesn't report but GCC-10 does. This removes the warnings by
changing two variables types to unsigned.

Differential Revision: https://reviews.llvm.org/D95818
2021-02-02 10:52:16 -06:00
AndreyChurbanov d7b12004bd [OpenMP] libomp: implement nteams-var and teams-thread-limit-var ICVs
The change includes OMP_NUM_TEAMS, OMP_TEAMS_THREAD_LIMIT env variables,
omp_set_num_teams, omp_get_max_teams, omp_set_teams_thread_limit,
omp_get_teams_thread_limit routines.

Differential Revision: https://reviews.llvm.org/D95003
2021-02-01 22:54:11 +03:00
Tobias Hieta c3c02d0d5a [OpenMP] Fix python3 compatibility in openmp's lit.cfg
Differential Revision: https://reviews.llvm.org/D95669
2021-02-01 08:20:26 +01:00
Jonathan Peyton 67773681c0 [OpenMP] Add environment variable to force monotonic dynamic scheduling
This patch introduces a new environment variable to force monotonic
behavior for users that absolutely need it.  This is in anticipation
of 5.0 change that uses non-monotonic behavior for dynamic scheduling
by default. Fixes for that and the actual switch are coming soon.

Differential Revision: https://reviews.llvm.org/D95263
2021-01-29 12:23:27 -06:00
AndreyChurbanov 7f5ad0e071 [OpenMP] libomp: fix build by cl with vs2019
Replace VLA with dynamic allocation using alloca().
This fixes https://bugs.llvm.org/show_bug.cgi?id=48919.

Differential Revision: https://reviews.llvm.org/D95627
2021-01-29 13:16:41 +03:00
AndreyChurbanov ac70a53653 [OpenMP] NFC: disabled two flakey tests as the bug in libomp not fixed yet 2021-01-29 00:54:13 +03:00
Shilei Tian c571b16834 [OpenMP] Disabled profiling in `libomp` by default to unblock link errors
Link error occurred when time profiling in libomp is enabled by default
because `libomp` is assumed to be a C library but the dependence on
`libLLVMSupport` for profiling is a C++ library. Currently the issue blocks all
OpenMP tests in Phabricator.

This patch set a new CMake option `OPENMP_ENABLE_LIBOMP_PROFILING` to
enable/disable the feature. By default it is disabled. Note that once time
profiling is enabled for `libomp`, it becomes a C++ library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95585
2021-01-28 07:24:32 -05:00
Peyton, Jonathan L 8e67134364 [OpenMP] Fix misleading warning for OMP_PLACES
When OMP_PLACES contains an invalid value, the warning informs the user
that the fallback is OMP_PLACES=threads, but the actual internal setting
is OMP_PLACES=cores and is detected as such with KMP_SETTINGS=1.
This patch informs the user that OMP_PLACES=cores is being used instead
of OMP_PLACES=threads.

Differential Revision: https://reviews.llvm.org/D95170
2021-01-27 14:27:24 -06:00
Peyton, Jonathan L 598c590b3c [OpenMP] Add cpuid leaf 1f topology discovery
This patch adds the new algorithm for topology discovery using cpuid
leaf 1f.  Only the new die level is detected and integrated into the
current affinity mechanisms including KMP_AFFINITY (granularity level
and compact/scatter algorithm), OMP_PLACES=dies, and KMP_HW_SUBSET.

Differential Revision: https://reviews.llvm.org/D95157
2021-01-27 14:27:23 -06:00
Peyton, Jonathan L 9f87c6b47d [OpenMP] Fix HWLOC topology detection for 2.0.x
HWLOC 2.0 has numa nodes as separate children and are not in the main
parent/child topology tree anymore.  This change takes this into
account.  The main topology detection loop in the create_hwloc_map()
routine starts at a hardware thread within the initial affinity mask and
goes up the topology tree setting the socket/core/thread labels
correctly.

This change also introduces some of the more generic changes that the
future kmp_topology_t structure will take advantage of including a
generic ratio & count array (finding all ratios of topology layers like
threads/core cores/socket and finding all counts of each topology
layer), generic radix1 reduction step, generic uniformity check, and
generic printing of topology (en_US.txt)

Differential Revision: https://reviews.llvm.org/D95156
2021-01-27 14:27:23 -06:00
Giorgis Georgakoudis bb40e67318 [OpenMP] Fix building using LLVM_ENABLE_RUNTIMES
Fix when time profiling is enabled.

Related to: D94855

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95398
2021-01-27 06:43:57 -08:00
AndreyChurbanov 498c4b6fc4 [OpenMP] libomp: fix build by clang-cl with vs2019
Problem reported by Joseph Shen <joseph.smeng@gmail.com>.
The patch changes *(&<atomic-var>) to (&<atomic-var>)->load().

Differential Revision: https://reviews.llvm.org/D95485
2021-01-27 12:18:15 +03:00
Nawrin Sultana 927af4b3c5 [OpenMP] Modify OMP_ALLOCATOR environment variable
This patch sets the def-allocator-var ICV based on the environment variables
provided in OMP_ALLOCATOR. Previously, only allowed value for OMP_ALLOCATOR
was a predefined memory allocator. OpenMP 5.1 specification allows predefined
memory allocator, predefined mem space, or predefined mem space with traits in
OMP_ALLOCATOR. If an allocator can not be created using the provided environment
variables, the def-allocator-var is set to omp_default_mem_alloc.

Differential Revision: https://reviews.llvm.org/D94985
2021-01-26 18:27:39 -06:00
Shilei Tian 9d64275ae0 [OpenMP] Added the support for hidden helper task in RTL
The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks.  We first use `pthread_create` to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls `__kmpc_fork_call`. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via `pthread_create` on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want.

Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.

Here are some open issues to be discussed:
1. The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here?

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D77609
2021-01-25 22:16:17 -05:00
Hansang Bae 480cbed31e [OpenMP] Remove unnecessary pointer checks in a few locations
Also, return NULL from unsuccessful OMPT function lookup.

Differential Revision: https://reviews.llvm.org/D95277
2021-01-22 19:18:50 -06:00
Joseph Schuchart edbcc17b7a [OpenMP] libomp: properly initialize buckets in __kmp_dephash_extend
The buckets are initialized in __kmp_dephash_create but when they are extended
the memory is allocated but not NULL'd, potentially leaving some buckets
uninitialized after all entries have been copied into the new allocation.
This commit makes sure the buckets are properly initialized with NULL before
copying the entries.

Differential Revision: https://reviews.llvm.org/D95167
2021-01-22 20:29:46 +03:00
Giorgis Georgakoudis 6b7645dd31 [OpenMP] Add time profiling support in libomp
Profiling has been recently implemented in libomptarget (D93055). This patch enables time profiling support for libomptarget in libomp, to support profiling of multi-threaded execution of offloaded regions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94855
2021-01-21 09:15:14 -08:00
Hansang Bae 2d911f7c72 [OpenMP] Fix atomic entries for captured logical operation
Added missing code for the captured atomic operation.

Differential Revision: https://reviews.llvm.org/D94848
2021-01-19 09:59:28 -06:00
AndreyChurbanov a60bc55c69 [OpenMP] libomp: cleanup parsing of OMP_ALLOCATOR env variable.
Differential Revision: https://reviews.llvm.org/D94932
2021-01-19 16:21:22 +03:00
AndreyChurbanov aa3a59e0c6 [OpenMP][NFC] Fix test
The test fails if memkind library is accessible.
2021-01-19 00:05:34 +03:00
Shilei Tian 9bf843bdc8 Revert "[OpenMP] Added the support for hidden helper task in RTL"
This reverts commit ed939f853d.
2021-01-18 06:57:52 -05:00
Chandler Carruth f855751c12 Fix openmp CMake build on non-Linux AArch64 systems.
This just checks for `/proc/cpuinfo` existing before reading it.

Tested on an ARM macOS machine.
2021-01-17 16:18:31 -08:00
Shilei Tian ed939f853d [OpenMP] Added the support for hidden helper task in RTL
The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks.  We first use `pthread_create` to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls `__kmpc_fork_call`. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via `pthread_create` on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want.

Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.

Here are some open issues to be discussed:
1. The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here?

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D77609
2021-01-16 14:13:35 -05:00
Terry Wilmarth 4fe17ada55 [OpenMP] Fix hierarchical barrier
Hierarchical barrier is an experimental barrier algorithm that uses aspects
of machine hierarchy to define the barrier tree structure. This patch fixes
offset calculation in hierarchical barrier. The offset is used to store info
on a flag about sleeping threads waiting on a location stored in the flag.
This commit also fixes a potential deadlock in hierarchical barrier when
using infinite blocktime by adjusting the offset value of leaf kids so that
it matches the value of leaf state. It also adds testing of default barriers
with infinite blocktime, and also tests hierarchical barrier algorithm with
both default and infinite blocktime.

Patch by Terry Wilmarth and Nawrin Sultana.

Differential Revision: https://reviews.llvm.org/D94241
2021-01-13 10:22:57 -06:00
Hansang Bae bba3a82b56 [OpenMP] Use persistent memory for omp_large_cap_mem
This change enables volatile use of persistent memory for omp_large_cap_mem*
on supported systems. It depends on libmemkind's support for persistent memory,
and requirements/details can be found at the following url.

https://pmem.io/2020/01/20/memkind-dax-kmem.html

Differential Revision: https://reviews.llvm.org/D94353
2021-01-12 20:35:27 -06:00
Hansang Bae 6f0f022038 [OpenMP] Update allocator trait key/value definitions
Use new definitions introduced in 5.1 specification.

Differential Revision: https://reviews.llvm.org/D94277
2021-01-12 20:09:45 -06:00
Shilei Tian 676c7cb0c0 [OpenMP] Added the support for cache line size 256 for A64FX
Fugaku supercomputer is built with the Fujitsu A64FX microprocessor, whose cache line is 256. In current libomp, we only have cache line size 128 for PPC64 and otherwise 64. This patch added the support of cache line 256 for A64FX. It's worth noting that although A64FX is a variant of AArch64, this property is not shared. As a result, in light of UCX source code (392443ab92/src/ucs/arch/aarch64/cpu.c (L17)), we can only determine by checking whether the CPU is FUJITSU A64FX.

Reviewed By: jdoerfert, Hahnfeld

Differential Revision: https://reviews.llvm.org/D93169
2021-01-09 11:58:47 -05:00
Hansang Bae fb1c528526 [OpenMP] Use c_int/c_size_t in Fortran target memory routine interface
The Fortran interface is now in line with 5.1 specification.

Differential Revision: https://reviews.llvm.org/D94042
2021-01-06 16:28:30 -06:00
Hansang Bae 82a29a62ab [OpenMP] Add definition/interface for target memory routines
The change includes new routines introduced in 5.1 and Fortran
interface.

Differential Revision: https://reviews.llvm.org/D93505
2021-01-04 08:12:57 -06:00
Terry Wilmarth 6b316febb4 [OpenMP] libomp: Handle implicit conversion warnings
This patch partially prepares the runtime source code to be built with
-Wconversion, which should trigger warnings if any implicit conversions
can possibly change a value. For builds done with icc or gcc, all such
warnings are handled in this patch. clang gives a much longer list of
warnings, particularly for sign conversions, which the other compilers
don't report. The -Wconversion flag is commented into cmake files, but
I'm not going to turn it on. If someone thinks it is important, and wants
to fix all the clang warnings, they are welcome to.

Types of changes made here involve either improving the consistency of types
used so that no conversion is needed, or else performing careful explicit
conversions, when we're sure a problem won't arise.

Patch is a combination of changes by Terry Wilmarth and Johnny Peyton.

Differential Revision: https://reviews.llvm.org/D92942
2020-12-31 00:39:57 +03:00
Hansang Bae e1fd202489 [OpenMP] Add definitions for 5.1 interop to omp.h 2020-12-17 13:03:59 -06:00
Peyton, Jonathan L 5aafdd7b88 [OpenMP] Introduce new file wrapper class for runtime
Introduce new kmp_safe_raii_file_t class with RAII semantics for file
open/close. It is essentially a wrapper around the C-style FILE* object.
This also unifies the way we error report if a file can't be opened.

Differential Revision: https://reviews.llvm.org/D92604
2020-12-15 14:46:30 -06:00
Hansang Bae 171ca93c54 [OpenMP] Initialize runtime in the forked child process
This patch enables serial initialization in the forked child process
to fix unstable runtime behavior when used with Python-based AI tools.

Differential Revision: https://reviews.llvm.org/D93230
2020-12-15 07:29:28 -06:00
Hansang Bae c3b5009aa7 [OpenMP] Use RTM lock for OMP lock with synchronization hint
This patch introduces a new RTM lock type based on spin lock which is
used for OMP lock with speculative hint on supported architecture.

Differential Revision: https://reviews.llvm.org/D92615
2020-12-09 19:14:53 -06:00
Nawrin Sultana 540007b427 [OpenMP] Add strict mode in num_tasks and grainsize
This patch adds new API __kmpc_taskloop_5 to accomadate strict
modifier (introduced in OpenMP 5.1) in num_tasks and grainsize
clause.

Differential Revision: https://reviews.llvm.org/D92352
2020-12-09 16:46:30 -06:00
Peyton, Jonathan L fe3b244ef7 [OpenMP] Fix norespect affinity bug for Windows
KMP_AFFINITY=norespect was triggering an error because the underlying
process affinity mask was not updated to include the entire machine.
The Windows documentation states that the thread affinities must be
subsets of the process affinity. This patch also moves the printing
(for KMP_AFFINITY=verbose) of whether the initial mask was respected
out of each topology detection function and to one location where the
initial affinity mask is read.

Differential Revision: https://reviews.llvm.org/D92587
2020-12-09 14:32:48 -06:00
Peyton, Jonathan L 9b7d6a6bff [OpenMP] Fix too long name for shm segment on macOS
Remove the user id component to the shm segment name and just use
the pid like before.

Differential Revision: https://reviews.llvm.org/D92660
2020-12-09 14:31:15 -06:00
AndreyChurbanov fff1abc406 [OpenMP] NFC: comment adjusted 2020-12-07 19:50:14 +03:00
AndreyChurbanov 22558c8501 [OpenMP] libomp: Fix possible NULL dereferences
Check pointer returned by strchr, as it can be NULL in case of broken
format of input string. Introduced new function __kmp_str_loc_numbers
for fast parsing of numbers only in the location string.
Also made some cleanup of __kmp_str_loc_init declaration and usage:
- changed type of init_fname parameter to bool;
- changed input from true to false in places where fname is not used.

Differential Revision: https://reviews.llvm.org/D90962
2020-12-07 19:09:07 +03:00
Joachim Protze a148216b31 [OpenMP][OMPT] Fix OMPT return address guard for gomp interface
D91692 missed various locations in kmp_gsupport, where the scope for
OMPT_STORE_RETURN_ADDRESS is too narrow, i.e. the scope ends before the OMPT
callback is called in some nested function.

This patch fixes the scoping issue, so that all OMPT tests pass, when the
tests are built with gcc.

Differential Revision: https://reviews.llvm.org/D92121
2020-12-05 19:06:28 +01:00
Joachim Protze d3ec512b1d [OpenMP][OMPT] Make sure that 0 is never used as ID in tests (NFC) 2020-12-04 18:41:56 +01:00
Hansang Bae c4a22224d9 [OpenMP] Add __kmpc_omp_target_task_alloc to dllexport
This patch enables use of the entry on Windows.

Differential Revision: https://reviews.llvm.org/D92618
2020-12-04 08:11:14 -06:00
Terry Wilmarth e0665a9050 [OpenMP] Add support for Intel's umonitor/umwait
These changes add support for Intel's umonitor/umwait usage in wait
code, for architectures that support those intrinsic functions. Usage of
umonitor/umwait is off by default, but can be turned on by setting the
KMP_USER_LEVEL_MWAIT environment variable.

Differential Revision: https://reviews.llvm.org/D91189
2020-12-01 14:07:46 -06:00
AndreyChurbanov 6bf84871e9 [OpenMP] libomp: add UNLIKELY hints to rarely executed branches
Added UNLIKELY hint to one-time or rarely executed branches.
This improves performance of the library on some tasking benchmarks.

Differential Revision: https://reviews.llvm.org/D92322
2020-12-01 16:53:21 +03:00
Joachim Protze fd3d1b09c1 [OpenMP][Tests][NFC] Use FileCheck from cmake config 2020-11-30 23:16:56 +01:00
Todd Erdner 9615890db5 [OpenMP] libomp: change shm name to include UID, call unregister_lib on SIGTERM
With the change to using shared memory, there were a few problems that need to be fixed.
- The previous filename that was used for SHM only used process id. Given that process is
  usually based on 16bit number, this was causing some conflicts on machines. Thus we add
  UID to the name to prevent this.
- It appears under some conditions (SIGTERM, etc) the shared memory files were not getting
  cleaned up. Added a call to clean up the shm files under those conditions. For this user
  needs to set envirable KMP_HANDLE_SIGNALS to true.

Patch by Erdner, Todd <todd.erdner@intel.com>

Differential Revision: https://reviews.llvm.org/D91869
2020-12-01 00:40:47 +03:00
AndreyChurbanov f6f28b44ad [OpenMP] libomp: fix mutexinoutset dependence for proxy tasks
Once __kmp_task_finish is not executed for proxy tasks,
move mutexinoutset dependency code to __kmp_release_deps
which is executed for all task kinds.

Differential Revision: https://reviews.llvm.org/D92326
2020-12-01 00:13:31 +03:00
Joachim Protze 723be4042a [OpenMP][OMPT][NFC] Fix failing test
The test would fail for gcc, when built with debug flag.
2020-11-29 19:07:42 +01:00
Joachim Protze cdf9401df8 [OpenMP][OMPT][NFC] Fix flaky test
The test had a chance to finish the first task before the second task is
created. In this case, the dependences-pair event would not trigger.
2020-11-29 19:07:41 +01:00
Martin Storsjö 6b429668de [OpenMP][OMPT] Fix building with OMPT disabled after 6d3b81664a 2020-11-26 10:09:32 +02:00
AndreyChurbanov 9e3e332d27 [OpenMP] libomp: fix non-X86, non-AARCH64 builds
Commit https://reviews.llvm.org/rG7b5254223acbf2ef9cd278070c5a84ab278d7e5f
broke the build for some architectures, because macro KMP_PREFIX_UNDERSCORE
was defined only for x86, x86_64 and aarch64. This patch defines it for other
architectures (as a no-op).

Differential Revision: https://reviews.llvm.org/D92027
2020-11-25 20:40:23 +03:00
Joachim Protze 6d3b81664a [OpenMP][OMPT] Introduce a guard to handle OMPT return address
This is an alternative approach to address inconsistencies pointed out in: D90078
This patch makes sure that the return address is reset, when leaving the scope.
In some cases, I had to move the macro out of an if-statement to have it in the
right scope, in some cases I added an additional block to restrict the scope.

This patch does not handle inconsistencies, which might occur if the return
address is still set when we call into the application.

Test case (repeated_calls.c) provided by @hbae

Differential Revision: https://reviews.llvm.org/D91692
2020-11-25 18:17:44 +01:00
Isabel Thärigen b281a05dac [OpenMP][OMPT] Implement verbose tool loading
OpenMP 5.1 introduces the new env variable
OMP_TOOL_VERBOSE_INIT=(disabled|stdout|stderr|<filename>) to enable verbose
loading and initialization of OMPT tools.
This env variable helps to understand the cause when loading of a tool fails
(e.g., undefined symbols or dependency not in LD_LIBRARY_PATH)
Output of OMP_TOOL_VERBOSE_INIT is added for OMP_DISPLAY_ENV

Tests for this patch are integrated into the different existing tool loading
tests, making these tests more verbose. An Archer specific verbose test is
integrated into an existing Archer test.

Patch prepared by: Isabel Thärigen

Differential Revision: https://reviews.llvm.org/D91464
2020-11-25 18:17:44 +01:00
AndreyChurbanov 7b5254223a [OpenMP] fix asm code for for arm64 (AARCH64) for Darwin/macOS
Adjusted external reference for Darwin/AARCH64 link compatibility.
Made size directive conditional only if __ELF__ defined.

Patch by Michael_Pique <mpique@icloud.com>

Differential Revision: https://reviews.llvm.org/D88252
2020-11-24 13:08:24 +03:00
AndreyChurbanov 5644f734d6 Revert "[OpenMP] Add support for Intel's umonitor/umwait"
This reverts commit 9cfad5f9c5.
2020-11-20 12:16:34 +03:00
AndreyChurbanov 9cfad5f9c5 [OpenMP] Add support for Intel's umonitor/umwait
Patch by tlwilmar (Terry Wilmarth)

Differential Revision: https://reviews.llvm.org/D91189
2020-11-19 22:04:21 +03:00