Commit Graph

2022 Commits

Author SHA1 Message Date
Joachim Protze 5918688248 [OpenMP][Tests][NFC] Flagging OMPT tests as XFAIL for Intel compilers
With Intel 19 compiler the teams tests fail to link while trying to link
liboffload.
2021-10-18 13:50:03 +02:00
Shilei Tian 2c941fa2f9 [OpenMP][deviceRTLs] Fix wrong return value of `__kmpc_is_spmd_exec_mode`
D110279 introduced a bug to the device runtime. In `__kmpc_parallel_51`, we detect
whether we are already in parallel region by `__kmpc_parallel_level() > __kmpc_is_spmd_exec_mode()`.
It is based on the assumption that:
- In SPMD mode, parallel level is initialized to 1.
- In generic mode, parallel level is initialized to 0.
- `__kmpc_is_spmd_exec_mode` returns `1` for SPMD mode, 0 otherwise.

Because the return value type of `__kmpc_is_spmd_exec_mode` is `int8_t`, there
was an implicit cast from `bool` to `int8_t`. We can make sure it is either 0 or
1 since C++14. In D110279, the return value is the result of an `and` operation,
which is 2 in SPMD mode. This breaks the assumption in `__kmpc_parallel_51`.

Reviewed By: carlo.bertolli, dpalermo

Differential Revision: https://reviews.llvm.org/D111905
2021-10-16 12:58:29 -04:00
Joachim Protze 26b675d65e [OpenMP][Tools][NFC] Make an Archer test more robust
The execution order of the tasks is not fixed, so there is no ordering
for the write accesses. Enforce the ordering that is expected in the check.
2021-10-15 17:32:05 +02:00
Peyton, Jonathan L acb3b187c4 [OpenMP][host runtime] Add initial hybrid CPU support
Detect, through CPUID.1A, and show user different core types through
KMP_AFFINITY=verbose mechanism. Offer future runtime optimizations
 __kmp_is_hybrid_cpu() to know whether running on a hybrid system or not.

Differential Revision: https://reviews.llvm.org/D110435
2021-10-14 16:49:42 -05:00
Peyton, Jonathan L b840d3ab0d [OpenMP][host runtime] small fixup of RTM CPUID bit check 2021-10-14 16:49:42 -05:00
Peyton, Jonathan L 50b68a3d03 [OpenMP][host runtime] Add support for teams affinity
This patch implements teams affinity on the host.
The default is spread. A user can specify either spread, close, or
primary using KMP_TEAMS_PROC_BIND environment variable. Unlike
OMP_PROC_BIND, KMP_TEAMS_PROC_BIND is only a single value and is not a
list of values. The values follow the same semantics under the OpenMP
specification for parallel regions except T is the number of teams in
a league instead of the number of threads in a parallel region.

Differential Revision: https://reviews.llvm.org/D109921
2021-10-14 16:30:28 -05:00
AndreyChurbanov 621d7a75b1 [OpenMP] libomp: add atomic functions for new OpenMP 5.1 atomics.
Added functions those implement "atomic compare".
Though clang does not use library interfaces to implement OpenMP atomics,
the functions added for consistency.
Also added missed functions for 80-bit floating min/max atomics.

Differential Revision: https://reviews.llvm.org/D110109
2021-10-13 21:02:18 +03:00
AndreyChurbanov 6e98ec9b20 [OpenMP] libomp: fix ittnotify usage.
Replaced storing of ittnotify domain array index into
location info structure (which is now read-only) with storing of
(location info address + ittnotify domain + team size) into hash map.
Replaced __kmp_itt_barrier_domains and __kmp_itt_imbalance_domains arrays with
__kmp_itt_barrier_domains hash map; __kmp_itt_region_domains and
__kmp_itt_region_team_size arrays with __kmp_itt_region_domains hash map.
Basic functionality did not change (at least tried to not change).

The patch fixes https://bugs.llvm.org/show_bug.cgi?id=48644.

Differential Revision: https://reviews.llvm.org/D111580
2021-10-13 20:49:05 +03:00
AndreyChurbanov 5e58b63b28 [OpenMP] libomp: fix warning on comparison of integer expressions of different signedness
Replaced macro with global variable of correspondent type.

Differential Revision: https://reviews.llvm.org/D111562
2021-10-13 20:11:47 +03:00
AndreyChurbanov f5c0c9179f [OpenMP] libomp: add OpenMP 5.1 memory allocation routines.
Aligned allocation routines added.
Fortran interfaces added for all allocation routines.

Differential Revision: https://reviews.llvm.org/D110923
2021-10-11 19:25:00 +03:00
Ron Lieberman d022f39d9f [libomptarget][amdgpu][NFC] tweak a comment 2021-10-09 12:51:53 -04:00
Joseph Huber bad44d5f39 [OpenMP] Add RTL function for getting number of threads in block.
This patch adds support for the
`__kmpc_get_hardware_num_threads_in_block` function that returns the
number of threads. This was missing in the new runtime and was used by
the AMDGPU plugin which prevented it from using the new runtime. This
patchs also unified the interface for getting the thread numbers in the
frontend.

Originally authored by jdoerfert.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D111475
2021-10-08 22:21:59 -04:00
Joseph Huber 85ad566335 [OpenMP] Avoid calling `isSPMDMode` during RT initialization
Until we hit the first barrier we should not call `mapping::isSPMDMode`
with all threads. Instead, we now have (and use during initialization) a
`mapping::isMainThreadInGenericMode` overload that takes the known
SPMD-mode state and one that queries it.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D111381
2021-10-08 22:00:41 -04:00
Joseph Huber 208f900527 [Libomptarget] Add an external interface to dynamic shared memory
This patch adds an external interface to access the dynamic shared
memory buffer in the device runtime. The function introduced is
``llvm_omp_get_dynamic_shared``. This includes a host-side
definition that only returns a null pointer so that it can be used when
host-fallback is enabled without crashing. Support for dynamic shared
memory was also ported to the old device runtime.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110957
2021-10-08 15:36:57 -04:00
Shilei Tian c060c634ef [OpenMP][NVPTX] Fix an error in configuring #teams and #threads
It must be a copy mistake.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D111407
2021-10-08 11:07:43 -04:00
Shilei Tian af4599b8ab [OpenMP][DeviceRTL] Add the support for printf in a freestanding way
For NVPTX, `printf` can be used just with a function declaration. For AMDGCN, an
function definition is added, but it simply returns.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109728
2021-10-07 22:15:37 -04:00
Johannes Doerfert 44710940af [OpenMP][FIX] Data race in the SPMD execution of the new runtime
We need to synchronize the threads *before* we destroy the RAII objects
that hold the old values and not after to avoid threads executing the
parallel region but seeing an inconsistent state.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D111369
2021-10-07 21:01:24 -04:00
Jon Chesterfield 1bc3a6e41b [libomptarget] Reapply 2bc4d48a78 which was accidentally reverted 2021-10-07 20:17:48 +01:00
Jon Chesterfield 0c554a4769 [libomptarget] Move device environment to shared header, remove divergence
Follow on to D110006, related to D110957

Where implementations have diverged this resolves to match the new DeviceRTL

- replaces definitions of this struct in deviceRTL and plugins with include
- changes the dynamic_shared_size field from D110006 to 32 bits
- handles stdint being unavailable in DeviceRTL
- adds a zero initializer for the field to amdgpu
- moves the extern declaration for deviceRTL to target_interface
  (omptarget.h is more natural, but doesn't work due to include order
  with debug.h)
- Renames the fields everywhere to match the LLVM format used in DeviceRTL
- Makes debug_level uint32_t everywhere (previously sometimes int32_t)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D111069
2021-10-07 12:03:48 +01:00
Michał Górny 0873b9bef4 [openmp] [elf_common] Fix linking against LLVM dylib
The hand-rolled linking logic in elf_common does not account for
the possibility of using LLVM dylib rather than a dozen static
libraries.  Since it does not seem to be easily convertible
to add_llvm_library, just hand-roll support for LLVM_LINK_LLVM_DYLIB.
This is necessary to support stand-alone builds against installed LLVM.

Differential Revision: https://reviews.llvm.org/D111038
2021-10-04 09:29:06 +02:00
Martin Storsjö dec2257f35 [openmp] Fix a typo in a test REQUIRES line
Differential Revision: https://reviews.llvm.org/D110963
2021-10-03 23:51:11 +03:00
Peyton, Jonathan L 343b9e8590 [OpenMP][host runtime] Introduce kmp_cpuinfo_flags_t to replace integer flags
Store CPUID support flags as bits instead of using entire integers.

Differential Revision: https://reviews.llvm.org/D110091
2021-10-01 11:08:39 -05:00
Peyton, Jonathan L 957b4c5750 [OpenMP][testing] increase threshold for omp_get_wtime test 2021-10-01 11:07:41 -05:00
Jon Chesterfield 05ba9ff6a6 [libomptarget][amdgpu] Refactor memory pool collection 2021-10-01 14:58:01 +01:00
Jon Chesterfield 72e8a4c45d [openmp][docs] Describe how the internal components are found
Add a FAQ entry about the names of openmp offloading components
and how they are searched for.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D109619
2021-09-30 22:05:12 +01:00
Jon Chesterfield 3247329107 [openmp] Add addrspacecast to getOrCreateIdent
Fixes 51982. Adds a missing CreatePointerCast and allocates a global in
the correct address space.

Test case derived from https://github.com/ROCm-Developer-Tools/aomp/\
blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting
parts while checking the assertion failure still occurred.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110556
2021-09-30 21:36:31 +01:00
Jon Chesterfield b75a7481ba [libomptarget] Apply D110029 to amdgpu
Use enum for execution mode.

This is partly a port from ROCm and partly a port from D110029. Attempted to
make the same choices as ROCm as far as comments etc go to reduce the merge
conflicts.

There is some cleanup warranted here - in particular I like the cuda patch
factoring out the comparisons into named variables - but I'd like to leave
that for a follow up patch, keeping this one minimal.

Reviewed By: carlo.bertolli

Differential Revision: https://reviews.llvm.org/D110845
2021-09-30 21:29:37 +01:00
Dhruva Chakrabarti 6226270253 [libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed.
Fixes: SWDEV-275232 (With contributions from Ammar Elwazir, Laurent Morichetti, and Tony Tye)

The current code is racy. After the packet is submitted, the GPU will increment the read index. If this wraps around before the memory is read from it'll refer to a signal from an unrelated packet. Change avoids reading from the packet post-submission.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110679
2021-09-29 09:22:07 -07:00
Jon Chesterfield 2bc4d48a78 [libomptarget][amdgpu] Follow on to D110513, empty kernarg pools are not fatal 2021-09-27 22:44:35 +01:00
Jon Chesterfield 738734f655 [libomptarget][amdgpu] Report zero devices if plugin construction fails, instead of segv 2021-09-27 22:13:12 +01:00
Jon Chesterfield 80fa43fe9a Revert "[openmp] Add addrspacecast to getOrCreateIdent"
This reverts commit 1a761e5b7b.
Failed CI, albeit with a different failure mode to BZ51982
2021-09-27 19:27:35 +01:00
Jon Chesterfield 1a761e5b7b [openmp] Add addrspacecast to getOrCreateIdent
Fixes 51982. Minor refactor to remove `return x = y` construct.

Test case derived from https://github.com/ROCm-Developer-Tools/aomp/\
blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting
parts while checking the assertion failure still occurred.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110556
2021-09-27 19:23:12 +01:00
@vladaindjic 5357a98c82 [OpenMP] libomp: Usage of TASK_TIED constant inside kmp_gsupport.cpp
The minor code refactorization introduces the TASK_TIED constant inside
kmp_gsupprot.cpp as a replacement for the literal value 1.
The mentioned constant is now used in both kmp_tasking.cpp and
kmp_gsupport.cpp files.

Differential Revision: https://reviews.llvm.org/D110441
2021-09-27 19:45:56 +03:00
Joseph Huber 74d622dea4 [OpenMP] Add new worksharing definitions into device RTL
This path defines the newly added `__kmpc_disitrute_static_init`
functions in the device runtime library. These functions are currently
exact copies of the current worksharing method but can be tuned later.

Depends on D110429

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110430
2021-09-27 11:36:41 -04:00
Pushpinder Singh b1695c2eb8 [AMDGPU][OpenMP] Add memory pool size check to isValidMemoryPool
Keeping all the checks in one place for future simplification.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110513
2021-09-27 12:29:00 +00:00
Michael Kruse 1b242dccff [OpenMP][CMake] Use in-project clang as CUDA->IR compiler for new DeviceRTL.
Use the in-project clang, llvm-link and opt if available and unless
CMake cache variables specify to use a different compiler. This applies
D101265 to the new DeviceRTL's CMakeLists.txt which was copied before
D101265 was applied.

Fixes the openmp-offloading-cuda-runtime builder which was failing
since D110006.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110251
2021-09-27 07:14:19 -05:00
Pushpinder Singh 9d0eb440ff [libomptarget][nfc][amdgpu] Reorder function to clarify review diff 2021-09-27 09:30:55 +00:00
Jon Chesterfield 726a34f063 [libomptarget][amdgpu] Replace dead exit call with returning error 2021-09-27 09:43:37 +01:00
Vignesh Balu 62fddd5ff5 [OpenMP][OMPD] Implementation of OMPD debugging library - libompd.
This is a continuation of the review: https://reviews.llvm.org/D100182
This patch implements the OMPD API as specified in the standard doc.

Reviewed By: @hbae
Differential Revision: https://reviews.llvm.org/D100183
2021-09-27 12:32:31 +05:30
Jon Chesterfield 8cf93a35d4 [libomptarget][amdgpu] Destruct HSA queues
Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak which raises an assert in debug builds of hsa.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D109511
2021-09-26 15:34:21 +01:00
Joseph Huber d83ca624a1 [OpenMP] Fix data-race in new device RTL
This patch fixes a data-race observed when using the new device runtime
library. The Internal control variable for the parallel level is read in
the `__kmpc_parallel_51` function while it could potentially be written
by other threads. This causes data corruption and will cause
nondetermistic behaviour in the runtime. This patch fixes this by adding
an explicit synchronization before the region starts.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110366
2021-09-23 17:28:07 -04:00
Shilei Tian 423d34f74a [OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit`
This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110279
2021-09-22 17:16:41 -04:00
Joseph Huber 60a40cf379 [OpenMP] Fix KeepAlive usage
Summary:
Functions were called the wrong way around, this didn't keep the symbol
alive.
2021-09-22 14:38:19 -04:00
Joseph Huber 277b681ede [OpenMP] Add function tracing debugging to device RTL
This patch adds support for an RAII struct that will print function
traces when placed inside of a function declaration. Each successive
call will increase the indentation to make it easier to visually
inspect.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110202
2021-09-22 12:25:29 -04:00
Shilei Tian ca999f7191 [OpenMP][Offloading] Use bitset to indicate execution mode instead of value
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - indicates generic mode
- 2 - SPMD mode execution with generic mode semantics

We are going to add support for SIMD execution mode. It will be come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible.

This patch changes to bitset based solution to encode execution mode. Each
position is:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)

In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future after we add the support for
SIMD mode, `0b1xx` will be in SIMD mode.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110029
2021-09-22 11:40:52 -04:00
Joseph Huber 1cf86df883 [OpenMP] Make sure the Thread ID function is not removed
Summary:
The thread ID function was reintroduced in D110195, but could
potentially be removed by the optimizer. Make the function noinline to
preserve the call sites and add it to the externalization RAII so its
definition is not removed by the attributor.
2021-09-22 10:13:18 -04:00
Joseph Huber e95731cca7 [OpenMP] Add thread ID function into new RTL
The new device runtime library currently lacks the
`kmpc_get_hardware_thread_id_in_block` function which is currently used
when doing the SPMDzation optimization. This call would be introduced
through the optimization and then cause a linking error because it was
not present. This patch adds support for this runtime call.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110195
2021-09-21 17:43:50 -04:00
Giorgis Georgakoudis ac90dfc43a Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit 1d66649adf.

Revert to fix AMG GPU issue.
2021-09-21 13:20:39 -07:00
Usman Nadeem 248342b7c7 [OpenMP][OMPD] Fix compile error when OMPD is not supported
Differential Revision: https://reviews.llvm.org/D110120

Change-Id: I9d39dacfab5b7fbab37ee4b4d960d51e0892b24d
2021-09-21 12:45:15 -07:00
Giorgis Georgakoudis 1d66649adf [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3)  forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert, jhuber6

Differential Revision: https://reviews.llvm.org/D102107
2021-09-21 10:50:04 -07:00
Shilei Tian 49e976c934 [OpenMP][NVPTX] Fix a warning that data argument not used by format string
Reviewed By: jhuber6, grokos

Differential Revision: https://reviews.llvm.org/D110104
2021-09-20 17:22:14 -04:00
Peyton, Jonathan L 1e45cd75df [OpenMP][host runtime] Fix indirect lock table race condition
The indirect lock table can exhibit a race condition during initializing
and setting/unsetting locks. This occurs if the lock table is
resized by one thread (during an omp_init_lock) and accessed (during an
omp_set|unset_lock) by another thread.

The test runtime/test/lock/omp_init_lock.c test exposed this issue and
will fail if run enough times.

This patch restructures the lock table so pointer/iterator validity is
always kept. Instead of reallocating a single table to a larger size, the
lock table begins preallocated to accommodate 8K locks. Each row of the
table is allocated as needed with each row allowing 1K locks. If the 8K
limit is reached for the initial table, then another table, capable of
holding double the number of locks, is allocated and linked
as the next table. The indices stored in the user's locks take this
linked structure into account when finding the lock within the table.

Differential Revision: https://reviews.llvm.org/D109725
2021-09-20 13:01:58 -05:00
Joseph Huber f1c821fa85 [OpenMP] Add support for dynamic shared memory in new RTL
This patch adds support for using dynamic shared memory in the new
device runtime. The new function `__kmpc_get_dynamic_shared` will return a
pointer to the buffer of dynamic shared memory. Currently the amount of memory
allocated is set by an environment variable.

In the future this amount will be added to the amount used for the smart stack
which will be configured in a similar way.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110006
2021-09-17 21:25:36 -04:00
Joseph Huber ec02c34b6d [OpenMP] Add additional fields to device environment
This patch adds fields for the device number and number of devices into
the device environment struct and debugging values.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110004
2021-09-17 21:25:32 -04:00
Joseph Huber b266bcb135 [OpenMP] Implement __assert_fail in the new device runtime
This patch implements the `__assert_fail` function in the new device
runtime. This allows users and developers to use the standars assert
function inside of the device.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D109886
2021-09-17 21:25:28 -04:00
Shilei Tian 81a1a91c62 [NFC] clang-format -i /openmp/libomptarget/deviceRTLs/interface.h 2021-09-17 12:55:02 -04:00
AndreyChurbanov 59b877d001 [OpenMP] NFC: add type casts to silence gcc warnings 2021-09-17 19:49:40 +03:00
AndreyChurbanov 7f1a6d891e [OpenMP] libomp: Update third-party sources of ittnotify client code.
The third-party ittnotify sources updated from https://github.com/intel/ittapi.
Changes applied:
- llvm license aded to all files; initial BSD license saved in LICENSE.txt;
- clang-formatted;
- renamed *.c to *.cpp, similar to what we did with all our sources;
- added #include "kmp_config.h" with definition of INTEL_ITTNOTIFY_PREFIX macro
  into ittnotify_static.cpp.

Differential Revision: https://reviews.llvm.org/D109333
2021-09-17 19:38:34 +03:00
Hansang Bae ae2a5facce [OpenMP][libomptarget] Minor fix in x86_64 plugin
Call to remove() was passing invalid address for the file name.

Differential Revision: https://reviews.llvm.org/D109846
2021-09-15 15:57:06 -05:00
Peyton, Jonathan L 258e27aae1 [OpenMP] Add support for GOMP depobj
GOMP depobjs are represented as a two intptr_t array. The first
element is the base address of the dependency and the second element
is the flag indicating the type the depobj represents.

Differential Revision: https://reviews.llvm.org/D108790
2021-09-15 12:47:08 -05:00
Vignesh Balasubramanian 939154125b [OpenMP] [OMPD] OPENMP_INSTALL_LIBDIR is set for the install dir
OPENMP_INSTALL_LIBDIR is set to the installation path of shared and static
libompd.This should avoid the mixing of 32 and 64 bit on same path in
multi-lib set-up.

Reviewed By: @mceier
Differential Revision: https://reviews.llvm.org/D109352
2021-09-13 10:25:50 +05:30
Joseph Huber 7eb899cbcd [OpenMP] Add more verbose remarks for runtime folding
We peform runtime folding, but do not currently emit remarks when it is
performed. This is because it comes from the runtime library and is
beyond the users control. However, people may still wish to view  this
and similar information easily, so we can enable this behaviour using a
special flag to enable verbose remarks.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109627
2021-09-10 17:36:06 -04:00
Ye Luo 2187cbf56f [OpenMP][libomptarget] Add __tgt_target_return_t enum for __tgt_target_XXX return int
The defintion of OFFLOAD_SUCCESS and OFFLOAD_FAIL used in plugin APIs and libomptarget public APIs are not consistent.
Create __tgt_target_return_t for libomptarget public APIs.

Differential Revision: https://reviews.llvm.org/D109304
2021-09-10 16:11:08 -05:00
Jon Chesterfield f244af5c9f [openmp][amdgpu] Update SupportAndFAQ docs 2021-09-10 18:35:29 +01:00
Johannes Doerfert 9f844aeeb4 [OpenMP][Docs] Remove old/outdated webpage
This should have happened a long time ago, now that openmp.llvm.org
redirects to openmp.llvm.org/docs we completely switched over to the
sphinx documentation page instead.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D108588
2021-09-10 12:11:05 -05:00
Jon Chesterfield 6760234e8d [libomptarget][amdgpu] Precisely manage hsa lifetime
The hsa library must be initialized before any calls into it and
destructed after the last call into it. There have been a number of bugs in
this area related to member variables which would like to use raii to manage
resources acquired from hsa.

This patch moves the init/shutdown of hsa into a class, such that when used as
the first member variable (could be a base), the lifetime of other member
variables are reliably scoped within it. This will allow other classes to use
raii reliably when used as member variables within the global.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D109512
2021-09-09 17:28:11 +01:00
Jon Chesterfield 2a581710c1 [openmp] No longer use LIBRARY_PATH to find devicertl
Given D109057, change test runner to use the libomptarget-x-bc-path
argument instead of the LIBRARY_PATH environment variable to find the device
library.

Also drop the use of LIBRARY_PATH environment variable as it is far
too easy to pull in the device library from an unrelated toolchain by accident
with the current setup. No loss in flexibility to developers as the clang
commandline used here is still available.

Reviewed By: jdoerfert, tianshilei1992

Differential Revision: https://reviews.llvm.org/D109061
2021-09-09 17:16:41 +01:00
Jon Chesterfield d642156f8f [libomptarget][nfc] Hoist hsa_init into rtl.cpp 2021-09-09 16:09:34 +01:00
Hansang Bae 3976035d68 [OpenMP] Fix line truncation in omp_lib.h
Fixed code that exceeds 72-column.

Differential Revision: https://reviews.llvm.org/D109469
2021-09-09 09:33:45 -05:00
AndreyChurbanov d40108e0af [OpenMP] libomp: runtime part of omp_all_memory task dependence implementation.
New omp_all_memory task dependence type is implemented.
Library recognizes the new type via either
(dependence_address == NULL && dependence_flag == 0x80)
or
(dependence_address == SIZE_MAX).
A task with new dependence type depends on each preceding task
with any dependence type (kind of a dependence barrier).

Differential Revision: https://reviews.llvm.org/D108574
2021-09-08 16:55:32 +03:00
Ye Luo 2cfe1a09d1 [OpenMP][libomptarget][NFC] Change checkDeviceAndCtors return type to bool.
What is exactly needed is only a boolean. Pulling OFFLOAD_SUCCESS/FAIL only adds confusion.

Differential Revision: https://reviews.llvm.org/D109303
2021-09-07 13:59:27 -05:00
Hansang Bae 224f51d879 [OpenMP] Add interface for 5.1 scope construct
The new interface only marks begin/end of a scope construct for
corresponding OMPT events, and we can use existing interfaces for
reduction operations.

Differential Revision: https://reviews.llvm.org/D108062
2021-09-07 11:22:21 -05:00
Nawrin Sultana c24da72fa4 [OpenMP] Change monotonicity of dynamic schedule
This patch changes the default monotonicity of dynamic schedule from
monotonic to non-monotonic when no modifier is specified.

Differential Revision: https://reviews.llvm.org/D109026
2021-09-07 08:18:46 -05:00
Ye Luo c3aecf87d5 [OpenMP][libomptarget] Change device vector elements to unique_ptr type
Using std::vector<DeviceTy> requires implementing copy constructor and copied assign operator for DeviceTy.
Indeed DeviceTy should never be copied. After changing to std::vector<std::unique_ptr<DeviceTy>>,
All the unsafe copy constructor and copy assign operator implementations can be removed.
Compilers mark them deleted due to mutex or underlying objects and this is the desired behavior.

Differential Revision: https://reviews.llvm.org/D109276
2021-09-06 22:28:49 -05:00
Ye Luo 8e5c1b039e [OpenMP][libomptarget] Change synchronize_ty return type to int32_t
Plugins always return int32_t. Stay consistent with other functions which return error status.

Differential Revision: https://reviews.llvm.org/D109341
2021-09-06 21:38:54 -05:00
Ron Lieberman fdac5adee6 [openmp] NFC add bitcode comment 2021-09-02 18:21:39 -05:00
Jon Chesterfield 201e466eba [libomptarget][amdgpu] Add gfx90a to build list 2021-09-02 18:11:02 +01:00
Jon Chesterfield 3153bdd547 [libomptarget][amdgpu] Drop env variables
Use the same debug print as the rest of libomptarget plugins with
the same environment control. Also drop the max queue size debugging hook as
I don't believe it is still in use, can bring it back near the rest of the env
handling in rtl.cpp if someone objects.

That makes most of rt.h and all of utils.cpp unused. Clean that up and simplify
control flow in a couple of places.

Behaviour change is that debug prints that used to use the old environment
variable now use the new one and print in slightly different format, and the
removal of the max queue size variable.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D108784
2021-09-02 11:02:39 +01:00
Ye Luo 289a1089cd [libomptarget] Move HostDataToTargetTy states into StatesTy
Use unique_ptr to achieve the effect of mutable.

Remove mutable keyword of DynRefCount and HoldRefCount
Remove std::shared_ptr from UpdateMtx

Reviewed By: tianshilei1992, grokos

Differential Revision: https://reviews.llvm.org/D109007
2021-09-01 23:36:05 -05:00
Fangrui Song 4d5220faf9 [OpenMP] Fix -Wunused-but-set-parameter in -DLLVM_ENABLE_ASSERTIONS=off builds. NFC 2021-09-01 17:55:13 -07:00
Joel E. Denny 1f9e437065 [OpenMP][AMDGPU] Remove unneeded XFAILs 2021-09-01 18:00:25 -04:00
Joel E. Denny 786a140650 [OpenMP] Use IsHostPtr where needed in rest of omptarget.cpp
As started in D107925, this patch replaces the remaining occurrences
of `UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin` in
`omptarget.cpp` with `IsHostPtr`.  The former condition is broken in
the rare case that the device and host happen to use the same address
for their mapped allocations.  I don't know how to write a test that's
likely to reveal this case.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D107928
2021-09-01 17:31:42 -04:00
Joel E. Denny d11bab0b73 [OpenMP] Use IsHostPtr where needed for targetDataBegin
As discussed in D105990, without this patch, `targetDataBegin`
determines whether to transfer data (as opposed to assuming it's in
shared memory) using the condition `!UseUSM || HasCloseModifier`.
However, this condition is broken if use of discrete memory was forced
by `omp_target_associate_ptr`.  This patch extends
`unified_shared_memory/associate_ptr.c` to reveal this case, and it
fixes it using `!IsHostPtr` in `DeviceTy::getTargetPointer` to replace
this condition.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D107927
2021-09-01 17:31:42 -04:00
Joel E. Denny fa6c275505 [OpenMP][NFC] Eliminate CopyMember from targetDataEnd
This patch is based on comments in D105990.  It is NFC according to
the following observations:

1. `CopyMember` is computed as `!IsHostPtr && IsLast`.
2. `DelEntry` is true only if `IsLast` is true.

We apply those observations in order:

```
if ((DelEntry || Always || CopyMember) && !IsHostPtr)

if ((DelEntry || Always || IsLast) && !IsHostPtr)

if ((Always || IsLast) && !IsHostPtr)
```

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D107926
2021-09-01 17:31:42 -04:00
Joel E. Denny 8e4836b2a2 [OpenMP] Use IsHostPtr where needed for targetDataEnd
As discussed in D105990, without this patch, `targetDataEnd`
determines whether to transfer data or delete a device mapping (as
opposed to assuming it's in shared memory) using two different
conditions, each of which is broken for some cases:

1. `!(UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin)`: The
   broken case is rare: the device and host might happen to use the
   same address for their mapped allocations.  I don't know how to
   write a test that's likely to reveal this case, but this patch does
   fix it, as discussed below.
2. `!UNIFIED_SHARED_MEMORY || HasCloseModifier`: There are at least
   two broken cases:
    1. The `close` modifier might have been specified on an `omp
      target enter data` but not the corresponding `omp target exit
      data`, which thus might falsely assume a mapping is in shared
      memory.  The test `unified_shared_memory/close_enter_exit.c`
      already has a missing deletion as a result, and this patch adds
      a check for that.  This patch also adds the new test
      `close_member.c` to reveal a missing transfer and deletion.
    2. Use of discrete memory might have been forced by
      `omp_target_associate_ptr`, as in the test
      `unified_shared_memory/api.c`.  In the current `targetDataEnd`
      implementation, this condition turns out not be used for this
      case: because the reference count is infinite, a transfer is
      possible only with an `always` modifier, and this condition is
      never used in that case.  To ensure it's never used for that
      case in the future, this patch adds the test
      `unified_shared_memory/associate_ptr.c`.

Fortunately, `DeviceTy::getTgtPtrBegin` already has a solution: it
reports whether the allocation was found in shared memory via the
variable `IsHostPtr`.

After this patch, `HasCloseModifier` is no longer used in
`targetDataEnd`, and I wonder if the `close` modifier is ever useful
on an `omp target data end`.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D107925
2021-09-01 17:31:42 -04:00
Jon Chesterfield cef1199686 Revert "[openmp] No longer use LIBRARY_PATH to find devicertl"
This reverts commit 7a228f872f.
Failing test case under CI
2021-09-01 20:44:12 +01:00
Jon Chesterfield 7a228f872f [openmp] No longer use LIBRARY_PATH to find devicertl
Given D109057, change test runner to use the libomptarget-x-bc-path
argument instead of the LIBRARY_PATH environment variable to find the device
library.

Also drop the use of LIBRARY_PATH environment variable as it is far
too easy to pull in the device library from an unrelated toolchain by accident
with the current setup. No loss in flexibility to developers as the clang
commandline used here is still available.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109061
2021-09-01 20:24:34 +01:00
Jon Chesterfield 718e5a9883 [libomptarget] Set runpath on libomptarget, use that to drop LD_LIBRARY_PATH from test runner
Using rpath instead of LD_LIBRARY_PATH to find libomp.so and
libomptarget.so lets one rerun the already built test executables without
setting environment variables and removes the risk of the test runner picking
up different libraries to the developer debugging the failure.

rpath usually means runpath, which is not transitive, so set runpath on
libomptarget itself so that it can find the plugins located next to it,
spelled $ORIGIN. This provides sufficient functionality to drop D102043

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D109071
2021-09-01 18:47:56 +01:00
Jon Chesterfield f8bcbb82a7 [libomptarget] Normalise a cmake debug string, checking it triggers CI 2021-09-01 14:24:28 +01:00
Vignesh Balasubramanian b9a27908f9 [OpenMP][OMPD] Implementation of OMPD debugging library - libompd.
This is a continuation of the review: https://reviews.llvm.org/D100181
Creates a new directory "libompd" under openmp.

"TargetValue" provides operational access to the OpenMP runtime memory
 for OMPD APIs.
With TargetValue, using "pointer" a user can do multiple operations
from casting, dereferencing to accessing an element for structure.
The member functions are designed to concatenate the operations that
are needed to access values from structures.

e.g., _a[6]->_b._c would read like :
TValue(ctx, "_a").cast("A",2)
	.getArrayElement(6).access("_b").cast("B").access("_c")
For example:
If you have a pointer "ThreadHandle" of a running program then you can
access/retrieve "threadID" from the memory using TargetValue as below.

  TValue(context, thread_handle->th) /*__kmp_threads[t]->th*/
    .cast("kmp_base_info_t")
    .access("th_info") /*__kmp_threads[t]->th.th_info*/
    .cast("kmp_desc_t")
    .access("ds") /*__kmp_threads[t]->th.th_info.ds*/
    .cast("kmp_desc_base_t")
    .access("ds_thread") /*__kmp_threads[t]->th.th_info.ds.ds_thread*/
                .cast("kmp_thread_t")
.getRawValue(thread_id, 1);

Reviewed By: @hbae
Differential Revision: https://reviews.llvm.org/D100182
2021-09-01 14:50:16 +05:30
Joel E. Denny 1688b4cf8e [OpenMP][AMDGPU] XFAIL test where kernels call printf 2021-08-31 22:11:28 -04:00
Joel E. Denny ec1ebcd302 [OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in runtime (2/2)
This patch implements OpenMP runtime support for an original OpenMP
extension we have developed to support OpenACC: the `ompx_hold` map
type modifier.  The previous patch in this series, D106509, implements
Clang support and documents the new functionality in detail.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D106510
2021-08-31 16:13:49 -04:00
Joel E. Denny 83ddfa0d22 [OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in Clang (1/2)
This patch implements Clang support for an original OpenMP extension
we have developed to support OpenACC: the `ompx_hold` map type
modifier.  The next patch in this series, D106510, implements OpenMP
runtime support.

Consider the following example:

```
 #pragma omp target data map(ompx_hold, tofrom: x) // holds onto mapping of x
 {
   foo(); // might have map(delete: x)
   #pragma omp target map(present, alloc: x) // x is guaranteed to be present
   printf("%d\n", x);
 }
```

The `ompx_hold` map type modifier above specifies that the `target
data` directive holds onto the mapping for `x` throughout the
associated region regardless of any `target exit data` directives
executed during the call to `foo`.  Thus, the presence assertion for
`x` at the enclosed `target` construct cannot fail.  (As usual, the
standard OpenMP reference count for `x` must also reach zero before
the data is unmapped.)

Justification for inclusion in Clang and LLVM's OpenMP runtime:

* The `ompx_hold` modifier supports OpenACC functionality (structured
  reference count) that cannot be achieved in standard OpenMP, as of
  5.1.
* The runtime implementation for `ompx_hold` (next patch) will thus be
  used by Flang's OpenACC support.
* The Clang implementation for `ompx_hold` (this patch) as well as the
  runtime implementation are required for the Clang OpenACC support
  being developed as part of the ECP Clacc project, which translates
  OpenACC to OpenMP at the directive AST level.  These patches are the
  first step in upstreaming OpenACC functionality from Clacc.
* The Clang implementation for `ompx_hold` is also used by the tests
  in the runtime implementation.  That syntactic support makes the
  tests more readable than low-level runtime calls can.  Moreover,
  upstream Flang and Clang do not yet support OpenACC syntax
  sufficiently for writing the tests.
* More generally, the Clang implementation enables a clean separation
  of concerns between OpenACC and OpenMP development in LLVM.  That
  is, LLVM's OpenMP developers can discuss, modify, and debug LLVM's
  extended OpenMP implementation and test suite without directly
  considering OpenACC's language and execution model, which can be
  handled by LLVM's OpenACC developers.
* OpenMP users might find the `ompx_hold` modifier useful, as in the
  above example.

See new documentation introduced by this patch in `openmp/docs` for
more detail on the functionality of this extension and its
relationship with OpenACC.  For example, it explains how the runtime
must support two reference counts, as specified by OpenACC.

Clang recognizes `ompx_hold` unless `-fno-openmp-extensions`, a new
command-line option introduced by this patch, is specified.

Reviewed By: ABataev, jdoerfert, protze.joachim, grokos

Differential Revision: https://reviews.llvm.org/D106509
2021-08-31 16:13:49 -04:00
Shilei Tian 8442967fe3 [OpenMP] Fix task wait doesn't work as expected in serialized team
As discussed in D107121, task wait doesn't work when a regular task T depends on
a detached task or a hidden helper task T' in a serialized team. The root cause is,
since the team is serialized, the last task will not be tracked by
`td_incomplete_child_tasks`. When T' is finished, it first releases its
dependences, and then decrements its parent counter. So far so good. For the thread
that is running task wait, if at the moment it is still spinning and trying to
execute tasks, it is fine because it can detect the new task and execute it.
However, if it happends to finish the function `flag.execute_tasks(...)`, it will
be broken because `td_incomplete_child_tasks` is 0 now.

In this patch, we update the rule to track children tasks a little bit. If the
task team encounters a proxy task or a hidden helper task, all following tasks
will be tracked.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D107496
2021-08-31 12:15:46 -04:00
Joachim Protze 5ea1c37118 [libomptarget][amdcgn] Only add opt/llvm-link dependency if TARGET is available
In some build configurations, the target we depend on is not available for declaring the build dependency.
We only need to declare the build dependency, if the build target is available in the same build.

Fixes the issue raised in https://reviews.llvm.org/D107156#2969862
This patch should go into release/13 together with D108404

Differential Revision: https://reviews.llvm.org/D108868
2021-08-30 17:32:11 +02:00
Shilei Tian e8fdacfd81 [OpenMP][NVPTX] Fixed missing variables for CUDA free compilation in NVPTX plugin
`CU_EVENT_DEFAULT` is defined in CUDA header. It should be added to
`openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h` for CUDA free build.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108878
2021-08-28 18:08:10 -04:00
Shilei Tian 29df4ab3f3 [OpenMP][Offloading] Add support for event related interfaces
This patch adds the support form event related interfaces, which will be used
later to fix data race. See D104418 for more details.

Reviewed By: jdoerfert, ye-luo

Differential Revision: https://reviews.llvm.org/D108528
2021-08-28 16:24:14 -04:00
George Rokos a2bd44089e [libomptarget][NFC] Fixed tests which checked for obsolete string "getOrAllocTgtPtr" 2021-08-28 07:35:42 -07:00
Jon Chesterfield 78f92c3810 [openmp][amdgpu] Initial gfx10 offloading implementation
Lets wavefront size be 32 for amdgpu openmp, as well as 64.

Fixes up as little as possible to pass that through the libraries. This change
is end to end, as opposed to updating clang/devicertl/plugin separately. It can
be broken up for review/commit if preferred. Posting as-is so that others with
a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are
probably bugs remaining as well as the todo: for letting grid values vary more.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108708
2021-08-27 12:34:03 +01:00
George Rokos 3819aae6dd [libomptarget][NFC] Replaced obsolete name "getOrAllocTgtPtr" with new "getTargetPointer" in debug messages. 2021-08-26 18:01:18 -07:00