Commit Graph

236 Commits

Author SHA1 Message Date
Jon Chesterfield ca84c43d69 [openmp][amdgpu] Disable tests on old runtime, enable tests on new one 2022-01-19 15:49:47 +00:00
Jon Chesterfield e35c8f541c [openmp][amdgpu] Temporarily disable tests on old runtime 2022-01-19 15:39:00 +00:00
Jon Chesterfield a74826d30a [openmp][amdgpu] Replace unsigned long with uint64_t
Some types need to be 64 bit. Unsigned long is a hazard there.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D116963
2022-01-10 22:19:30 +00:00
Shilei Tian 943d1d83dd [OpenMP][CUDA] Add resource pool for CUevent
Following D111954, this patch adds the resource pool for CUevent.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D116315
2021-12-28 17:42:38 -05:00
Shilei Tian 357c8031ff [OpenMP][Plugin] Minor adjustments to ResourcePool
This patch makes some minor adjustments to `ResourcePool`:
- Don't initialize the resources if `Size` is 0 which can avoid assertion.
- Add a new interface function `clear` to release all hold resources.
- If initial size is 0, resize to 1 when the first request is encountered.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D116340
2021-12-28 16:11:03 -05:00
Shilei Tian a697a0a4b6 [OpenMP][Plugin] Introduce generic resource pool
Currently CUDA streams are managed by `StreamManagerTy`. It works very well. Now
we have the need that some resources, such as CUDA stream and event, will be
hold by `libomptarget`. It is always good to buffer those resources. What's more
important, given the way that `libomptarget` and plugins are connected, we cannot
make sure whether plugins are still alive when `libomptarget` is destroyed. That
leads to an issue that those resouces hold by `libomptarget` might not be
released correctly. As a result, we need an unified management of all the resources
that can be shared between `libomptarget` and plugins.

`ResourcePoolTy` is designed to manage the type of resource for one device.
It has to work with an allocator which is supposed to provide `create` and
`destroy`. In this way, when the plugin is destroyed, we can make sure that
all resources allocated from native runtime library will be released correctly,
no matter whether `libomptarget` starts its destroy.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D111954
2021-12-27 11:32:14 -05:00
Jon Chesterfield 38af5b4fd1 [libomptarget][nfc] Refactor dlwrap.h for easier reuse in D115966 and upcoming patches 2021-12-17 22:28:31 +00:00
Jon Chesterfield 91dfb32f2f [openmp][amdgpu][nfc] Mark all external functions extern C to get type checking 2021-12-17 18:46:43 +00:00
Carlo Bertolli d3abb04e14 [OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter
I missed the async info parameter in the first version of this API.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115887
2021-12-17 15:58:18 +00:00
Carlo Bertolli d83dc4c648 [OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple hsa queue's per device in plugin
This patch extends the AMDGPU plugin for OpenMP target offloading from using a single HSA queue to multiple queues (four in this patch) per device. This enables concurrent threads to concurrently submit kernel launches to the same GPU.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115771
2021-12-15 15:33:17 +00:00
Carlo Bertolli 28309c5436 [OpenMP] Part 2 of At present, amdgpu plugin merges both asynchronous
and synchronous kernel launch implementations into a single
synchronous version.  This patch prepares the plugin for asynchronous
implementation by:

    Privatizing actual kernel launch code (valid in both cases) into
    an anonymous namespace base function (submitted at D115267)

    - Separating the control flow path of asynchronous and synchronous
      kernel launch functions** (this diff)

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115273
2021-12-10 19:21:05 +00:00
Carlo Bertolli cc8dc5e28b [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version
Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy.
Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115279
2021-12-08 23:02:39 +00:00
Jon Chesterfield 14ff611fe1 Revert "[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version"
This reverts commit 6de698bf10.
It didn't build in the dynamic_hsa configuration
2021-12-08 08:23:12 +00:00
Carlo Bertolli 6de698bf10 [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version
Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy.
Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115279
2021-12-07 23:05:23 +00:00
Carlo Bertolli d9b1d827d2 [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch
At present, amdgpu plugin merges both asynchronous and synchronous kernel launch implementations into a single synchronous version.
This patch prepares the plugin for asynchronous implementation by:
- Privatizing actual kernel launch code (valid in both cases) into an anonymous namespace base function

Actual separation of kernel launch code (async vs sync) is a following patch.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115267
2021-12-07 21:02:45 +00:00
Ye Luo 21a51cebf1 [OpenMP][libomptarget] amdgpu plugin adds runpath for dependencies
amdgpu plugin depends on libhsa-runtime64 library. Add runpath in case it is not on the LD_LIBRARY_PATH.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115198
2021-12-06 18:19:18 -06:00
Jon Chesterfield a05a0c3c2f [libomptarget] Add cmake variables to disable building the amdgpu or cuda plugins
Analogous to the controls on building device runtimes

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115148
2021-12-06 16:42:26 +00:00
Jon Chesterfield 9e08c2054a [openmp] Enable tests on new devicertl on amdgpu
Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D114891
2021-12-06 15:26:18 +00:00
Matt Arsenault 935abeaace OpenMP: Correctly query location for amdgpu-arch
This was trying to figure out the build path for amdgpu-arch, and
making assumptions about where it is which were not working on my
system. Whether a standalone build or not, we should have a proper
imported target to get the location from.
2021-11-29 16:31:32 -05:00
Jon Chesterfield ae5348a38e [openmp][amdgpu] Make plugin robust to presence of explicit implicit arguments
OpenMP (compiler) does not currently request any implicit kernel
arguments. OpenMP (runtime) allocates and initialises a reasonable guess at
the implicit kernel arguments anyway.

This change makes the plugin check the number of explicit arguments, instead
of all arguments, and puts the pointer to hostcall buffer in both the current
location and at the offset expected when implicit arguments are added to the
metadata by D113538.

This is intended to keep things running while fixing the oversight in the
compiler (in D113538). Once that patch lands, and a following one marks
openmp kernels that use printf such that the backend emits an args element
with the right type (instead of hidden_node), the over-allocation can be
removed and the hardcoded 8*e+3 offset replaced with one read from the
.offset of the corresponding metadata element.

Reviewed By: estewart08

Differential Revision: https://reviews.llvm.org/D114274
2021-11-22 23:00:20 +00:00
Jon Chesterfield 04954824ee [openmp][amdgpu][nfc] Simplify implicit args handling
Removes a +x/-x pair on the only store/load of a variable
and deletes some nearby dead code. Also reduces the size of the implicit
struct to reflect the code currently emitted by clang.

Differential Revision: https://reviews.llvm.org/D114270
2021-11-19 20:18:23 +00:00
Jon Chesterfield 9cdaf0b01b [openmp][amdgpu][nfc] Inline interop_hsa_get_kernel_info into only caller 2021-11-19 18:45:17 +00:00
Jon Chesterfield 4f4c826e75 [libomptarget] Drop remote plugin cmake version requirement to match llvm
LLVM docs at https://llvm.org/docs/CMake.html#quick-start state 3.13.4

Reviewed By: atmnpatel

Differential Revision: https://reviews.llvm.org/D113271
2021-11-05 17:34:28 +00:00
Kazu Hirata 3cfc1757c5 Ensure newlines at the end of files (NFC) 2021-10-29 20:26:09 -07:00
Jon Chesterfield 6c7b203d1d Revert "[libomptarget] Build DeviceRTL for amdgpu"
- more tests failing on CI than failed locally when writing this patch

This reverts commit 33427fdb7b.
2021-10-28 01:01:53 +01:00
Jon Chesterfield 33427fdb7b [libomptarget] Build DeviceRTL for amdgpu
Passes same tests as the current deviceRTL. Includes cmake change from D111987.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112227
2021-10-28 00:41:45 +01:00
Kazu Hirata d8e4170b0a Ensure newlines at the end of files (NFC) 2021-10-23 08:45:29 -07:00
Jon Chesterfield bf6f955f39 [libomptarget] Run GPU offloading tests on both new and old runtime
Implemented by patching python config instead of modifying all
the tests so that -generic and XFAIL work as usual. Expectation is for
this to be reverted once the old runtime is deleted.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D112225
2021-10-22 23:28:44 +01:00
Joseph Huber b1ce454930 [OpenMP] Remove macro guards for device debugging
The plugin currently uses a macro to check if this is a debug built
before assigning the debug kind variable to the device environment
struct. This is being deprecated because the new device runtime does not
maintain separate debug builds and should always be availible.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112083
2021-10-19 12:21:43 -04:00
Ron Lieberman d022f39d9f [libomptarget][amdgpu][NFC] tweak a comment 2021-10-09 12:51:53 -04:00
Shilei Tian c060c634ef [OpenMP][NVPTX] Fix an error in configuring #teams and #threads
It must be a copy mistake.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D111407
2021-10-08 11:07:43 -04:00
Jon Chesterfield 1bc3a6e41b [libomptarget] Reapply 2bc4d48a78 which was accidentally reverted 2021-10-07 20:17:48 +01:00
Jon Chesterfield 0c554a4769 [libomptarget] Move device environment to shared header, remove divergence
Follow on to D110006, related to D110957

Where implementations have diverged this resolves to match the new DeviceRTL

- replaces definitions of this struct in deviceRTL and plugins with include
- changes the dynamic_shared_size field from D110006 to 32 bits
- handles stdint being unavailable in DeviceRTL
- adds a zero initializer for the field to amdgpu
- moves the extern declaration for deviceRTL to target_interface
  (omptarget.h is more natural, but doesn't work due to include order
  with debug.h)
- Renames the fields everywhere to match the LLVM format used in DeviceRTL
- Makes debug_level uint32_t everywhere (previously sometimes int32_t)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D111069
2021-10-07 12:03:48 +01:00
Michał Górny 0873b9bef4 [openmp] [elf_common] Fix linking against LLVM dylib
The hand-rolled linking logic in elf_common does not account for
the possibility of using LLVM dylib rather than a dozen static
libraries.  Since it does not seem to be easily convertible
to add_llvm_library, just hand-roll support for LLVM_LINK_LLVM_DYLIB.
This is necessary to support stand-alone builds against installed LLVM.

Differential Revision: https://reviews.llvm.org/D111038
2021-10-04 09:29:06 +02:00
Jon Chesterfield 05ba9ff6a6 [libomptarget][amdgpu] Refactor memory pool collection 2021-10-01 14:58:01 +01:00
Jon Chesterfield b75a7481ba [libomptarget] Apply D110029 to amdgpu
Use enum for execution mode.

This is partly a port from ROCm and partly a port from D110029. Attempted to
make the same choices as ROCm as far as comments etc go to reduce the merge
conflicts.

There is some cleanup warranted here - in particular I like the cuda patch
factoring out the comparisons into named variables - but I'd like to leave
that for a follow up patch, keeping this one minimal.

Reviewed By: carlo.bertolli

Differential Revision: https://reviews.llvm.org/D110845
2021-09-30 21:29:37 +01:00
Dhruva Chakrabarti 6226270253 [libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed.
Fixes: SWDEV-275232 (With contributions from Ammar Elwazir, Laurent Morichetti, and Tony Tye)

The current code is racy. After the packet is submitted, the GPU will increment the read index. If this wraps around before the memory is read from it'll refer to a signal from an unrelated packet. Change avoids reading from the packet post-submission.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110679
2021-09-29 09:22:07 -07:00
Jon Chesterfield 2bc4d48a78 [libomptarget][amdgpu] Follow on to D110513, empty kernarg pools are not fatal 2021-09-27 22:44:35 +01:00
Jon Chesterfield 738734f655 [libomptarget][amdgpu] Report zero devices if plugin construction fails, instead of segv 2021-09-27 22:13:12 +01:00
Pushpinder Singh b1695c2eb8 [AMDGPU][OpenMP] Add memory pool size check to isValidMemoryPool
Keeping all the checks in one place for future simplification.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110513
2021-09-27 12:29:00 +00:00
Pushpinder Singh 9d0eb440ff [libomptarget][nfc][amdgpu] Reorder function to clarify review diff 2021-09-27 09:30:55 +00:00
Jon Chesterfield 726a34f063 [libomptarget][amdgpu] Replace dead exit call with returning error 2021-09-27 09:43:37 +01:00
Jon Chesterfield 8cf93a35d4 [libomptarget][amdgpu] Destruct HSA queues
Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak which raises an assert in debug builds of hsa.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D109511
2021-09-26 15:34:21 +01:00
Shilei Tian ca999f7191 [OpenMP][Offloading] Use bitset to indicate execution mode instead of value
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - indicates generic mode
- 2 - SPMD mode execution with generic mode semantics

We are going to add support for SIMD execution mode. It will be come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible.

This patch changes to bitset based solution to encode execution mode. Each
position is:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)

In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future after we add the support for
SIMD mode, `0b1xx` will be in SIMD mode.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110029
2021-09-22 11:40:52 -04:00
Shilei Tian 49e976c934 [OpenMP][NVPTX] Fix a warning that data argument not used by format string
Reviewed By: jhuber6, grokos

Differential Revision: https://reviews.llvm.org/D110104
2021-09-20 17:22:14 -04:00
Joseph Huber f1c821fa85 [OpenMP] Add support for dynamic shared memory in new RTL
This patch adds support for using dynamic shared memory in the new
device runtime. The new function `__kmpc_get_dynamic_shared` will return a
pointer to the buffer of dynamic shared memory. Currently the amount of memory
allocated is set by an environment variable.

In the future this amount will be added to the amount used for the smart stack
which will be configured in a similar way.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110006
2021-09-17 21:25:36 -04:00
Joseph Huber ec02c34b6d [OpenMP] Add additional fields to device environment
This patch adds fields for the device number and number of devices into
the device environment struct and debugging values.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110004
2021-09-17 21:25:32 -04:00
Hansang Bae ae2a5facce [OpenMP][libomptarget] Minor fix in x86_64 plugin
Call to remove() was passing invalid address for the file name.

Differential Revision: https://reviews.llvm.org/D109846
2021-09-15 15:57:06 -05:00
Jon Chesterfield 6760234e8d [libomptarget][amdgpu] Precisely manage hsa lifetime
The hsa library must be initialized before any calls into it and
destructed after the last call into it. There have been a number of bugs in
this area related to member variables which would like to use raii to manage
resources acquired from hsa.

This patch moves the init/shutdown of hsa into a class, such that when used as
the first member variable (could be a base), the lifetime of other member
variables are reliably scoped within it. This will allow other classes to use
raii reliably when used as member variables within the global.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D109512
2021-09-09 17:28:11 +01:00
Jon Chesterfield d642156f8f [libomptarget][nfc] Hoist hsa_init into rtl.cpp 2021-09-09 16:09:34 +01:00