Commit Graph

811 Commits

Author SHA1 Message Date
Shilei Tian 458db51c10 [OpenMP] Add missing `tt_hidden_helper_task_encountered` along with `tt_found_proxy_tasks`
In most cases, hidden helper task behave similar as detached tasks. That means,
for example, if we have to wait for detached tasks, we have to do the same thing
for hidden helper tasks as well. This patch adds the missing condition for hidden
helper task accordingly along with detached task.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D107316
2021-12-29 23:22:53 -05:00
Johannes Doerfert 73104ad65b [OpenMP][NFC] Move headers into include folder 2021-12-28 23:53:28 -06:00
Shilei Tian 943d1d83dd [OpenMP][CUDA] Add resource pool for CUevent
Following D111954, this patch adds the resource pool for CUevent.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D116315
2021-12-28 17:42:38 -05:00
Shilei Tian 357c8031ff [OpenMP][Plugin] Minor adjustments to ResourcePool
This patch makes some minor adjustments to `ResourcePool`:
- Don't initialize the resources if `Size` is 0 which can avoid assertion.
- Add a new interface function `clear` to release all hold resources.
- If initial size is 0, resize to 1 when the first request is encountered.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D116340
2021-12-28 16:11:03 -05:00
Joseph Huber 7cdaa5a94e [OpenMP][FIX] Change globalization alignment to 16
This patch changes the default aligntment from 8 to 16, and encodes this
information in the `__kmpc_alloc_shared` runtime call to communicate it
to the HeapToStack pass. The previous alignment of 8 was not sufficient
for the maximum size of primitive types on 64-bit systems, and needs to
be increaesd. This reduces the amount of space availible in the data
sharing stack, so this implementation will need to be improved later to
include the alignment requirements in the allocation call, and use it
properly in the data sharing stack in the runtime.

Depends on D115888

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115971
2021-12-27 16:58:25 -05:00
Shilei Tian a697a0a4b6 [OpenMP][Plugin] Introduce generic resource pool
Currently CUDA streams are managed by `StreamManagerTy`. It works very well. Now
we have the need that some resources, such as CUDA stream and event, will be
hold by `libomptarget`. It is always good to buffer those resources. What's more
important, given the way that `libomptarget` and plugins are connected, we cannot
make sure whether plugins are still alive when `libomptarget` is destroyed. That
leads to an issue that those resouces hold by `libomptarget` might not be
released correctly. As a result, we need an unified management of all the resources
that can be shared between `libomptarget` and plugins.

`ResourcePoolTy` is designed to manage the type of resource for one device.
It has to work with an allocator which is supposed to provide `create` and
`destroy`. In this way, when the plugin is destroyed, we can make sure that
all resources allocated from native runtime library will be released correctly,
no matter whether `libomptarget` starts its destroy.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D111954
2021-12-27 11:32:14 -05:00
Jon Chesterfield 38af5b4fd1 [libomptarget][nfc] Refactor dlwrap.h for easier reuse in D115966 and upcoming patches 2021-12-17 22:28:31 +00:00
Jon Chesterfield 91dfb32f2f [openmp][amdgpu][nfc] Mark all external functions extern C to get type checking 2021-12-17 18:46:43 +00:00
Carlo Bertolli d3abb04e14 [OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter
I missed the async info parameter in the first version of this API.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115887
2021-12-17 15:58:18 +00:00
Carlo Bertolli d83dc4c648 [OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple hsa queue's per device in plugin
This patch extends the AMDGPU plugin for OpenMP target offloading from using a single HSA queue to multiple queues (four in this patch) per device. This enables concurrent threads to concurrently submit kernel launches to the same GPU.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115771
2021-12-15 15:33:17 +00:00
Joseph Huber 8425bde82d Revert "[OpenMP] Avoid costly shadow map traversals whenever possible"
This reverts commit 7c8f4e7b85.
Fails a few OpenMP tests, causes a few updates to segfault.
2021-12-10 15:57:58 -05:00
Joseph Huber 7c8f4e7b85 [OpenMP] Avoid costly shadow map traversals whenever possible
In the OpenMC app we saw `omp target update` spending an awful lot of
time in the shadow map traversal without ever doing any update there.
There are two cases that allow us to avoid the traversal completely.
The simplest thing is that small updates cannot (reasonably) contain
an attached pointer part. The other case requires to track in the
mapping table if an entry might contain an attached pointer as part.
Given that we have a single location shadow map entries are created,
the latter is actually fairly easy as well.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D113124
2021-12-10 14:33:18 -05:00
Carlo Bertolli 28309c5436 [OpenMP] Part 2 of At present, amdgpu plugin merges both asynchronous
and synchronous kernel launch implementations into a single
synchronous version.  This patch prepares the plugin for asynchronous
implementation by:

    Privatizing actual kernel launch code (valid in both cases) into
    an anonymous namespace base function (submitted at D115267)

    - Separating the control flow path of asynchronous and synchronous
      kernel launch functions** (this diff)

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115273
2021-12-10 19:21:05 +00:00
Joel E. Denny 51168ce8d5 [OpenMP] Add test for custom state machine if have reduction
D113602 broke the custom state machine when a reduction is present, as
revealed by the reproducer this patch adds to the test suite.  In that
case, openmp-opts changes the return value to undef in
`__kmpc_get_warp_size` (which the custom state machine calls as of
D113602).  Later optimizations then optimize away the custom state
machine code as if all threads are outside the thread block, so the
target region does not execute.  D114802 fixed that but didn't add a
reproducer.

This patch also adds a `__OMP_RTL_ATTRS` entry for
`__kmpc_get_warp_size` to OMPKinds.def, which D113602 missed.  This
change does not seem to have any impact on the reduction problem.

Reviewed By: JonChesterfield, jdoerfert

Differential Revision: https://reviews.llvm.org/D113824
2021-12-10 12:53:54 -05:00
Joseph Huber bc9c4d7216 [OpenMP][FIX] Pass the num_threads value directly to parallel_51
The problem with the old scheme is that we would need to keep track of
the "next region" and reset the num_threads value after it. The new RT
doesn't do it and an assertion is triggered. The old RT doesn't do it
either, I haven't tested it but I assume a num_threads clause might
impact multiple parallel regions "accidentally". Further, in SPMD mode
num_threads was simply ignored, for some reason beyond me.

In any case, parallel_51 is designed to take the clause value directly,
so let's do that instead.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D113623
2021-12-09 16:30:29 -05:00
Carlo Bertolli cc8dc5e28b [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version
Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy.
Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115279
2021-12-08 23:02:39 +00:00
Jon Chesterfield 14ff611fe1 Revert "[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version"
This reverts commit 6de698bf10.
It didn't build in the dynamic_hsa configuration
2021-12-08 08:23:12 +00:00
Carlo Bertolli 6de698bf10 [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version
Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy.
Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115279
2021-12-07 23:05:23 +00:00
Carlo Bertolli d9b1d827d2 [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch
At present, amdgpu plugin merges both asynchronous and synchronous kernel launch implementations into a single synchronous version.
This patch prepares the plugin for asynchronous implementation by:
- Privatizing actual kernel launch code (valid in both cases) into an anonymous namespace base function

Actual separation of kernel launch code (async vs sync) is a following patch.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115267
2021-12-07 21:02:45 +00:00
Ye Luo 21a51cebf1 [OpenMP][libomptarget] amdgpu plugin adds runpath for dependencies
amdgpu plugin depends on libhsa-runtime64 library. Add runpath in case it is not on the LD_LIBRARY_PATH.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115198
2021-12-06 18:19:18 -06:00
Jon Chesterfield a05a0c3c2f [libomptarget] Add cmake variables to disable building the amdgpu or cuda plugins
Analogous to the controls on building device runtimes

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115148
2021-12-06 16:42:26 +00:00
Jon Chesterfield a2b3b4dadc [openmp] Run tests on both runtimes, independent of the default
Minor fix to the lit.cfg. Currently, nvptx runs the tests twice on the new runtime.
Soon, amdgpu will run them on the new runtime as well as the old.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115150
2021-12-06 16:41:23 +00:00
Jon Chesterfield 9e08c2054a [openmp] Enable tests on new devicertl on amdgpu
Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D114891
2021-12-06 15:26:18 +00:00
Jon Chesterfield 1a87a18955 [openmp][amdgpu] Disable tests requiring USM on amdgcn
These tests tend to hang or crash on hardware that doesn't
support USM. Disabling them helps diagnose other issues. To safely
enable we require a means of testing whether USM is expected to work.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115144
2021-12-06 13:25:23 +00:00
Matt Arsenault 90f914c870 OpenMP: Un-xfail tests that pass now
729bf9b26b should have fixed these
2021-12-04 11:25:22 -05:00
Ron Lieberman 8f4013ad46 Restric xfail on openmp/libomptarget/test/mapping/reduction_implicit_map.cpp to amdgcn-amd-amdhsa 2021-12-02 20:58:26 +00:00
Ron Lieberman f87c2c637e xfail: libomptarget reduction_implicit_map.cpp after reapply of Start calling setTargetAttributes 2021-12-02 20:38:25 +00:00
Jon Chesterfield fb9fc3c951 [openmp][amdgpu] Disable three tests in preparation for new runtime 2021-12-02 07:57:14 +00:00
Jon Chesterfield 3ab150f6e4 [openmp][devicertl] Add a missing loader_uninitialized attribute 2021-11-29 23:54:37 +00:00
Matt Arsenault 935abeaace OpenMP: Correctly query location for amdgpu-arch
This was trying to figure out the build path for amdgpu-arch, and
making assumptions about where it is which were not working on my
system. Whether a standalone build or not, we should have a proper
imported target to get the location from.
2021-11-29 16:31:32 -05:00
Jon Chesterfield ae5348a38e [openmp][amdgpu] Make plugin robust to presence of explicit implicit arguments
OpenMP (compiler) does not currently request any implicit kernel
arguments. OpenMP (runtime) allocates and initialises a reasonable guess at
the implicit kernel arguments anyway.

This change makes the plugin check the number of explicit arguments, instead
of all arguments, and puts the pointer to hostcall buffer in both the current
location and at the offset expected when implicit arguments are added to the
metadata by D113538.

This is intended to keep things running while fixing the oversight in the
compiler (in D113538). Once that patch lands, and a following one marks
openmp kernels that use printf such that the backend emits an args element
with the right type (instead of hidden_node), the over-allocation can be
removed and the hardcoded 8*e+3 offset replaced with one read from the
.offset of the corresponding metadata element.

Reviewed By: estewart08

Differential Revision: https://reviews.llvm.org/D114274
2021-11-22 23:00:20 +00:00
Joseph Huber fbfe8fcbc3 [Libomptarget] Remove undefined symbol in old runtime
A function with no definition was left in the old runtime, causing
linker errors when trying to compile.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D114264
2021-11-20 08:26:57 -05:00
Jon Chesterfield 04954824ee [openmp][amdgpu][nfc] Simplify implicit args handling
Removes a +x/-x pair on the only store/load of a variable
and deletes some nearby dead code. Also reduces the size of the implicit
struct to reflect the code currently emitted by clang.

Differential Revision: https://reviews.llvm.org/D114270
2021-11-19 20:18:23 +00:00
Jon Chesterfield 9cdaf0b01b [openmp][amdgpu][nfc] Inline interop_hsa_get_kernel_info into only caller 2021-11-19 18:45:17 +00:00
Joseph Huber 374cd0fb61 [OpenMP] Fix initializer not working on AMDGPU
The RAII class used for debugging RTL entry used a shared variable to
keep track of the current depth. This used a global initializer, which
isn't supported on AMDGPU. This patch removes the initializer and
instead sets it to zero when the state is initialized in the runtime.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D113963
2021-11-16 08:17:15 -05:00
Joel E. Denny c9dfe322ee [OpenMP] Fix main thread barrier for Pascal and amdgpu
Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113602
2021-11-12 11:18:45 -05:00
Jon Chesterfield 27177b82d4 [OpenMP] Lower printf to __llvm_omp_vprintf
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.

This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112680
2021-11-10 15:30:56 +00:00
Joachim Protze 52da6f562e Revert "[openmp] Add OMPT initialization in libomptarget"
Reverting initial OMPT for target implementation in favor of a
different implementation.

This reverts commit 3bc8ce5dd7.
2021-11-10 12:44:25 +01:00
Atmn Patel 737c4a2673 [clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113421
2021-11-09 15:11:05 -05:00
Atmn Patel ef717f3852 Revert "[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files"
This reverts commit 81a7cad2ff.
2021-11-09 02:10:42 -05:00
Atmn Patel 81a7cad2ff [clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113421
2021-11-09 01:52:52 -05:00
Vyacheslav Zakharin 1b409df613 [NFC] Initial documentation for declare target indirect support.
Differential Revision: https://reviews.llvm.org/D110193
2021-11-08 15:12:03 -08:00
Jon Chesterfield 0fa45d6d80 Revert "[OpenMP] Lower printf to __llvm_omp_vprintf"
This reverts commit db81d8f6c4.
2021-11-08 20:28:57 +00:00
Jon Chesterfield dc9edc6a6d Revert "[openmp] Fix build, test passes on CI unexpectedly"
This reverts commit c499d690cd.
2021-11-08 20:28:52 +00:00
Jon Chesterfield c499d690cd [openmp] Fix build, test passes on CI unexpectedly 2021-11-08 18:45:27 +00:00
Jon Chesterfield db81d8f6c4 [OpenMP] Lower printf to __llvm_omp_vprintf
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.

This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.

The exact set of changes to check-openmp probably needs revision before commit

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112680
2021-11-08 18:38:00 +00:00
Jon Chesterfield 4f4c826e75 [libomptarget] Drop remote plugin cmake version requirement to match llvm
LLVM docs at https://llvm.org/docs/CMake.html#quick-start state 3.13.4

Reviewed By: atmnpatel

Differential Revision: https://reviews.llvm.org/D113271
2021-11-05 17:34:28 +00:00
Johannes Doerfert d4b1cf8f9c [OpenMP] Build device runtimes for sm_86
Reviewed By: carlo.bertolli

Differential Revision: https://reviews.llvm.org/D113111
2021-11-04 17:54:59 -05:00
Johannes Doerfert ab9f3f5d25 [OpenMP] Introduce the keepAlive function into the old device RT
Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D113110
2021-11-04 17:54:56 -05:00
Johannes Doerfert 93bebdc78f [OpenMP][NFCI] Cleanup new device RT mapping interface
Minimize the `impl` interface and clean up some uses of mapping
functions.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D112154
2021-11-04 17:54:53 -05:00