Commit Graph

559 Commits

Author SHA1 Message Date
Shilei Tian 0029059074 [NFC][OpenMP][Offloading] Unified the construction of mapping table entry
This patch unifies construction of mapping table entry to use `emplace`.
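
As an illustration of the `emplace` idiom this message refers to, here is a minimal sketch. `MapEntry`, `MapTable` and `addMapping` are hypothetical names invented for this example and are not the types used in libomptarget; the point is only that `emplace` forwards constructor arguments and builds the entry inside the container.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical, simplified stand-in for a host-to-target mapping entry.
struct MapEntry {
  std::uintptr_t HostBegin;
  std::uintptr_t HostEnd;
  std::uintptr_t TargetBegin;

  MapEntry(std::uintptr_t HB, std::uintptr_t HE, std::uintptr_t TB)
      : HostBegin(HB), HostEnd(HE), TargetBegin(TB) {}

  // Entries are ordered by the start of the host range.
  bool operator<(const MapEntry &Other) const {
    return HostBegin < Other.HostBegin;
  }
};

using MapTable = std::set<MapEntry>;

// Builds the entry in place from its constructor arguments. Returns true if
// a new entry was added, false if one with the same HostBegin already existed.
inline bool addMapping(MapTable &Table, std::uintptr_t HB, std::uintptr_t HE,
                       std::uintptr_t TB) {
  assert(HB <= HE && "host range must not be inverted");
  return Table.emplace(HB, HE, TB).second;
}
```

Using one construction path everywhere means a change to the entry's constructor only has to be reflected in one kind of call site.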

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D104580
2021-06-22 12:38:47 -04:00
Joseph Huber 244e98ff48 [Libomptarget] Improve device runtime implementation for globalized variables.
Currently the runtime implementation of `__kmpc_alloc_shared` is extremely slow because it allocates memory for each thread individually. This patch adds a small buffer for the threads to share data and will greatly improve performance for builds where all globalization could not be optimized out. If the shared buffer is full, then memory will be allocated per-warp rather than per-thread.
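
The general idea of a small fixed buffer with a fallback allocator can be sketched on the host as follows. This is not the device runtime code: `SharedStack`, its 256-byte capacity and its 8-byte alignment are assumptions made for the example, and the sketch does not model warps or threads at all. It only shows requests being served from a stack-like buffer until it is full, after which they go to `malloc`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>

// Hypothetical host-side model of a small buffer used as a stack.
class SharedStack {
  static constexpr std::size_t Capacity = 256; // assumed buffer size in bytes
  static constexpr std::size_t Alignment = 8;  // assumed allocation alignment
  alignas(8) unsigned char Buffer[Capacity];
  std::size_t Top = 0;

  // Rounds a request up to the alignment; zero-byte requests take one slot.
  static std::size_t roundUp(std::size_t Bytes) {
    if (Bytes == 0)
      Bytes = 1;
    return (Bytes + Alignment - 1) / Alignment * Alignment;
  }

public:
  // True if Ptr points into the fixed buffer rather than the heap.
  bool owns(const void *Ptr) const {
    const auto *P = static_cast<const unsigned char *>(Ptr);
    std::less<const unsigned char *> Before;
    return !Before(P, Buffer) && Before(P, Buffer + Capacity);
  }

  // Serves the request from the buffer when it fits, otherwise from malloc.
  void *alloc(std::size_t Bytes) {
    const std::size_t Size = roundUp(Bytes);
    if (Size <= Capacity - Top) {
      void *Ptr = Buffer + Top;
      Top += Size;
      return Ptr;
    }
    return std::malloc(Size);
  }

  // Buffer allocations must be released in reverse order of allocation.
  void release(void *Ptr, std::size_t Bytes) {
    if (!owns(Ptr)) {
      std::free(Ptr);
      return;
    }
    const std::size_t Size = roundUp(Bytes);
    assert(Top >= Size && Buffer + (Top - Size) == Ptr &&
           "buffer allocations must be released in reverse order");
    Top -= Size;
  }
};
```

The fast path is a bounds check and an addition, which is why a buffer like this is much cheaper than a general-purpose allocation per request.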

Depends on D97680

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104666
2021-06-22 11:52:49 -04:00
Joseph Huber 952a0f2385 [Libomptarget] Introduce new globalization runtime calls
Summary:
This patch introduces the new globalization runtime to be used by D97680. These
runtime calls will replace the __kmpc_data_sharing_push_stack and
__kmpc_data_sharing_pop_stack functions.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102532
2021-06-22 10:05:42 -04:00
Pushpinder Singh 9d110f9159 [AMDGPU][Libomptarget] Move allow_access_to_all_gpu_agents to rtl.cpp
Moving this method helps eliminate a use of g_atl_machine.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104691
2021-06-22 11:44:52 +00:00
Vyacheslav Zakharin aad9e48c5f [NFC][libomptarget] Remove redundant libelf dependency for elf_common.
Differential Revision: https://reviews.llvm.org/D104549
2021-06-21 07:19:55 -07:00
Pushpinder Singh 7a97cd9da7 [AMDGPU][Libomptarget] Remove redundant functions
There does not seem to be any use of these functions. They just
put the value to a local which is never used again.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104512
2021-06-21 06:13:24 +00:00
Shilei Tian ec97866454 [OpenMP] Make bug49334.cpp more reproducible
`bug49334.cpp` cannot detect the data race in `libomptarget` efficiently. It
is reported that with `N = 256` and `BS = 16`, the data race can be reproduced
more steadily. Upcoming patches will fix it, so this patch is expected to
fail for now.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104552
2021-06-18 18:35:41 -04:00
Vyacheslav Zakharin 836992ab9a [NFC][libomptarget] Build elf_common with PIC.
Differential Revision: https://reviews.llvm.org/D104545
2021-06-18 09:20:10 -07:00
Vyacheslav Zakharin c5b7c7c8f7 [NFC][libomptarget] Fixed -DLLVM_ENABLE_RUNTIMES="openmp" build.
Differential Revision: https://reviews.llvm.org/D104535
2021-06-18 09:20:10 -07:00
Vyacheslav Zakharin b5c4fc0f23 [NFC][libomptarget] Reduce the dependency on libelf
This change-set removes libelf usage from elf_common part of the plugins.
libelf is still used in x86_64 generic plugin code and in some plugins
(e.g. amdgpu) - these will have to be cleaned up in separate checkins.

Differential Revision: https://reviews.llvm.org/D103545
2021-06-16 08:34:23 -07:00
Pushpinder Singh cadcaf3f46 [AMDGPU][Libomptarget] Drop dead code related to g_atl_machine
This patch includes some changes that delete the code accessing the
g_atl_machine global. Some accesses related to memory_pools still
remain.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103813
2021-06-15 05:21:35 +00:00
Ron Lieberman 91f147792e [libomptarget][amdgpu] Remove stray fprintf in rtl.cpp
remove unintended fprintf in rtl.cpp

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D104003
2021-06-10 01:57:30 +00:00
Brendon Cahoon 294efbbd3e Reland "[AMDGPU] Add gfx1013 target"
This reverts commit 211e584fa2.

Fixed a use-after-free error that caused the sanitizers to fail.
2021-06-08 21:15:35 -04:00
Joseph Huber df965513a9 [OpenMP] Add an information flag for device data transfers
This patch adds an information flag that indicates when data is being copied to
and from the device. This will be helpful for finding redundant or unnecessary
data transfers in applications.

Reviewed By: jdoerfert, grokos

Differential Revision: https://reviews.llvm.org/D103927
2021-06-08 20:23:27 -04:00
Brendon Cahoon 211e584fa2 Revert "[AMDGPU] Add gfx1013 target"
This reverts commit ea10a86984.

A sanitizer buildbot reports an error.
2021-06-08 16:29:41 -04:00
Brendon Cahoon ea10a86984 [AMDGPU] Add gfx1013 target
Differential Revision: https://reviews.llvm.org/D103663
2021-06-08 12:49:49 -04:00
Pushpinder Singh 4f8bc7caf4 [AMDGPU][Libomptarget] Remove atlc global
This global struct used to hold various flags for monitoring the
initialization of hsa.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103795
2021-06-07 11:09:01 +00:00
Pushpinder Singh f5f329a371 [AMDGPU][Libomptarget] Rework logic for locating kernarg pools
Previous logic was to always use the first kernarg pool found to allocate
kernel args. This patch changes this to use only the kernarg pool which
has non-zero size. This logic is also reworked to not use any globals.
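
The selection rule described above can be sketched as a pure function over a list of pool descriptions. `PoolInfo` and `findKernargPool` are hypothetical names for this example; the real plugin queries pool properties through the HSA runtime, which is not modeled here.

```cpp
#include <cstddef>

// Hypothetical summary of the pool properties relevant to the choice.
struct PoolInfo {
  bool IsKernarg;   // pool is flagged as usable for kernel arguments
  std::size_t Size; // pool size in bytes
};

// Returns the index of the first kernarg pool with non-zero size, or -1 if
// there is none. No global state is read or written.
constexpr int findKernargPool(const PoolInfo *Pools, int Count) {
  for (int I = 0; I < Count; ++I)
    if (Pools[I].IsKernarg && Pools[I].Size > 0)
      return I;
  return -1;
}
```

With the earlier rule (always take the first kernarg pool), a zero-sized kernarg pool listed first would have been selected even though nothing can be allocated from it.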

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103600
2021-06-07 06:41:37 +00:00
Pushpinder Singh b25546a4b4 [AMDGPU][Libomptarget][NFC] Remove bunch of dead structs
Dropped structs are atmi_machine_t, atmi_device_t and atmi_memory_t

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103509
2021-06-02 10:40:51 +00:00
Pushpinder Singh 2368170a8d [AMDGPU][Libomptarget][NFC] Remove atmi_place_t
atmi_place_t has been replaced with int DeviceId.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103508
2021-06-02 10:35:28 +00:00
Pushpinder Singh fb113264a8 [AMDGPU][Libomptarget] Remove g_atmi_machine global
It turns out the only purpose of this class was to verify whether the device
ID was in range, which can easily be done using g_atl_machine.

Getting rid of g_atl_machine is still pending and will be done in
a later patch.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103443
2021-06-01 12:34:24 +00:00
Pushpinder Singh 4fc3286951 [AMDGPU][Libomptarget][NFC] Split host and device malloc
This patch splits the code path for host and device malloc.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103389
2021-05-31 12:09:18 +00:00
Pushpinder Singh 8b79dfb302 [AMDGPU][Libomptarget][NFC] Remove atmi_mem_place_t
This struct was used to specify the device on which memory was
being allocated/freed in atmi_malloc/free. It has now been replaced
with int DeviceId.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103239
2021-05-27 11:53:18 +00:00
Jon Chesterfield 2fdf8bbd19 [libomptarget][nfc][amdgpu] Factor out setting upper bounds
Refactor suggested in D103037 to help avoid similar copy-paste errors.
Change is mechanical. Some parts of this would be more robust with unsigned.

Reviewed By: dhruvachak

Differential Revision: https://reviews.llvm.org/D103090
2021-05-26 19:57:49 +01:00
Jon Chesterfield c5c1ec7945 [libomptarget][nfc][amdgpu] Refactor uses of KernelInfoTable
Suggested in D103059. Use a single lookup instead of two, more const, less mutation.

Reviewed By: dhruvachak

Differential Revision: https://reviews.llvm.org/D103093
2021-05-26 19:25:25 +01:00
Jon Chesterfield 07f59baad6 [libomptarget][nfc][amdgpu] Remove atmi_status_t type
ATMI_STATUS_UNKNOWN was unused, deleted references to it.
Replaced ATMI_STATUS_{SUCCESS,ERROR} with HSA_STATUS_{SUCCESS,ERROR}
Replaced atmi_status_t with hsa_status_t

Otherwise no change. In particular, conversions between atmi_status_t and
hsa_status_t will now be conversions between hsa_status_t and itself.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D103115
2021-05-26 17:02:19 +01:00
Pushpinder Singh a2d6ef5876 [AMDGPU][Libomptarget] Inline atmi_init/atmi_finalize
After D102847, these functions can be inlined.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103075
2021-05-26 10:50:08 +00:00
Pushpinder Singh cc8661ac4a [AMDGPU][Libomptarget] Delete g_atmi_initialized
This patch drops g_atmi_initialized and inlines the Initialize &
Finalize methods from Runtime class.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102847
2021-05-26 10:46:54 +00:00
Pushpinder Singh 7648b6978e [AMDGPU][Libomptarget] Move Kernel/Symbol info tables to RTLDeviceInfoTy
Two globals KernelInfoTable & SymbolInfoTable are moved
into RTLDeviceInfoTy class.
This builds on top of D102691.
[2/2]

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102692
2021-05-26 10:02:28 +00:00
Jon Chesterfield df005fa364 [libomptarget][nfc] Move hostcall required test to rtl
Remove a global, fix minor race. First of N patches to bring up hostcall.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103058
2021-05-25 22:43:17 +01:00
Pushpinder Singh b0d68c7141 [AMDGPU][Libomptarget] Mark lambda_by_value test as XFAIL
Reason: Missing printf definition

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103078
2021-05-25 12:16:54 +00:00
Jon Chesterfield 75492e20fb [libomptarget][nfc] Accept callable for hsa iterate_symbols
Candidate refactor to simplify D102692

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D103030
2021-05-25 09:29:11 +01:00
Dhruva Chakrabarti 96d70f4d28 [libomptarget] [amdgpu] Added LDS usage to the kernel trace
Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103059
2021-05-24 19:33:48 -07:00
Dhruva Chakrabarti ca17b26d4d [libomptarget] [amdgpu] Fix copy-paste error setting NumThreads for a corner case.
Fix the case where NumTeams was set incorrectly instead of NumThreads

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103037
2021-05-24 15:23:15 -07:00
Pushpinder Singh 486110eb41 [AMDGPU][Libomptarget] Remove global KernelNameMap
KernelNameMap contains entries like "key.kd" => key which clearly
could be replaced by simple logic of removing suffix from the key.
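
The suffix-stripping logic mentioned above might look like the following sketch. The function name `kernelNameFromDescriptor` is hypothetical; only the ".kd" suffix comes from the commit message.

```cpp
#include <cassert>
#include <string>

// Maps a descriptor symbol such as "key.kd" to the kernel name "key".
// Names without the suffix are returned unchanged.
inline std::string kernelNameFromDescriptor(const std::string &Symbol) {
  static const std::string Suffix = ".kd";
  assert(!Symbol.empty() && "symbol name must not be empty");
  if (Symbol.size() >= Suffix.size() &&
      Symbol.compare(Symbol.size() - Suffix.size(), Suffix.size(), Suffix) == 0)
    return Symbol.substr(0, Symbol.size() - Suffix.size());
  return Symbol;
}
```

Because the mapping is computed from the key itself, no global table needs to be populated or kept in sync.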

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102691
2021-05-24 08:46:08 +00:00
George Rokos d0bc04d6b9 [libomptarget] Fix a bug whereby firstprivates are not copied over to the device
The check for the TO flag when processing firstprivates is missing. As a result,
sometimes the device copy of a firstprivate never gets initialized. Currently we
try to force lambda structs to be allocated immediately by marking them as a
non-firstprivate, so that PrivateArgumentManagerTy::addArg allocates memory for
them immediately. However, calling addArg with IsFirstPrivate=false makes the
function skip initializing the device copy. Whether an argument is firstprivate
and whether we need to allocate memory immediately are not synonyms, so this
patch introduces one more control variable for immediate allocation and sets it
apart from initialization.

Differential Revision: https://reviews.llvm.org/D102890
2021-05-21 10:52:08 -07:00
Jon Chesterfield d54712ab4d [libomptarget][amdgpu] Mark alloc, free weak to facilitate local experimentation
There are a lot of different ways we might implement the devicertl local alloc
and free functions. Via host, local buffers (stack or arena), specialising per
kernel etc. It is not yet clear what the right design is. This change makes the
alloc and free functions weak, so one can override them from local tests while
comparing options.

Not strictly necessary, as a comparable patch can be applied locally each time,
but would be convenient for out of tree dev. Plan would be to drop the weak
attribute at the same time as introducing a working allocator to trunk.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102499
2021-05-21 16:09:22 +01:00
Jon Chesterfield 68b88ae670 [libomptarget] Improve dlwrap compile time error diagnostic
The dlwrap interface takes an explicit arity, e.g. DLWRAP(cuAlloc, 2);
This probably can't be eliminated as it controls the argument list of an
external symbol, not an inline header function. If the arity given is too
big, the error from clang referring to the line is in the middle of
implementation details.

/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1277:7: error: static_assert failed
      due to requirement '0UL < tuple_size<std::tuple<>>::value' "tuple index is in range"
      static_assert(__i < tuple_size<tuple<>>::value,
      ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1260:7: ...
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1260:7: ...
/home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:93:27 ...

/home/amd/llvm-project/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp:34:1: note: in
      instantiation of template class 'dlwrap::trait<cudaError_enum (*)(unsigned long *, unsigned
      long)>::arg<2>' requested here
DLWRAP(cuMemAlloc, 3);
^
/home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:51:31: ...
/home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:166:3: ...
/home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:133:3: ...
/home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:186:37: ...

If the arity is too small, the diagnostic is better:

cuda/dynamic_cuda/cuda.cpp:34:1: error: too few
      arguments to function call, expected 2, have 1
DLWRAP(cuMemAlloc, 1);

This patch changes the diagnostic to:

cuda/dynamic_cuda/cuda.cpp:34:1: error:
      static_assert failed due to requirement '1 == trait<cudaError_enum (*)(unsigned long *, unsigned
      long)>::nargs' "Arity Error"
DLWRAP(cuMemAlloc, 1);

or

cuda/dynamic_cuda/cuda.cpp:34:1: error:
      static_assert failed due to requirement '3 == trait<cudaError_enum (*)(unsigned long *, unsigned
      long)>::nargs' "Arity Error"
DLWRAP(cuMemAlloc, 3);
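
A check of this kind can be sketched with a small trait that counts the parameters of a function pointer type. `FnTrait`, `CHECK_ARITY` and `exampleAlloc` are hypothetical names for this example and are not the contents of dlwrap.h; only the "Arity Error" message text is taken from the diagnostics above.

```cpp
#include <cstddef>

// Hypothetical trait exposing the parameter count of a function pointer type.
template <typename F> struct FnTrait;
template <typename R, typename... Ts> struct FnTrait<R (*)(Ts...)> {
  static constexpr std::size_t nargs = sizeof...(Ts);
};

// Fails at the macro's use site when the stated arity does not match the
// declared signature, whether it is too small or too large.
#define CHECK_ARITY(SYM, ARITY)                                                \
  static_assert(ARITY == FnTrait<decltype(&SYM)>::nargs, "Arity Error")

// Hypothetical external symbol taking two arguments; declared only.
int exampleAlloc(unsigned long *Ptr, unsigned long Size);

CHECK_ARITY(exampleAlloc, 2); // compiles; 1 or 3 would report "Arity Error"
```

Placing the `static_assert` directly in the macro expansion is what keeps the error location on the line the user wrote instead of deep inside `<tuple>`.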

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102858
2021-05-20 20:33:36 +01:00
Jon Chesterfield d18fb09c69 [libomptarget][amdgpu] Remove majority of fatal errors
Replaces most calls to exit() with returning an error to the library entry
point. Minor changes to error handling for clear bugs, remove some dead code.

Each exit() call site replaced is either in a library entry point or a
function that already returns error codes on some paths. The existing handling
is not well tested but replacing exit() with a fallback path should be a strict
improvement.

Remaining two early exit points are an abort() from a callback and exit() from
within msgpack. Fixes for those are less obvious and left for a later patch.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D102346
2021-05-20 16:26:43 +01:00
Jon Chesterfield ea68ad6e26 [libomptarget] Disable test bug49334 on amdgpu
Hangs on amdgpu, do not know why. Disable to unblock build.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D102017
2021-05-20 15:46:56 +01:00
Pushpinder Singh d7503c3bce [AMDGPU][Libomptarget] Rename & move g_executables to private
This patch moves g_executables to private member of Runtime class
and is renamed to HSAExecutables following LLVM naming convention.

This movement required making Runtime::Initialize and Runtime::Finalize
non-static. Verified the correctness of this change by running
libomptarget tests on gfx906.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102600
2021-05-18 05:43:23 +00:00
Pushpinder Singh 3bc2b97b34 [AMDGPU][libomptarget] Remove unused global variables
This initial patch removes some unused variables from the global namespace.
There will be more incoming patches for moving global variables to classes
or static members.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102598
2021-05-18 05:40:49 +00:00
Aakanksha Patil 464e4dc50f [AMDGPU] Add gfx1034 target
Differential Revision: https://reviews.llvm.org/D102306
2021-05-13 14:25:18 -04:00
Jon Chesterfield 10de217209 [libomptarget][amdgpu] Fix truncation error for partial wavefront
The partial barrier implementation involves one wavefront resetting and N-1
waiting. This change future proofs against launching with a number of threads
that is not a multiple of the wavefront size.
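
One way such a truncation can arise is plain integer division when computing how many wavefronts take part. The sketch below assumes that reading; `wavefrontCount` is a hypothetical helper and the wavefront size of 64 in the comments is an assumption, not a value taken from the patch.

```cpp
// Number of wavefronts needed to cover NumThreads, counting a final partial
// wavefront. Plain NumThreads / WaveSize would drop that partial wavefront.
constexpr unsigned wavefrontCount(unsigned NumThreads, unsigned WaveSize) {
  return (NumThreads + WaveSize - 1) / WaveSize;
}

// With an assumed wavefront size of 64:
//   96 / 64                -> 1 (truncated, misses the partial wavefront)
//   wavefrontCount(96, 64) -> 2
```

For thread counts that are exact multiples of the wavefront size both expressions agree, which is why the problem only shows up for partial wavefronts.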

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102407
2021-05-13 17:31:57 +01:00
Jon Chesterfield b049870d3b [libomptarget][amdgpu] Convert an assert to print and offload_fail
The kernel launched is supposed to be present in the binary, but a not yet
diagnosed bug means it is missing for some of the qmcpack test cases. Changing
from assert to print and offload_fail should help diagnose that and similar bugs.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102378
2021-05-13 17:31:36 +01:00
Michael Kruse 34ed3e6337 [OpenMP] Test unified shared memory tests only on systems that support it.
Add a `REQUIRES: unified_shared_memory` option to tests that use `#pragma omp requires unified_shared_memory`.

For CUDA, the feature tag is derived from LIBOMPTARGET_DEP_CUDA_ARCH which itself is derived using [[ https://cmake.org/cmake/help/latest/module/FindCUDA.html#commands | cuda_select_nvcc_arch_flags ]]. The latter determines which compute capability the GPU in the system supports. To ensure that this is the CUDA arch being used, we could also set the `-Xopenmp-target -march=` flag.
In the absence of an NVIDIA GPU, LIBOMPTARGET_DEP_CUDA_ARCH will be 35. That is, in that case we are assuming unified_shared_memory is not available. CUDA plugin testing could be disabled entirely in this case, but this currently depends on `LIBOMPTARGET_CAN_LINK_LIBCUDA OR LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA`, not on whether the hardware is actually available.

For all other targets, nothing changes and we are assuming unified shared memory is available. This might need refinement if not the case.

This tries to fix the [[ http://meinersbur.de:8011/#/builders/143 | OpenMP Offloading Buildbot ]] that, although brand-new, only has a Pascal-generation (sm_61) GPU installed. Hence, tests that require unified shared memory are currently failing. I wish I had known in advance.

Reviewed By: protze.joachim, tianshilei1992

Differential Revision: https://reviews.llvm.org/D101498
2021-05-13 11:08:04 -05:00
Jon Chesterfield 9934571eab [libomptarget][amdgpu][nfc] Expand errorcheck macros
These macros expand to continue, which is confusing, or exit,
which is incompatible with continuing execution on offloading fail.

Expanding the macros in place makes the code look untidy but the
control flow obvious and amenable to improving. In particular, exit
becomes easier to eliminate.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D102230
2021-05-12 17:30:41 +01:00
Jon Chesterfield 72995a4bdf [libomptarget][nfc] Add hook to easily disable building amdgcn bclib
This is useful when building LLVM with a toolchain that can't emit code
for amdgcn, e.g. because it overrides the include search path with headers
from another architecture, or the clang compiler is missing builtins.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102229
2021-05-11 17:23:09 +01:00
Jon Chesterfield dedca78d48 [libomptarget][nfc] Drop stringify in macro
A step towards deleting the macros entirely.

Differential Revision: https://reviews.llvm.org/D102228
2021-05-11 12:19:55 +01:00
Jon Chesterfield 6da348569c [libomptarget] Add support for target allocators to dynamic cuda RTL
Follow on to D102000 which introduced new calls into libcuda. This patch adds
the corresponding entry points to dynamic_cuda, fixing the build for systems
that do not have the cuda toolkit installed.

Function types and enum from https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D102169
2021-05-10 15:27:50 +01:00