Commit Graph

2232 Commits

Author SHA1 Message Date
serge-sans-paille 40d3a0ba4d [openmp] Fix strict aliasing issue in cmpxchg routine
Avoid warning under -fstrict-aliasing by using a call to memcpy to perform type
punning.

Differential Revision: https://reviews.llvm.org/D125467
2022-05-12 16:14:48 +02:00
AndreyChurbanov 52d0ef3c00 [OpenMP] libomp: Add itt notifications to sync dependent tasks.
Intel Inspector uses itt notifications to analyze code execution, and it
reports race conditions in dependent tasks.
This patch fixes the issue notifying Inspector on tasks dependency
synchronizations.

Differential Revision: https://reviews.llvm.org/D123042
2022-05-05 11:30:59 -05:00
AndreyChurbanov 4a64bed216 [OpenMP] libomp: cleanup - remove duplicate check
The identical check remains 20 lines above in the code.

Differential Revision: https://reviews.llvm.org/D123046
2022-05-05 11:01:20 -05:00
AndreyChurbanov eed0d85152 [OpenMP] libomp: cleanup dead code
Differential Revision: https://reviews.llvm.org/D123047
2022-05-05 10:56:49 -05:00
Hansang Bae 7e23b46ab8 [OpenMP] Possible fix for sporadic test failure from loop_dispatch.c
This patch tries to fix sporadic test failure after the change
https://reviews.llvm.org/D122107.
Made the test wait until every thread has at least one loop iteration.

Differential Revision: https://reviews.llvm.org/D124812
2022-05-03 14:46:32 -05:00
Joseph Huber 5ad07ac400 [Libomptarget] Use entry name for global info
Currently, globals on the device will have an infinite reference count
and an unknown name when using `LIBOMPTARGET_INFO` to print the mapping
table. We already store the name of the global in the offloading entry
so we should be able to use it, although there will be no source
location. To do this we need to create a valid `ident_t` string from a
name only.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D124381
2022-04-25 09:56:43 -04:00
Ye Luo 8a880db519 [libomptarget] Make omp_target_is_present checks storage instead of zero length array.
Consider checking whether a pointer has been mapped can be achieved via omp_get_mapped_ptr.
omp_target_is_present is more needed to check whether the storage being pointed is mapped.
This restore the old behavior of omp_target_is_present before D123093
Fixes https://github.com/llvm/llvm-project/issues/54899

Reviewed By: jdenny

Differential Revision: https://reviews.llvm.org/D123891
2022-04-22 17:37:06 -05:00
Ye Luo 91ccd8248c [Clang][OpenMP] libompd: get libomp hwloc includedir by target_link_libraries
When hwloc is used and is installed outside of the default paths, the omp CMake target
needs to provide the needed include path thru the CMake target by adding it with
target_include_directories to it, so libompd gets it as well when it defines it's cmake
target using target_link_libraries.

As suggested in D122667

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D123888
2022-04-22 17:33:41 -05:00
Joseph Huber f557bb8733 [OpenMP][Docs] Remove usage of deprecated flag in documentation
Summary:
This documentation used the `-fopenmp-target-new-runtime` flag which is
deprecated and has no effect. Remove it.
2022-04-21 18:50:25 -04:00
Atmn Patel c44420e90d [Libomptarget][remote] Add OpenMP linker flag to the plugin
The remote offloading server and plugin rely on OpenMP, so this needs to be added as a linker flag. Without this, applications segfault.

Differential Revision: https://reviews.llvm.org/D124200
2022-04-21 15:45:29 -04:00
Atmn Patel 489894f363 [Libomptarget][remote] Fix compile-time error
This fixes a compile-time error recently introduced within the remote
offloading plugin. This patch also removes some extra linker flags that are unnecessary, and adds an explicit abseil linker flag without which we occasionally get problems.

Differential Revision: https://reviews.llvm.org/D119984
2022-04-19 16:46:01 -04:00
Joseph Huber 80787213ea [Libomptarget] Fix test using old unsupported lit string
Summary:
One test had an old "unsupported" string that used the old `newDriver`
string which was removed. This test should be updated to use the
`oldDriver` one instead.
2022-04-18 23:08:12 -04:00
Joseph Huber ae23be84cb [OpenMP] Make the new offloading driver the default
Previously an opt-in flag `-fopenmp-new-driver` was used to enable the
new offloading driver. After passing tests for a few months it should be
sufficiently mature to flip the switch and make it the default. The new
offloading driver is now enabled if there is OpenMP and OpenMP
offloading present and the new `-fno-openmp-new-driver` is not present.

The new offloading driver has three main benefits over the old method:
- Static library support
- Device-side LTO
- Unified clang driver stages

Depends on D122683

Differential Revision: https://reviews.llvm.org/D122831
2022-04-18 15:05:09 -04:00
Joseph Huber ba01306009 [Libomptarget] Fix LIBOMPTARGET_INFO test
Summary:
A patch added a new line to one of the info outputs without updating
this test. This patch adds the new text to the existing test.
2022-04-18 14:09:02 -04:00
Dhruva Chakrabarti 7086a1db80 [libomptarget] [amdgpu] Hostcall offset check should consider implicit args
Fixed hostcall offset check to compare against kernarg segment size
and implicit arguments. Improved the corresponding debug print.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D123827
2022-04-15 00:53:47 +00:00
Saiyedul Islam 54a6cc3405
[libomptarget][amdgpu] Add hidden_heap_v1 kernarg metadata
Code object version 5 adds support of hidden_heap_v1 kernarg
metadata field [1]. It is a global address space pointer to an
initialized memory buffer that conforms to the requirements of the
malloc/free device library V1 version implementation.

[1] https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v5

Reviewed By: carlo.bertolli

Differential Revision: https://reviews.llvm.org/D123527
2022-04-13 03:57:29 +00:00
Johannes Doerfert a3a42c3ca2 [OpenMP][FIX] Ensure to set the context for wait events if necessary
Differential Revision: https://reviews.llvm.org/D123445
2022-04-12 16:42:50 -05:00
Jonathan Peyton d49ce7c356 [OpenMP][libomp] Replace global variable references with local object
Remove references to global __kmp_topology within a kmp_topology_t
object method. There should just be implicit references to the
private object.
2022-04-12 12:50:41 -05:00
Jonathan Peyton 747a490612 [OpenMP][libomp] Fix some Doxygen issues
Fix spelling of variable names and remove accidental references (#)
in Doxygen comments.
2022-04-12 11:05:30 -05:00
Joseph Huber 2e0cb61570 [OpenMP] Fix linker error when building info tool
Summary:
The changes made in D123177 added new targets to the
`LIBOMPTARGET_TESTED_PLUGINS` variable which was linked against when
building the `llvm-omp-target-info` tool. This caused linker errors on
the export scripts. This patch removes that dependency, it still builds
and runs as expected so I will assume it's correct.
2022-04-08 10:50:31 -04:00
Ye Luo c1a6fe196d [libomptarget] Implement pointer lookup as 5.1 spec.
As described in 5.1 spec
2.21.7.2 Pointer Initialization for Device Data Environments

Reviewed By: RaviNarayanaswamy

Differential Revision: https://reviews.llvm.org/D123093
2022-04-07 23:01:25 -05:00
Joseph Huber a3f423cf57 [OpenMP] Add dynamic memory function to omp.h and add documentation
This patch adds the `llvm_omp_target_dynamic_shared_alloc` function to
the `omp.h` header file so users can access it by default. Also changed
the name to keep it consistent with the other target allocators. Added
some documentation so users know how to use it. Didn't add the interface
for Fortran since there's no way to test it right now.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D123246
2022-04-07 14:23:23 -04:00
Joseph Huber 840c040498 [OpenMP] Change target memory tests to use allocators
The target allocators have been supported for NVPTX offloading for
awhile. The tests should use the allocators instead of calling the
functions manually. Also the comments indicating these being a preview
should be removed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D123242
2022-04-07 14:23:14 -04:00
Michael Kruse 7fa7b0cbd8 [libomptarget] Add device RTL to regression test dependencies.
In a clean build directory, `check-openmp` or `check-libomptarget` will fail because of missing device RTL .bc files. Ensure that the new targets new custom targets `omptarget.devicertl.nvptx` and `omptarget.devicertl.amdgpu` (corresponding to the plugin rtl targets `omptarget.rtl.cuda`, respectively `omptarget.rlt.amdgpu` ) are dependencies of the regression tests.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D123177
2022-04-06 20:01:47 -05:00
Hansang Bae 090309d316 [OpenMP] Fix warnings
Silenced compiler warnings after pushing the following change.
https://reviews.llvm.org/D122107

Differential Revision: https://reviews.llvm.org/D123233
2022-04-06 12:35:05 -05:00
Hansang Bae e4ac11beb7 [OpenMP] Add support for ompt_callback_dispatch
This change adds support for ompt_callback_dispatch with the new
dispatch chunk type introduced in 5.2. Definitions of the new
ompt_work_loop types were also added in the header file.

Differential Revision: https://reviews.llvm.org/D122107
2022-04-06 08:01:02 -05:00
Jonathan Peyton d345fe7c22 [OpenMP][libomp] NFC: Move omp_* functions out of kmp_* section 2022-03-31 13:39:30 -05:00
Joachim Protze 7641e42def [OpenMP][Tools] Fix handling of initial-task-end
Latest OpenMP spec says parallel_data is NULL for initial/implicit-task-end.
We nevertheless need to cleanup the ParallelData here, as there is no other
callback for the end of the implicit parallel region. We can use the reference
stored in the TaskData.

Reviewed By: dreachem

Differential Revision: https://reviews.llvm.org/D114005
2022-03-31 12:33:40 -05:00
Ron Lieberman 95eac47260 [libomptarget] x86 offloading fails map_back_race.cpp intermittently
Differential Revision: https://reviews.llvm.org/D122658
2022-03-29 16:01:17 +00:00
Johannes Doerfert b803f06901 [OpenMP] The test does not have check lines 2022-03-29 00:02:55 -05:00
Johannes Doerfert b309bdb970 [OpenMP][FIX] Use clang++ for the C++ test case 2022-03-28 23:14:24 -05:00
Johannes Doerfert b316126887 [OpenMP][FIX] Avoid races in the handling of to be deleted mapping entries
If we decided to delete a mapping entry we did not act on it right away
but first issued and waited for memory copies. In the meantime some
other thread might reuse the entry. While there was some logic to avoid
colliding on the actual "deletion" part, there were two races happening:

1) The data transfer back of the thread deleting the entry and
   the data transfer back of the thread taking over the entry raced.
2) The update to the shadow map happened regardless if the entry was
   actually reused by another thread which left the shadow map in a
   inconsistent state.

To fix both issues we will now update the shadow map and delete the
entry only if we are sure the thread is responsible for deletion, hence
no other thread took over the entry and reused it. We also wait for a
potential former data transfer from the device to finish before we issue
another one that would race with it.

Fixes https://github.com/llvm/llvm-project/issues/54216

Differential Revision: https://reviews.llvm.org/D121058
2022-03-28 22:33:18 -05:00
Johannes Doerfert ba93e4e33e [OpenMP][NFC] Add missing virtual destructor to silence warning 2022-03-28 22:33:18 -05:00
Johannes Doerfert 7df2eba7fa [Attributor][OpenMP] Add assumption for non-call assembly instructions
Inline assembly is scary but we need to support it for the OpenMP GPU
device runtime. The new assumption expresses the fact that it may not
have call semantics, that is, it will not call another function but
simply perform an operation or side-effect. This is important for
reachability in the presence of inline assembly.

Differential Revision: https://reviews.llvm.org/D109986
2022-03-28 20:57:52 -05:00
Shilei Tian 545fcc3d84 [OpenMP][CUDA] Fix potential program crash caused by double free resources
As we mentioned in the code comments for function `ResourcePoolTy::release`,
at some point there could be two identical resources on the two sides of `Next`
mark. It is usually not an issue, unless the following case:
1. Some resources are not returned.
2. We need to iterate the pool and free the element.

That will cause double free, which is the case for event pool. Since we don't release
events hold by the data map, it can happen that the `Next` mark is not reset, and
we have two identical items in the pool. When the pool is destroyed, we will call
`cuEventDestroy` twice on the same event. In the best case, we can only observe
CUDA errors. In the worst case, it can cause internal failures in CUDART and further
crash.

This patch fixes the issue by tracking all resources that have been given using
an `unordered_set`. We don't remove it when a resource is returned. When the pool
is destroyed, we merge the pool (a `vector`) and the set. In this way, we can make
sure that the set contains all resources allocated from the device. We just need
to iterate the set and free the resource accordingly.

For now, only event pool is set to use it. Stream pool is not because we can make
sure all streams are returned when the plugin is destroyed.

Someone might be wondering, why don't we release all events hold in the data map.
That is because, plugins are determined to be destroyed *before* `libomptarget`.
If we can somehow make the plugin outlast `libomptarget`, life will be much
easier.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D122014
2022-03-25 22:49:32 -04:00
Joseph Huber 9d3550c517 [OpenMP] Add AMDGPU calling convention to ctor / dtor functions
This patch adds the necessary AMDGPU calling convention to the ctor /
dtor kernels. These are fundamentally device kenels called by the host
on image load. Without this calling convention information the AMDGPU
plugin is unable to identify them.

Depends on D122504

Fixes #54091

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D122515
2022-03-25 22:44:20 -04:00
Johannes Doerfert 6c2be885ff Revert "[OpenMP][NFC] Add missing virtual destructor to silence warning"
This reverts commit b9fd8f34ae as it
accidentally contained a unit test change that is not finished (and
unrelated).
2022-03-25 16:07:11 -05:00
Johannes Doerfert 7dfad948f1 [OpenMP][FIX] Repair ExclusiveAccess move semantic snafu 2022-03-25 16:00:53 -05:00
Johannes Doerfert b9fd8f34ae [OpenMP][NFC] Add missing virtual destructor to silence warning 2022-03-25 16:00:53 -05:00
Johannes Doerfert 4e34f061d6 [OpenMP][FIX] Ensure exclusive access to the HDTT map
This patch solves two problems with the `HostDataToTargetMap` (HDTT
map) which caused races and crashes before:

1) Any access to the HDTT map needs to be exclusive access. This was not
   the case for the "dump table" traversals that could collide with
   updates by other threads. The new `Accessor` and `ProtectedObject`
   wrappers will ensure we have a hard time introducing similar races in
   the future. Note that we could allow multiple concurrent
   read-accesses but that feature can be added to the `Accessor` API
   later.
2) The elements of the HDTT map were `HostDataToTargetTy` objects which
   meant that they could be copied/moved/deleted as the map was changed.
   However, we sometimes kept pointers to these elements around after we
   gave up the map lock which caused potential races again. The new
   indirection through `HostDataToTargetMapKeyTy` will allows us to
   modify the map while keeping the (interesting part of the) entries
   valid. To offset potential cost we duplicate the ordering key of the
   entry which avoids an additional indirect lookup.

We should replace more objects with "protected objects" as we go.

Differential Revision: https://reviews.llvm.org/D121057
2022-03-25 11:38:54 -05:00
Joseph Huber a619072c61 [OpenMP] Manually unroll the argument copy loop
The unroll pragma did not properly work as the loop bound was not known
when we optimize the runtime and we then added a "unroll disable"
metadata which prevented unrolling later when the bounds were known.
For now we manually unroll to make sure up to 16 elements are handled
nicely. This helps optimizations to look through the argument passing.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109164
2022-03-21 20:54:11 -04:00
Stanislav Mekhanoshin e0b9364b5c [AMDGPU] Add gfx90a and gfx940 to get_elf_mach_gfx_name.cpp
Differential Revision: https://reviews.llvm.org/D120849
2022-03-17 13:05:07 -07:00
Jon Chesterfield 75779435f3 [nfc][openmp] Swap arguments to remove noise from upcoming diff 2022-03-11 23:08:37 +00:00
Shilei Tian f6639a424b [OpenMP][CUDA] Fix the check of `setContext` 2022-03-09 18:48:44 -05:00
Shilei Tian 39d3283a08 [OpenMP][CUDA] Avoid calling `cuCtxSetCurrent` redundantly
Currently we set ccontext everywhere accordingly, but that causes many
unnecessary function calls. For example, in the resource pool, if we need to
resize the pool, we need to get from allocator. Each call to allocate sets the
current context once, which is unnecessary. In this patch, we set the context
only in the entry interface functions, if needed. Actually in the best way this
should be implemented via RAII, but since `cuCtxSetCurrent` could return error,
and we don't use exception, we can't stop the execution if RAII fails.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D121322
2022-03-09 16:32:47 -05:00
Shilei Tian 5105c7cd78 [OpenMP][CUDA] Fix an issue that multiple `CUmodule` are could be overwritten
This patch fixes the issue introduced in 14de0820e8 and D120089, that
if dynamic libraries are used, the `CUmodule` array could be overwritten.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D121308
2022-03-09 14:55:20 -05:00
Mark Dewing 7e0b0e05af [OpenMP][doc]Minor doc fixes
In SupportAndFAQ.rst, add blank lines before and after a bullet list and
sublist.  This avoids an "Unepxected indentation" warning.

In Runtimes.rst, adjust the suggestion for setting LIBOMPTARGET_INFO.
The right shifts are not necessary as the bit mask values are already
correct.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119595
2022-03-09 14:54:42 -05:00
Johannes Doerfert 14de0820e8 [OpenMP][FIX] Ensure the modules vector is filled as others are
The modules vector was for some reason special which could lead to it
not being of the same size (=num devices). Easiest solution is to treat
it like we do all the other vectors.
2022-03-08 23:45:43 -06:00
Joseph Huber eae306f52c [OpenMP][Docs] Make copy pasting remarks easier 2022-03-08 16:54:12 -05:00
Johannes Doerfert 1660288b28 [OpenMP][CUDA] Use one event pool per device
An event pool, similar to the stream pool, needs to be kept per device.
For one, events are associated with cuda contexts which means we cannot
destroy the former after the latter. Also, CUDA documentation states
streams and events need to be associated with the same context, which
we did not ensure at all.

Differential Revision: https://reviews.llvm.org/D120142
2022-03-07 23:43:05 -06:00