Commit Graph

2227 Commits

Author SHA1 Message Date
Johannes Doerfert ff50e81b50 [OpenMP][NFCI] Use RAII lock guards in libomptarget where possible
Differential Revision: https://reviews.llvm.org/D121060
2022-03-06 19:59:23 -06:00
James Beddek 2d0c9b64a0 [OpenMP][CMake] Ensure linking against libm for Linux
Do the same as is done for NetBSD. Some compiler-rt/lib/builtins files call
libm functions (e.g. fmaxl, fabs). Linking libomp with --rtlib=compiler-rt
references these functions.
Downstream report: https://bugs.gentoo.org/816831

Fixes: https://github.com/llvm/llvm-project/issues/51457
2022-03-05 20:20:28 -08:00
Shilei Tian 7f7c2c34b6 [OpenMP][CMake] Clean up the CMake variable `LIBOMPTARGET_LLVM_INCLUDE_DIRS`
`LIBOMPTARGET_LLVM_INCLUDE_DIRS` is currently checked and included for
multiple times redundantly. This patch is simply a clean up.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D121055
2022-03-05 22:37:59 -05:00
AndreyChurbanov 1fbdb03b1d [OpenMP] libomp: omp_in_explicit_task() implemented.
Differential Revision: https://reviews.llvm.org/D120671
2022-03-05 21:46:39 +03:00
Joseph Huber e2dcc2218c [Libomptarget] Work around bug in initialization of libomptarget
Libomptarget uses some shared variables to track certain internal stated
in the runtime. This causes problems when we have code that contains no
OpenMP kernels. These variables are normally initialized upon kernel
entry, but if there are no kernels we will see no initialization.
Currently we load the runtime into each source file when not running in
LTO mode, so these variables will be erroneously considered undefined or
dead and removed, causing miscompiles. This patch temporarily works
around the most obvious case, but others still exhibit this problem. We
will need to fix this more soundly later.

Fixes #54208.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D121007
2022-03-04 13:13:31 -05:00
Aakanksha 840695814a [AMDGPU] Add gfx1036 target
Differential Revision: https://reviews.llvm.org/D120846
2022-03-02 23:26:38 +00:00
Stanislav Mekhanoshin 2e2e64df4a [AMDGPU] Add gfx940 target
This is target definition only.

Differential Revision: https://reviews.llvm.org/D120688
2022-03-02 13:54:48 -08:00
Malhar Jajoo 6d658f37a4 [Openmp]: Missing import statement in openmp interface for Fortran
Essentially removed the "use omp_lib_kinds" statement and replaced it
with import to maintain consistency (and avoid compilation error
in case the omp_lib_kinds.mod file is not accessible) in header file.

The import is required to access entities in host scoping unit.

Differential Revision: https://reviews.llvm.org/D120707
2022-03-01 17:33:06 +00:00
Shilei Tian 75812e7704 [OpenMP][Offloading] Change N back to 256 in bug49334.cpp 2022-02-23 16:10:35 -05:00
Joseph Huber 5dd0c39638 [Libomptarget][NFC} Fix missing newline in error message 2022-02-23 08:10:16 -05:00
Carlo Bertolli 7b731f4d0b [OpenMP][libomptarget] Delay restore of shadow pointers in structs to after H2D memory copies are completed
When using asynchronous plugin calls, shadow pointer restore could happen before the D2H copy for the entire struct has completed, effectively leaving a device pointer in a host struct.
This patch fixes the problem by delaying restore's to after a synchronization happens (target regions) and by calling early synchronization (target update).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119968
2022-02-18 10:09:10 -06:00
Joseph Huber 0870a4f59a [OpenMP] Add flag for disabling thread state in runtime
The runtime uses thread state values to indicate when we use an ICV or
are in nested parallelism. This is done for OpenMP correctness, but it
not needed in the majority of cases. The new flag added is
`-fopenmp-assume-no-thread-state`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120106
2022-02-18 08:35:05 -05:00
Shilei Tian 092a5bb72b [OpenMP][Offloading] Fix test case issues in bug49334.cpp
`bug49334.cpp` has one issue that causes flaky result reported in #53730.
The root cause is `BlockedC` is never initialized but in `BlockMatMul_TargetNowait`
it is directly read and written (via `+=`). Fixes #53730.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D119988
2022-02-17 10:22:48 -05:00
Johannes Doerfert 57b4c5267b [OpenMP][FIX] Eliminate race on the IsSPMD global
The `IsSPMD` global can only be read by threads other than the main
thread *after* initialization is complete. To allow usage of
`mapping::getBlockSize` before initialization is done, we can pass the
`IsSPMD` state explicitly. This is similar to other APIs that take
`IsSPMD` explicitly to avoid such a race, e.g.,
`mapping::isInitialThreadInLevel0(IsSPMD)`

Fixes https://github.com/llvm/llvm-project/issues/53857
2022-02-16 14:44:20 -06:00
Joseph Huber 777039a51c [Libomptarget] Run CPU offloading tests using the new driver
This patch adds a new target to the OpenMP CPU offloading tests. This
tests the usage of the new driver for CPU offloading. If this all works
then we can move to transition to the new driver as the default.

Depends on D119613

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119736
2022-02-15 15:05:32 -05:00
Joseph Huber 48e3dcecc4 [Libomptarget][NFC] Remove constexpr to hide warnings
Currently whenever we compile the device runtime we get the following
'Mapping.cpp:32:32: warning: inline function '_OMP::impl::getGridValue'
is not defined [-Wundefined-inline]' warning. This can be silenced by
removing the constexpr attribute for this function. Doing this doesn't
change the generated bitcode at all but prevents the screen from getting
filled with warnings whenver we build the runtime.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119747
2022-02-14 15:34:18 -05:00
Jonathan Peyton 1234011b80 [OpenMP][libomp] Introduce oneAPI compiler support
Introduce KMP_COMPILER_ICX macro to represent compilation with oneAPI
compiler.

Fixup flag detection and compiler ID detection in CMake. Older CMake's
detect IntelLLVM as Clang.

Fix compiler warnings.

Fixup many of the tests to have non-empty parallel regions as they are
elided by oneAPI compiler.
2022-02-14 14:10:33 -06:00
Shilei Tian c27f530d4c [OpenMP][Offloading] Fix infinite loop in applyToShadowMapEntries
This patch fixes the issue that the for loop in `applyToShadowMapEntries`
is infinite because `Itr` is not incremented in `CB`. Fixes #53727.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119471
2022-02-12 22:02:53 -05:00
AndreyChurbanov cb1bee4725 [OpenMP] libomp: fix UB when LIBOMP_NUM_HIDDEN_HELPER_THREADS=1.
The __kmp_hidden_helper_threads_num set to N+1 if user requested N threads.
Thus number of worker hidden helper threads corresponds to user request,
main thread of helper team excluded as it does not participate in actual work.
This also fixes divide-by-0 issue in the code.

Fixes #48656

Differential Revision: https://reviews.llvm.org/D119586
2022-02-12 03:00:38 +03:00
AndreyChurbanov d84dedc7d3 [OpenMP] libomp: fix bug in implementation of distribute construct.
Fixed mistaken iterations distribution between different target regions.

Differential Revision: https://reviews.llvm.org/D118393
2022-02-11 17:34:26 +03:00
Shilei Tian 702a976c12 [OpenMP][Offloading] Change the way to compare floating point values in bug49334.cpp
`bug49334.cpp` directly uses `!=` to compare two floating point values,
which is almost wrong.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D119485
2022-02-10 18:20:36 -05:00
Shilei Tian aca33b0b37 [OpenMP][CUDA] Remove the hard team limit
Currently we have a hard team limit, which is set to 65536. It says no matter whether the device can support more teams, or users set more teams, as long as it is larger than that hard limit, the final number to launch the kernel will always be that hard limit. It is way less than the actual hardware limit. For example, my workstation has GTX2080, and the hardware limit of grid size is 2147483647, which is exactly the largest number a `int32_t` can represent. There is no limitation mentioned in the spec. This patch simply removes it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119313
2022-02-10 18:07:46 -05:00
Ye Luo 59ad9650cf [Libomptarget][AMDGCN] add gfx90c target
Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D119478
2022-02-10 15:55:44 -06:00
Shilei Tian f6685f7746 [OpenMP][CUDA] Refine the logic to determine grid size
This patch refines the logic to determine grid size as previous method
can escape the check of whether `CudaBlocksPerGrid` could be greater than the actual
hardware limit.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119311
2022-02-10 14:13:32 -05:00
Joseph Huber 9582f09690 [Libomptarget] Increase stack size for bug49779 test
The 'bug49779.cpp' test has been failing recently. This is because the
runtime is sufficiently complex when using nested parallelism without
optimizations that the CUDA tools cannot statically determine the stack
size. Because of this the kernel can exceed the thread stack size and
crash. Work around this using the 'LIBOMPTARGET_STACK_SIZE' environment
variable and add an FAQ entry for this situation.

Fixes #53670

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D119357
2022-02-09 15:37:23 -05:00
Jonathan Peyton 6be7c21b57 [OpenMP][libomp] Replace accidental VLA with KMP_ALLOCA
MSVC does not support variable length arrays. Replace with KMP_ALLOCA
which is already used in the same file for stack-allocated variables.
2022-02-09 08:09:27 -06:00
Joseph Huber 99d72ebddf [Libomptarget] Add header files as a dependency to CMake target
This patch manually adds the runtime include files to the list of
dependencies when we build the bitcode runtime library. Previously if
only the header was changed we would not recompile the source files.
The solution used here isn't optimal because every source file not has a
dependency on each header file regardless of if it was actually used by
that file.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D119254
2022-02-08 12:09:59 -05:00
Joseph Huber f8ffac5987 [OpenMP] Enable new driver tests for AMDGPU
This patch enables running the new driver tests for AMDGPU. Previously
this was disabled because some tests failed. This was only because the
new driver tests hadn't been listed as unsupported or expected to fail.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D119240
2022-02-08 09:55:29 -05:00
Joseph Huber d28051c4ab [Libomptarget] Replace Value RAII with default value
This patch replaces the ValueRAII pointer with a default 'nullptr'
value. Previously this was initialized as a reference to an existing
variable. The use of this variable caused overhead as the compiler could
not look through the uses and determine that it was unused if 'Active'
was not set. Because of this accesses to the variable would be left in
the runtime once compiled.

Fixes #53641

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119187
2022-02-07 17:12:00 -05:00
Igor Kirillov 4ae885b1e3 [OpenMP] kmp_atomic_float10_max_min.c test should only be executed on x86 platform
Differential Revision: https://reviews.llvm.org/D118988
2022-02-07 10:07:54 +00:00
Joseph Huber 034adaf5be [OpenMP] Completely remove old device runtime
This patch completely removes the old OpenMP device runtime. Previously,
the old runtime had the prefix `libomptarget-new-` and the old runtime
was simply called `libomptarget-`. This patch makes the formerly new
runtime the only runtime available. The entire project has been deleted,
and all references to the `libomptarget-new` runtime has been replaced
with `libomptarget-`.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D118934
2022-02-04 15:31:33 -05:00
Joseph Huber b4be18219e [Libomptarget] Remove AMDGPU XFAIL from test
Summary;
This test should pass now with AMDGPU. Previously the symbols were
hidden and would fail when read.
2022-02-04 13:40:03 -05:00
Tom Stellard a2601c9887 Bump the trunk major version to 15 2022-02-01 23:54:52 -08:00
Jon Chesterfield f52927c122 Revert "[OpenMP][FIX] Explicit barriers in SPMD mode are not aligned"
This seems to be the root cause of hangs on amdgpu. Reverting while investigating.
This reverts commit 7b9844cc8d.
2022-02-01 14:56:59 +00:00
Jon Chesterfield 8b7e99c41d [openmp] Disable tests that presently hang on CI 2022-02-01 13:01:35 +00:00
Johannes Doerfert 7b9844cc8d [OpenMP][FIX] Explicit barriers in SPMD mode are not aligned
Due to num_threads (probably also other reasons) we cannot assume
explicit barriers are always executed by all threads in an aligned
fashion. We can optimize them if that property can be proven but
that is different.
2022-02-01 01:10:52 -06:00
Johannes Doerfert 3c8a4c6f47 [OpenMP] Eliminate redundant barriers in the same block
Patch originally by Giorgis Georgakoudis (@ggeorgakoudis), typos and
bugs introduced later by me.

This patch allows us to remove redundant barriers if they are part
of a "consecutive" pair of barriers in a basic block with no impacted
memory effect (read or write) in-between them. Memory accesses to
local (=thread private) or constant memory are allowed to appear.
Technically we could also allow any other memory that is not used to
share information between threads, e.g., the result of a malloc that
is also not captured. However, it will be easier to do more reasoning
once the code is put into an AA. That will also allow us to look through
phis/selects reasonably. At that point we should also deal with calls,
barriers in different blocks, and other complexities.

Differential Revision: https://reviews.llvm.org/D118002
2022-02-01 01:07:50 -06:00
Joseph Huber 4d4587d5b0 [OpenMP] Remove new driver tests for AMDGPU
Some of the new driver tests are flaky on AMDGPU, remove for now.
2022-01-31 23:32:33 -05:00
Joseph Huber 0ac799b5c9 [Libomptarget] Run GPU offloading tests using the new drvier
This patch adds a new target to the tests to run using the new driver as
the method for generating offloading code.

Depends on D116541

Differential Revision: https://reviews.llvm.org/D118637
2022-01-31 23:11:43 -05:00
Joachim Protze 0fd5f6964d [OpenMP][tests][NFC] Pin debug info to DWARF v4 for libarcher tests
Temporary solution for #53467, since debian test machines do not support
DWARF v5.
2022-01-31 22:55:29 +01:00
Joseph Huber ad0a306a38 [OpenMP][NFC] Change error message on offloading failure to mention documentation
This patch changes the error message to instead mention the
documentation page for the debugging options provided by libomptarget
and the bitcode runtimes. Add some extra information to the documentation to
help users more quickly identify debugging resources.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D118626
2022-01-31 15:19:52 -05:00
Joseph Huber fd5853dae6 [Libomptarget] Reduce shared memory stack size to 512 and a message when it is exceeded
Reduces the shared memory size used for globalization to 512 bytes from
2048 to reduce the pressure on shared memory. This patch ado adds a
debug mesage to indicate when the shared memory was insufficient.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D118625
2022-01-31 15:19:48 -05:00
Jon Chesterfield 9b9d08111b Set rpath on openmp executables
Openmp executables need to find libomp and libomptarget at runtime.
This currently requires LD_LIBRARY_PATH or the user to specify rpath. Change
that to set the expected location of the openmp libraries in the install tree.

Whether rpath means rpath or runpath is system dependent. The attached test
shows that the Wl,--disable-new-dtags control interacts correctly with this feature.

The implicit rpath field is appended to any user specified ones which is ideal.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D118493
2022-01-31 16:35:00 +00:00
Jon Chesterfield a841a3a579 Revert "Set rpath on openmp executables"
Failed some buildbots, bad assumptions about structure of install path

This reverts commit a80d5c34e4.
2022-01-31 16:18:03 +00:00
Jon Chesterfield a80d5c34e4 Set rpath on openmp executables
Openmp executables need to find libomp and libomptarget at runtime.
This currently requires LD_LIBRARY_PATH or the user to specify rpath. Change
that to set the expected location of the openmp libraries in the install tree.

Whether rpath means rpath or runpath is system dependent. The attached test
shows that the Wl,--disable-new-dtags control interacts correctly with this feature.

The implicit rpath field is appended to any user specified ones which is ideal.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D118493
2022-01-31 16:01:08 +00:00
John Ericson 368c54b81a [openmp][cmake] `CMAKE_INSTALL_BINDIR` usage should not be quoted
As @mstorsjo wrote in https://reviews.llvm.org/D117945#inline-1132920 :

> This change seems to have broken one aspect: When doing `ninja
install` I now get a warning saying `Error copying file "libomp.dll" to
"libiomp5md.dll".`, and `libiomp5md.dll` isn't installed.
>
> I believe the reason is that the inline cmake snippet is written to
`runtime/src/cmake_install.cmake` and then executed on install, but on
install, `${CMAKE_INSTALL_BINDIR}` isn't set (as `GNUInstallDirs` isn't
included there). Should this maybe expand `${CMAKE_INSTALL_BINDIR}`
right here instead of deferring it to the install cmake, or what's the
right course of action?

I agree that is the right course of action. We also agreed to restore the `CMAKE_INSTALL_PREFIX` that was there before, too.

Reviewed By: mstorsjo

Differential Revision: https://reviews.llvm.org/D118528
2022-01-29 23:52:50 +00:00
Shilei Tian d4d0ae628c [OpenMP] Fix link error on Windows caused by `interop` functions
This patch fixes the link error on Windows caused by `interop`
functions.

Reviewed By: mstorsjo

Differential Revision: https://reviews.llvm.org/D118524
2022-01-29 11:29:29 -05:00
Shilei Tian 184f94a8a8 [OpenMP] Fix wrong number in openmp/runtime/src/dllexports
This patch fixes the issue that numbers assigned to `interop` functions were already taken in `openmp/runtime/src/dllexports`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D118523
2022-01-29 00:23:08 -05:00
Ye Luo bafb6f3e9c [OpenMP] disable build of old nvptx device runtime
Fully respect LIBOMPTARGET_BUILD_NVPTX_BCLIB. There is no CUDA toolchain dependency. Complement D118268.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D118522
2022-01-28 21:25:48 -06:00
Ron Lieberman 619f44b0ed Revert "[OpenMP] Ensure broken assumptions print once, not thousands of times."
This reverts commit 27c799ecc9.
2022-01-28 01:41:10 +00:00
Joseph Huber 27c799ecc9 [OpenMP] Ensure broken assumptions print once, not thousands of times.
If we have a broken assumption we want to print a message to the user.
If the assumption is broken by many threads in many teams this can
become a problem. To avoid it we use a hash that tracks if a broken
assumption has (likely) been printed and avoid printing it again. This
is not fool proof and has some caveats that might cause problems in
the future (see comment) but it should improve the situation
considerably for now.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D112156
2022-01-27 18:43:45 -05:00
Johannes Doerfert 1e12156896 [OpenMP][NFCI] Pipe the IdentTy object through more new RT functions
IdentTy objects are useful for debugging and profiling so we want to
keep them around in more places, especially those that have a large
impact on performance, e.g., everything related to state.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112494
2022-01-27 15:36:55 -05:00
Sri Hari Krishna Narayanan f44e41af41 Runtime for Interop directive
This implements the runtime portion of the interop directive.
It expects the frontend and IRBuilder portions to be in place
for proper execution. It currently works only for GPUs
and has several TODOs that should be addressed going forward.

Reviewed By: RaviNarayanaswamy

Differential Revision: https://reviews.llvm.org/D106674
2022-01-27 15:16:24 -05:00
Jon Chesterfield e08f3bfe58 [openmp] Disable build of old runtimes by default
The old runtime is not tested by CI. Disable the build prior to the llvm-14 branch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D118268
2022-01-26 19:17:31 +00:00
Malhar Jajoo c1988dbf2d [openmp] Allow x87 fp functions only in Openmp runtime for x86.
This patch allows Openmp runtime atomic functions operating on x87 high-precision
to be present only in Openmp runtime for x86 architectures

The functions affected are:

__kmpc_atomic_10
__kmpc_atomic_20
__kmpc_atomic_cmplx10_add
__kmpc_atomic_cmplx10_div
__kmpc_atomic_cmplx10_mul
__kmpc_atomic_cmplx10_sub
__kmpc_atomic_float10_add
__kmpc_atomic_float10_div
__kmpc_atomic_float10_mul
__kmpc_atomic_float10_sub

__kmpc_atomic_float10_add_fp
__kmpc_atomic_float10_div_fp
__kmpc_atomic_float10_mul_fp
__kmpc_atomic_float10_sub_fp
__kmpc_atomic_float10_max
__kmpc_atomic_float10_min

Differential Revision: https://reviews.llvm.org/D117473
2022-01-22 22:09:44 +00:00
John Ericson 0a6b4258ab [openmp][cmake] Use `GNUInstallDirs` to support custom installation dirs
I am breaking apart D99484 so the cause of build failures is easier to
understand.

Differential Revision: https://reviews.llvm.org/D117945
2022-01-22 18:05:36 +00:00
Joseph Huber 26feef0846 [Libomptarget] Change visibility to hidden for device RTL
This patch changes the visibility for all construct in the new device
RTL to be hidden by default. This is done after the changes introduced
in D117806 changed the visibility from being hidden by default for all
device compilations. This asserts that the visibility for the device
runtime library will be hidden except for the internal environment
variable. This is done to aid optimization and linking of the device
library.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D117807
2022-01-20 21:06:28 -05:00
Johannes Doerfert b0789a1b12 [OpenMP] Avoid costly shadow map traversals whenever possible
In the OpenMC app we saw `omp target update` spending an awful lot of
time in the shadow map traversal without ever doing any update there.
There are two cases that allow us to avoid the traversal completely.
The simplest thing is that small updates cannot (reasonably) contain
an attached pointer part. The other case requires to track in the
mapping table if an entry might contain an attached pointer as part.
Given that we have a single location shadow map entries are created,
the latter is actually fairly easy as well.

Differential Revision: https://reviews.llvm.org/D113124
2022-01-19 22:14:41 -06:00
Johannes Doerfert 1e447d03e2 [OpenMP] Introduce an environment variable to disable atomic map clauses
Atomic handling of map clauses was introduced to comply with the OpenMP
standard (see D104418). However, many apps won't need this feature which
can be costly in certain situations. To allow for applications to
opt-out we now introduce the `LIBOMPTARGET_MAP_FORCE_ATOMIC` environment
flag that voids the atomicity guarantee of the standard for map clauses
again, shifting the burden to the user.

This patch also de-duplicates the code that introduces the events used
to enforce atomicity as a cleanup.

Differential Revision: https://reviews.llvm.org/D117627
2022-01-19 22:14:41 -06:00
Joseph Huber 28d718602a [OpenMP] Expand short verisions of OpenMP offloading triples
The OpenMP offloading libraries are built with fixed triples and linked
in during compile time. This would cause un-helpful errors if the user
passed in the wrong expansion of the triple used for the bitcode
library. because we only support these triples for OpenMP offloading we
can normalize them to the full verion used in the bitcode library.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D117634
2022-01-19 20:26:37 -05:00
Jon Chesterfield ce8f365884 [openmp] Always pass valid triple to openmp-targets when using newRTL
Previously, we sometimes pass fopenmp-targets=nvptx64-nvidia-cuda-newRTL

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D117715
2022-01-19 22:07:22 +00:00
Jon Chesterfield 8baf4ba890 [openmp][amdgpu] Remove xfail from test using declare target variable 2022-01-19 15:55:37 +00:00
Jon Chesterfield ca84c43d69 [openmp][amdgpu] Disable tests on old runtime, enable tests on new one 2022-01-19 15:49:47 +00:00
Jon Chesterfield e35c8f541c [openmp][amdgpu] Temporarily disable tests on old runtime 2022-01-19 15:39:00 +00:00
Joseph Huber 4863fed933 [Libomptarget] Fix external visibility for internal variables
After the changes in D117362 made variables declared inside of a target
declare directive visible outside the plugin, some variables inside the
runtime were given visiblity that conflicted with their address space
type. This caused problems when shared or local memory was made
externally visible. This patch fixes this issue by making these
varialbes static within the module, therefore limiting their visibility
to being internal.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D117526
2022-01-18 18:19:57 -05:00
Joseph Huber 138cc5a001 Revert "[Libomptarget] Fix external visibility for internal variables"
Reverting to investigate break on AMDGPU. This reverts commit
0203ff1960.
2022-01-18 14:44:11 -05:00
Joseph Huber 0203ff1960 [Libomptarget] Fix external visibility for internal variables
After the changes in D117362 made variables declared inside of a target
declare directive visible outside the plugin, some variables inside the
runtime were given visiblity that conflicted with their address space
type. This caused problems when shared or local memory was made
externally visible. This patch fixes this issue by making these
varialbes static within the module, therefore limiting their visibility
to being internal.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D117526
2022-01-18 12:53:24 -05:00
Terry Wilmarth 2e02579a76 [OpenMP] Add use of TPAUSE
Add use of TPAUSE (from WAITPKG) to the runtime for Intel hardware,
with an envirable to turn it on in a particular C-state.  Always uses
TPAUSE if it is selected and enabled by Intel hardware and presence of
WAITPKG, and if not, falls back to old way of checking
__kmp_use_yield, etc.

Differential Revision: https://reviews.llvm.org/D115758
2022-01-18 10:14:32 -06:00
Joseph Huber 4869a22d1d [Libomptarget] Add `cold` to KeepAlive attributes
This patch adds the `cold` attribute to the keepAlive functions in the
RTL. This dummy function exists to keep certain RTL calls alive without
them being optimized out, but it is never called and can be declared
cold. This also helps some erroneous remarks being given on this
function because it has weak linkage and cannot be made internal.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D117513
2022-01-17 17:29:26 -05:00
John Ericson da77db58d7 Revert "[cmake] Use `GNUInstallDirs` to support custom installation dirs."
https://lab.llvm.org/buildbot/#/builders/46/builds/21146 Still have
this odd error, not sure how to reproduce, so I will just try breaking
up my patch.

This reverts commit 4a678f8072.
2022-01-16 05:48:30 +00:00
John Ericson 4a678f8072 [cmake] Use `GNUInstallDirs` to support custom installation dirs.
This is the original patch in my GNUInstallDirs series, now last to merge as the final piece!

It arose as a new draft of D28234. I initially did the unorthodox thing of pushing to that when I wasn't the original author, but since I ended up

 - Using `GNUInstallDirs`, rather than mimicking it, as the original author was hesitant to do but others requested.

 - Converting all the packages, not just LLVM, effecting many more projects than LLVM itself.

I figured it was time to make a new revision.

I have used this patch series (and many back-ports) as the basis of https://github.com/NixOS/nixpkgs/pull/111487 for my distro (NixOS), which was merged last spring (2021). It looked like people were generally on board in D28234, but I make note of this here in case extra motivation is useful.

---

As pointed out in the original issue, a central tension is that LLVM already has some partial support for these sorts of things. Variables like `COMPILER_RT_INSTALL_PATH` have already been dealt with. Variables like `LLVM_LIBDIR_SUFFIX` however, will require further work, so that we may use `CMAKE_INSTALL_LIBDIR`.

These remaining items will be addressed in further patches. What is here is now rote and so we should get it out of the way before dealing more intricately with the remainder.

Reviewed By: #libunwind, #libc, #libc_abi, compnerd

Differential Revision: https://reviews.llvm.org/D99484
2022-01-16 05:33:07 +00:00
John Ericson 6e52bfe09d Revert "[cmake] Use `GNUInstallDirs` to support custom installation dirs."
Sorry for the disruption, I will try again later.

This reverts commit efeb501970.
2022-01-15 07:35:02 +00:00
John Ericson efeb501970 [cmake] Use `GNUInstallDirs` to support custom installation dirs.
This is the original patch in my GNUInstallDirs series, now last to merge as the final piece!

It arose as a new draft of D28234. I initially did the unorthodox thing of pushing to that when I wasn't the original author, but since I ended up

 - Using `GNUInstallDirs`, rather than mimicking it, as the original author was hesitant to do but others requested.

 - Converting all the packages, not just LLVM, effecting many more projects than LLVM itself.

I figured it was time to make a new revision.

I have used this patch series (and many back-ports) as the basis of https://github.com/NixOS/nixpkgs/pull/111487 for my distro (NixOS), which was merged last spring (2021). It looked like people were generally on board in D28234, but I make note of this here in case extra motivation is useful.

---

As pointed out in the original issue, a central tension is that LLVM already has some partial support for these sorts of things. Variables like `COMPILER_RT_INSTALL_PATH` have already been dealt with. Variables like `LLVM_LIBDIR_SUFFIX` however, will require further work, so that we may use `CMAKE_INSTALL_LIBDIR`.

These remaining items will be addressed in further patches. What is here is now rote and so we should get it out of the way before dealing more intricately with the remainder.

Reviewed By: #libunwind, #libc, #libc_abi, compnerd

Differential Revision: https://reviews.llvm.org/D99484
2022-01-15 01:08:35 +00:00
Jon Chesterfield d53b979596 [openmp][devicertl] Handle missing clang_tool
Fixes github issues/52910

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D117230
2022-01-13 22:43:26 +00:00
Joseph Huber 4746e38f67 [Libomptarget] Fix multiply defined symbol during linking
This patch adds the `weak` identifier to the openmp device environment
variable. The changes introduced in https://reviews.llvm.org/D117211
result in multiply defined symbols. Because the symbol is potentially
included multiple times for each offloading file we will get symbol
colisions, and because it needs to have external visiblity it should be
weak.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D117231
2022-01-13 11:57:33 -05:00
Jon Chesterfield 4395608939 [openmp] Mark used variables as retain as well
D97446 changed the behaviour of 'used'. Compensate.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D117211
2022-01-13 13:57:32 +00:00
Jon Chesterfield a74826d30a [openmp][amdgpu] Replace unsigned long with uint64_t
Some types need to be 64 bit. Unsigned long is a hazard there.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D116963
2022-01-10 22:19:30 +00:00
Shilei Tian aab62aab04 [OpenMP][Offloading] Fixed a crash caused by dereferencing nullptr
In function `DeviceTy::getTargetPointer`, `Entry` could be `nullptr` because of
zero length array section. We need to check if it is a valid iterator before
using it.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D116716
2022-01-05 23:04:29 -05:00
Shilei Tian 9584c6fa2f [OpenMP][Offloading] Fixed data race in libomptarget caused by async data movement
The async data movement can cause data race if the target supports it.
Details can be found in [1]. This patch tries to fix this problem by attaching
an event to the entry of data mapping table. Here are the details.

For each issued data movement, a new event is generated and returned to `libomptarget`
by calling `createEvent`. The event will be attached to the corresponding mapping table
entry.

For each data mapping lookup, if there is no need for a data movement, the
attached event has to be inserted into the queue to gaurantee that all following
operations in the queue can only be executed if the event is fulfilled.

This design is to avoid synchronization on host side.

Note that we are using CUDA terminolofy here. Similar mechanism is assumped to
be supported by another targets. Even if the target doesn't support it, it can
be easily implemented in the following fall back way:
- `Event` can be any kind of flag that has at least two status, 0 and 1.
- `waitEvent` can directly busy loop if `Event` is still 0.

My local test shows that `bug49334.cpp` can pass.

Reference:
[1] https://bugs.llvm.org/show_bug.cgi?id=49940

Reviewed By: grokos, JonChesterfield, ye-luo

Differential Revision: https://reviews.llvm.org/D104418
2022-01-05 20:20:04 -05:00
RitanyaB 378b0ac179 SIGSEGV in ompt_tsan_dependences with for-ordered
Segmentation fault in ompt_tsan_dependences function due to an unchecked NULL pointer dereference is as follows:

```
ThreadSanitizer:DEADLYSIGNAL
	==140865==ERROR: ThreadSanitizer: SEGV on unknown address 0x000000000050 (pc 0x7f217c2d3652 bp 0x7ffe8cfc7e00 sp 0x7ffe8cfc7d90 T140865)
	==140865==The signal is caused by a READ memory access.
	==140865==Hint: address points to the zero page.
	/usr/bin/addr2line: DWARF error: could not find variable specification at offset 1012a
	/usr/bin/addr2line: DWARF error: could not find variable specification at offset 133b5
	/usr/bin/addr2line: DWARF error: could not find variable specification at offset 1371a
	/usr/bin/addr2line: DWARF error: could not find variable specification at offset 13a58
	#0 ompt_tsan_dependences(ompt_data_t*, ompt_dependence_t const*, int) /ptmp/bhararit/llvm-project/openmp/tools/archer/ompt-tsan.cpp:1004 (libarcher.so+0x15652)
	#1 __kmpc_doacross_post /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_csupport.cpp:4280 (libomp.so+0x74d98)
	#2 .omp_outlined. for_ordered_01.c:? (for_ordered_01.exe+0x5186cb)
	#3 __kmp_invoke_microtask /ptmp/bhararit/llvm-project/openmp/runtime/src/z_Linux_asm.S:1166 (libomp.so+0x14e592)
	#4 __kmp_invoke_task_func /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_runtime.cpp:7556 (libomp.so+0x909ad)
	#5 __kmp_fork_call /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_runtime.cpp:2284 (libomp.so+0x8461a)
	#6 __kmpc_fork_call /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_csupport.cpp:308 (libomp.so+0x6db55)
	#7 main ??:? (for_ordered_01.exe+0x51828f)
	#8 __libc_start_main ??:? (libc.so.6+0x24349)
	#9 _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120 (for_ordered_01.exe+0x4214e9)

	ThreadSanitizer can not provide additional info.
	SUMMARY: ThreadSanitizer: SEGV /ptmp/bhararit/llvm-project/openmp/tools/archer/ompt-tsan.cpp:1004 in ompt_tsan_dependences(ompt_data_t*, ompt_dependence_t const*, int)
	==140865==ABORTING
```

	To reproduce the error, use the following openmp code snippet:

```
/* initialise  testMatrixInt Matrix, cols, r and c */
	  #pragma omp parallel private(r,c) shared(testMatrixInt)
	    {
	      #pragma omp for ordered(2)
	      for (r=1; r < rows; r++) {
	        for (c=1; c < cols; c++) {
	          #pragma omp ordered depend(sink:r-1, c+1) depend(sink:r-1,c-1)
	          testMatrixInt[r][c] = (testMatrixInt[r-1][c] + testMatrixInt[r-1][c-1]) % cols ;
	          #pragma omp ordered depend (source)
	        }
	      }
	    }
```

	Compilation:
```
clang -g -stdlib=libc++ -fsanitize=thread -fopenmp -larcher test_case.c
```

	It seems like the changes introduced by the commit https://reviews.llvm.org/D114005 causes this particular SEGV while using Archer.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D115328
2022-01-03 11:23:57 -06:00
Shilei Tian 458db51c10 [OpenMP] Add missing `tt_hidden_helper_task_encountered` along with `tt_found_proxy_tasks`
In most cases, hidden helper task behave similar as detached tasks. That means,
for example, if we have to wait for detached tasks, we have to do the same thing
for hidden helper tasks as well. This patch adds the missing condition for hidden
helper task accordingly along with detached task.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D107316
2021-12-29 23:22:53 -05:00
Johannes Doerfert 73104ad65b [OpenMP][NFC] Move headers into include folder 2021-12-28 23:53:28 -06:00
Shilei Tian 943d1d83dd [OpenMP][CUDA] Add resource pool for CUevent
Following D111954, this patch adds the resource pool for CUevent.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D116315
2021-12-28 17:42:38 -05:00
Shilei Tian 357c8031ff [OpenMP][Plugin] Minor adjustments to ResourcePool
This patch makes some minor adjustments to `ResourcePool`:
- Don't initialize the resources if `Size` is 0 which can avoid assertion.
- Add a new interface function `clear` to release all hold resources.
- If initial size is 0, resize to 1 when the first request is encountered.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D116340
2021-12-28 16:11:03 -05:00
Joseph Huber 7cdaa5a94e [OpenMP][FIX] Change globalization alignment to 16
This patch changes the default aligntment from 8 to 16, and encodes this
information in the `__kmpc_alloc_shared` runtime call to communicate it
to the HeapToStack pass. The previous alignment of 8 was not sufficient
for the maximum size of primitive types on 64-bit systems, and needs to
be increaesd. This reduces the amount of space availible in the data
sharing stack, so this implementation will need to be improved later to
include the alignment requirements in the allocation call, and use it
properly in the data sharing stack in the runtime.

Depends on D115888

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115971
2021-12-27 16:58:25 -05:00
Shilei Tian a697a0a4b6 [OpenMP][Plugin] Introduce generic resource pool
Currently CUDA streams are managed by `StreamManagerTy`. It works very well. Now
we have the need that some resources, such as CUDA stream and event, will be
hold by `libomptarget`. It is always good to buffer those resources. What's more
important, given the way that `libomptarget` and plugins are connected, we cannot
make sure whether plugins are still alive when `libomptarget` is destroyed. That
leads to an issue that those resouces hold by `libomptarget` might not be
released correctly. As a result, we need an unified management of all the resources
that can be shared between `libomptarget` and plugins.

`ResourcePoolTy` is designed to manage the type of resource for one device.
It has to work with an allocator which is supposed to provide `create` and
`destroy`. In this way, when the plugin is destroyed, we can make sure that
all resources allocated from native runtime library will be released correctly,
no matter whether `libomptarget` starts its destroy.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D111954
2021-12-27 11:32:14 -05:00
Jonathan Peyton 6a556ecaf4 [OpenMP][libomp] Add use-all syntax to KMP_HW_SUBSET
This patch allows the user to request all resources of a particular
layer (or core-attribute). The syntax of KMP_HW_SUBSET is modified
so the number of units requested is optional or can be replaced with an
'*' character.

e.g., KMP_HW_SUBSET=c:intel_atom@3 will use all the cores after offset 3
e.g., KMP_HW_SUBSET=*c:intel_core will use all the big cores
e.g., KMP_HW_SUBSET=*s,*c,1t will use all the sockets, all cores per
      each socket and 1 thread per core.

Differential Revision: https://reviews.llvm.org/D115826
2021-12-20 13:45:21 -06:00
Jon Chesterfield 38af5b4fd1 [libomptarget][nfc] Refactor dlwrap.h for easier reuse in D115966 and upcoming patches 2021-12-17 22:28:31 +00:00
Jon Chesterfield 91dfb32f2f [openmp][amdgpu][nfc] Mark all external functions extern C to get type checking 2021-12-17 18:46:43 +00:00
Carlo Bertolli d3abb04e14 [OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter
I missed the async info parameter in the first version of this API.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115887
2021-12-17 15:58:18 +00:00
Carlo Bertolli d83dc4c648 [OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple hsa queue's per device in plugin
This patch extends the AMDGPU plugin for OpenMP target offloading from using a single HSA queue to multiple queues (four in this patch) per device. This enables concurrent threads to concurrently submit kernel launches to the same GPU.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115771
2021-12-15 15:33:17 +00:00
Jonathan Peyton 9769340905 [OpenMP][libomp] Fix compile errors with new KMP_HW_SUBSET changes
Add missing guards around x86-specific code.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D115664
2021-12-14 08:33:05 +01:00
John Ericson ddcc02dbcc Quote some more destination paths with variables
Just defensive CMake-ing. I pulled this from D115544 and D99484 which
are blocked on some lldb CI failures I don't yet understand. Hoping to land
something smaller in the meantime.

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D115566
2021-12-13 17:29:08 +00:00
Michael Kruse 77e019c233 [OpenMP] Add "not" to test dependencies.
The `not` program is used to test executions prefixed with `%libomptarget-run-fail-`. Currently `not` is not used for libomp tests, but might be used in the future and its dependency does not add any additional burden over the already established `FileCheck` dependency.

Required to add libomptarget testing to the Phabricator pre-merge check (see https://github.com/google/llvm-premerge-checks/issues/368)

Reviewed By: jdenny, JonChesterfield

Differential Revision: https://reviews.llvm.org/D115454
2021-12-12 10:52:53 -06:00
Med Ismail Bennani 30fc88bf1d Revert "Revert "Revert "Use `GNUInstallDirs` to support custom installation dirs. -- LLVM"""
This reverts commit 492de35df4.

I tried to apply John's changes in 8d897ec915 that were expected to
fix his patch but that didn't work unfortunately.

Reverting this again to fix the macOS bots and leave him more time to
investigate the issue.
2021-12-10 17:33:54 -08:00
John Ericson 492de35df4 Revert "Revert "Use `GNUInstallDirs` to support custom installation dirs. -- LLVM""
This reverts commit 797b50d4be.

See the original D99484. @mib who noticed the original problem could not longer
reproduce it, after I tried and also failed. We are threfore hoping it went
away on its own!

Reviewed By: mib

Differential Revision: https://reviews.llvm.org/D115544
2021-12-10 20:59:43 +00:00
Joseph Huber 8425bde82d Revert "[OpenMP] Avoid costly shadow map traversals whenever possible"
This reverts commit 7c8f4e7b85.
Fails a few OpenMP tests, causes a few updates to segfault.
2021-12-10 15:57:58 -05:00
Jonathan Peyton df20599597 [OpenMP][libomp] Add core attributes to KMP_HW_SUBSET
Allow filtering of resources based on core attributes. There are two new
attributes added:
1) Core Type (intel_atom, intel_core)
2) Core Efficiency (integer) where the higher the efficiency, the more
   performant the core
On hybrid architectures , e.g., Alder Lake, users can specify
KMP_HW_SUBSET=4c:intel_atom,4c:intel_core to select the first four Atom
and first four Big cores. The can also use the efficiency syntax. e.g.,
KMP_HW_SUBSET=2c:eff0,2c:eff1

Differential Revision: https://reviews.llvm.org/D114901
2021-12-10 14:34:33 -06:00
Joseph Huber 7c8f4e7b85 [OpenMP] Avoid costly shadow map traversals whenever possible
In the OpenMC app we saw `omp target update` spending an awful lot of
time in the shadow map traversal without ever doing any update there.
There are two cases that allow us to avoid the traversal completely.
The simplest thing is that small updates cannot (reasonably) contain
an attached pointer part. The other case requires to track in the
mapping table if an entry might contain an attached pointer as part.
Given that we have a single location shadow map entries are created,
the latter is actually fairly easy as well.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D113124
2021-12-10 14:33:18 -05:00
Carlo Bertolli 28309c5436 [OpenMP] Part 2 of At present, amdgpu plugin merges both asynchronous
and synchronous kernel launch implementations into a single
synchronous version.  This patch prepares the plugin for asynchronous
implementation by:

    Privatizing actual kernel launch code (valid in both cases) into
    an anonymous namespace base function (submitted at D115267)

    - Separating the control flow path of asynchronous and synchronous
      kernel launch functions** (this diff)

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115273
2021-12-10 19:21:05 +00:00