Commit Graph

2032 Commits

Author SHA1 Message Date
Jon Chesterfield 0fa45d6d80 Revert "[OpenMP] Lower printf to __llvm_omp_vprintf"
This reverts commit db81d8f6c4.
2021-11-08 20:28:57 +00:00
Jon Chesterfield dc9edc6a6d Revert "[openmp] Fix build, test passes on CI unexpectedly"
This reverts commit c499d690cd.
2021-11-08 20:28:52 +00:00
Jon Chesterfield c499d690cd [openmp] Fix build, test passes on CI unexpectedly 2021-11-08 18:45:27 +00:00
Jon Chesterfield db81d8f6c4 [OpenMP] Lower printf to __llvm_omp_vprintf
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.

This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.

The exact set of changes to check-openmp probably needs revision before commit

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112680
2021-11-08 18:38:00 +00:00
Quinn Pham c3b15b71ce [NFC] Inclusive Language: change master to main for .chm files
[NFC] As part of using inclusive language within the llvm project,
this patch replaces master with main when referring to `.chm` files.

Reviewed By: teemperor

Differential Revision: https://reviews.llvm.org/D113299
2021-11-08 08:23:04 -06:00
@t-msn 0808d956c4 [OpenMP] libomp: Fix handling of barrier pattern environment variables
It is better to set all barrier patterns to use "dist" when at least
one environment variable specifies "dist". Otherwise if only one
environment is set to "dist" and others left blank inadvertently,
it would result in mixing dist barrier with default hyper barrier
pattern.

Differential Revision: https://reviews.llvm.org/D112597
2021-11-08 15:01:26 +03:00
Jon Chesterfield 4f4c826e75 [libomptarget] Drop remote plugin cmake version requirement to match llvm
LLVM docs at https://llvm.org/docs/CMake.html#quick-start state 3.13.4

Reviewed By: atmnpatel

Differential Revision: https://reviews.llvm.org/D113271
2021-11-05 17:34:28 +00:00
Johannes Doerfert d4b1cf8f9c [OpenMP] Build device runtimes for sm_86
Reviewed By: carlo.bertolli

Differential Revision: https://reviews.llvm.org/D113111
2021-11-04 17:54:59 -05:00
Johannes Doerfert ab9f3f5d25 [OpenMP] Introduce the keepAlive function into the old device RT
Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D113110
2021-11-04 17:54:56 -05:00
Johannes Doerfert 93bebdc78f [OpenMP][NFCI] Cleanup new device RT mapping interface
Minimize the `impl` interface and clean up some uses of mapping
functions.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D112154
2021-11-04 17:54:53 -05:00
Johannes Doerfert 73720c8059 [OpenMP][FIX] Introduce and use a simple generic-mode barrier
Before we had aligned barriers the `__kmpc_barrier_simple_spmd` was
OK to be used in the custom state machine. Now that SPMD barriers are
assumed to be aligned we need to use a "generic" barrier in places
that are not aligned.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112893
2021-11-02 23:22:01 -05:00
Johannes Doerfert ccb5d2726a [OpenMP][FIX] Avoid a race between initialization and first state reads
When we pick state 0 to initialize state but thread N is going to be the
"main thread", in generic mode, we would require extra synchronization.
Instead, we should pick the main thread to initialize state in generic
mode and any thread in SPMD mode.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112874
2021-11-02 23:21:49 -05:00
Med Ismail Bennani 797b50d4be Revert "Use `GNUInstallDirs` to support custom installation dirs. -- LLVM"
This reverts commit 6fd2db04d0 since it
broke GreenDragon LLDB-Incremental bot:

https://green.lab.llvm.org/green/job/lldb-cmake/37560/console

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
2021-11-02 19:11:44 +01:00
John Ericson 6fd2db04d0 Use `GNUInstallDirs` to support custom installation dirs. -- LLVM
This is a new draft of D28234. I previously did the unorthodox thing of
pushing to it when I wasn't the original author, but since this version

- Uses `GNUInstallDirs`, rather than mimics it, as the original author
  was hesitant to do but others requested.

- Is much broader, effecting many more projects than LLVM itself.

I figured it was time to make a new revision.

I am using this patch (and many back-ports) as the basis of
https://github.com/NixOS/nixpkgs/pull/111487 for my distro (NixOS). It
looked like people were generally on board in D28234, but I make note of
this here in case extra motivation is useful.

---

As pointed out in the original issue, a central tension is that LLVM
already has some partial support for these sorts of things. For example
`LLVM_LIBDIR_SUFFIX`, or `COMPILER_RT_INSTALL_PATH`. Because it's not
quite clear yet what to do about those, we are holding off on changing
libdirs and `compiler-rt`. for this initial PR.

---

On the advice of @lebedev.ri, I am splitting this up a bit per
subproject, starting with LLVM. To allow it to be more easily reviewed. This and the subsequent patch must be landed together, as this will not build alone. But the rest can be landed on their own.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D100810
2021-11-02 10:23:30 -04:00
Shilei Tian 025f549240 [OpenMP][DeviceRTL] Fixed an issue that causes hang in SU3
The synchronization at the end of parallel region cannot make sure all threads
exit the scope. As a result, the assertions right after it might be hit, and
further the `state::assumeInitialState(IsSPMD)` in `__kmpc_target_deinit` may
not hold as well. We either add a synchronization right after the parallel region,
or remove the assertions and assuptions. Here we choose the first one as those
assertions and assumptions can help optimizations.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112861
2021-10-30 14:44:29 -04:00
Kazu Hirata 3cfc1757c5 Ensure newlines at the end of files (NFC) 2021-10-29 20:26:09 -07:00
Joseph Huber 927c74d4da [OpenMP] Fix assert macro expr
Summary:
A previous patch changed the check and mistakenly only did `!expr` when
this is a macro expansion and could only apply to the left side of an
expression.
2021-10-29 17:44:13 -04:00
Joseph Huber 2c6a4e5678 [OpenMP] Use the assertion formatting from assert.h
This patch changes the `assert_assume` function used for internal
assumptions in the device runtime to use a more standard formatting for
the assumption message.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112842
2021-10-29 16:44:01 -04:00
Joseph Huber 35f42340a2 [OpenMP][Docs] Add documentation for device RTL debugging
Add documentation for the debugging features in the OpenMP device
runtime library.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112010
2021-10-29 14:57:14 -04:00
Joseph Huber 6dd791bca8 [OpenMP] Check output of malloc in the device for debug
A common problem is the device running out of global heap memory and
crashing due to a nullptr dereference when using the data sharing stack.
This explicitly checks that a nullptr was not returned by malloc when
debugging field 1 is enabled.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112005
2021-10-29 14:57:12 -04:00
Joseph Huber 74f91741b6 [OpenMP] Use function tracing RAII for runtime functions.
This patch adds support for using function tracing features to track the
executino of runtime functions in the device runtime library. This is
enabled by first compiling the new runtime with
`-fopenmp-target-debug=3` and running with
`LIBOMPTARGET_DEVICE_RTL_DEBUG=3`. The output only tracks team 0 and
thread 0 so there isn't much output when using a generic region.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112002
2021-10-29 14:57:11 -04:00
Jon Chesterfield 4d50803ce4 [libomptarget] Build DeviceRTL for amdgpu
Passes same tests as the current deviceRTL. Includes cmake change from D111987.
CI is showing a different set of pass/fails to local, committing this
without the tests enabled by default while debugging that difference.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112227
2021-10-28 12:34:01 +01:00
Jon Chesterfield 22bd75be70 [openmp] Fix a git misfire in cf37a94c1e 2021-10-28 01:35:25 +01:00
Jon Chesterfield 6c7b203d1d Revert "[libomptarget] Build DeviceRTL for amdgpu"
- more tests failing on CI than failed locally when writing this patch

This reverts commit 33427fdb7b.
2021-10-28 01:01:53 +01:00
Jon Chesterfield cf37a94c1e [openmp] Add amdgpu impl missed from D112153 2021-10-28 00:55:53 +01:00
Jon Chesterfield 33427fdb7b [libomptarget] Build DeviceRTL for amdgpu
Passes same tests as the current deviceRTL. Includes cmake change from D111987.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112227
2021-10-28 00:41:45 +01:00
Johannes Doerfert 48877525cf [OpenMP] Remove obsolete external interface for device RT
We do not generate _serialized_parallel calls in device mode, no
need for an external API.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D112145
2021-10-27 18:22:35 -05:00
Johannes Doerfert 5102c3c61e [OpenMP][FIX] Do not adjust the level after the environment was popped
Exiting a data environment will reset all values, it is wrong to adjust
them afterwards.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112144
2021-10-27 18:22:33 -05:00
Johannes Doerfert b16aadf0a7 [OpenMP] Introduce aligned synchronization into the new device RT
We will later use the fact that a barrier is aligned to reason about
thread divergence. For now we introduce the assumption and some more
documentation.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112153
2021-10-27 18:22:31 -05:00
Johannes Doerfert ef922c692f [OpenMP][FIX] Query proper thread ID information to support nesting
The OpenMP thread ID is not the hardware thread ID if we have nesting.
We need to ask the runtime properly to ensure correct results.

Note that the loop interface is going to change soon so we do not adjust
it now but simply ignore the extra argument.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D111950
2021-10-27 18:18:44 -05:00
Johannes Doerfert 4c88341d17 [OpenMP][FIX] Do check the level before return team size
The team size could/should be an ICV but since we know it is either 1 or
a value we can leave it in the team state for now. However, we still
need to determine if the current level is nested before we use it.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D111949
2021-10-27 18:18:42 -05:00
Johannes Doerfert dc72960967 [OpenMP][FIX] Do not dereference a potential nullptr
The first thread state in the new GPU runtime doesn't have a previous
one and we should not dereference the nullptr placeholder.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D111946
2021-10-27 18:18:39 -05:00
AndreyChurbanov a64797b5b8 [OpenMP][NFC] disable test on power because of -mlong-double-80 option 2021-10-27 16:54:44 +03:00
AndreyChurbanov c704b25b44 [OpenMP] libomp: Fix possible NULL dereference.
According to dlsym description, the value of symbol could be NULL,
and there is no error in this case. Thus dlerror will also return NULL in
this case. We need to check the value returned by dlerror before printing it.

Differential Revision: https://reviews.llvm.org/D112174
2021-10-27 16:54:44 +03:00
Vignesh Balasubramanian b0277bef97 [OpenMP][OMPD] Implementation of OMPD debugging library - libompd.
This is a continuation of the review: https://reviews.llvm.org/D100183
It contains routines that retrieve OpenMP ICV values for OMPD.

Reviewed By: @hbae
Differential Revision: https://reviews.llvm.org/D100184
2021-10-27 16:31:19 +05:30
Jon Chesterfield e42e5785ad [libomptarget][nfc]Generalise DeviceRTL cmake to allow building for amdgpu
Essentially moves the foreach over sm integers into a macro and instantiates it for nvptx.

NFC in that the macro is not presently instantiated for amdgpu as the corresponding code doesn't compile yet.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D111987
2021-10-26 21:18:21 +01:00
Ron Lieberman be03ef3ed1 [openmp][lit] Add support to OpenMP lit.cfg for ROCR_VISIBLE_DEVICES env-var
add support for ROCR_VISIBLE_DEVICES similar to name and purpose
as CUDA_VISIBLE_DEVICES

Differential Revision: https://reviews.llvm.org/D112503
2021-10-26 13:46:42 +00:00
Georgios Rokos 2feafa2e46 [libomptarget][NFC] Add comment explaining why we pass argument bases and
offsets as two separate entities to the plugins.
2021-10-25 14:51:14 -07:00
Shilei Tian 2a30c03c62 [OpenMP][Offloading] Only get trip count if team construct
Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D112475
2021-10-25 17:16:14 -04:00
AndreyChurbanov e38a1deb66 [OpenMP] libomp: disable definitions of 5.1 atomics for non-x86 arch.
Declarations of 5.1 atomic entries were added under
"#if KMP_ARCH_X86 || KMP_ARCH_X86_64" in kmp_atomic.h,
but definitions of the functions missed architecture guard in kmp_atomic.cpp.
As a result mangled symbols were available on non-x86 architecture.
The patch eliminates these unexpected symbols from the library.

Differential Revision: https://reviews.llvm.org/D112261
2021-10-25 21:17:26 +03:00
Vladimir Inđić f41d08540b [OpenMP][OMPT] thread_num determination during execution of nested serialized parallel regions
__ompt_get_task_info_internal function is adapted to support thread_num
determination during the execution of multiple nested serialized
parallel regions enclosed by a regular parallel region.

Consider the following program that contains parallel region R1 executed
by two threads. Let the worker thread T of region R1 executes serialized
parallel regions R2 that encloses another serialized parallel region R3.
Note that the thread T is the master thread of both R2 and R3 regions.

Assume that __ompt_get_task_info_internal function is called with the
argument "ancestor_level == 1" during the execution of region R3.
The function should determine the "thread_num" of the thread T inside
the team of region R2, whose implicit task is at level 1 inside the
hierarchy of active tasks. Since the thread T is the master thread of
region R2, one should expected that "thread_num" takes a value 0.
After the while loop finishes, the following stands: "lwt != NULL",
"prev_lwt == NULL", "prev_team" represents the team information about
the innermost serialized parallel region R3. This results in executing
the assignment "thread_num = prev_team->t.t_master_tid". Note that
"prev_team->t.t_master_tid" was initialized at the moment of
R2’s creation and represents the "thread_num" of the thread T inside
the region R1 which encloses R2. Since the thread T is the worker thread
of the region R1, "the thread_num" takes value 1, which is a contradiction.

This patch proposes to use "lwt" instead of "prev_lwt" when determining
the "thread_num". If "lwt" exists, the task at the requested level belongs
to the serialized parallel region. Since the serialized parallel region
is executed by one thread only, the "thread_num" takes value 0.

Similarly, assume that __ompt_get_task_info_internal function is called
with the argument "ancestor_level == 2" during the execution of region R3.
The function should determine the "thread_num" of the thread T inside the
team of region R1. Since the thread is the worker inside the region R1,
one should expected that "thread_num" takes value 1. After the loop finishes,
the following stands: "lwt == NULL", "prev_lwt != NULL", "prev_team" represents
the team information about the innermost serialized parallel region R3.
This leads to execution of the assignment "thread_num = 0", which causes
a contradiction.

Ignoring the "prev_lwt" leads to executing the assignment
"thread_num = prev_team->t.t_master_tid" instead. From the previous explanation,
it is obvious that "thread_num" takes value 1.

Note that the "prev_lwt" variable is marked as unnecessary and thus removed.

This patch introduces the test case which represents the OpenMP program
described earlier in the summary.

Differential Revision: https://reviews.llvm.org/D110699
2021-10-25 18:21:20 +02:00
Vladimir Inđić f2410bfb1c [OpenMP][OMPT][clang] task frame support fixed in __kmpc_fork_call
__kmp_fork_call sets the enter_frame of the active task (th_curren_task)
before new parallel region begins. After the region is finished, the
enter_frame is cleared.

The old implementation of __kmpc_fork_call didn’t clear the enter_frame of
active task.

Also, the way of initializing the enter_frame of the active task was wrong.
Consider the following two OpenMP programs.

The first program: Let R1 be the serialized parallel region that encloses
another serialized parallel region R2. Assume that thread that executes R2 is
going to create a new serialized parallel region R3 by executing
__kmpc_fork_call. This thread is responsible to set enter_frame of R2's
implicit task. Note that the information about R2's implicit task is present
inside master_th->th.th_current_task at this moment, while lwt represents the
information about R1's implicit task. The old implementation uses lwt and
resets enter_frame of R1's implicit task instead of R2's implicit task. The
new implementation uses master_th->th.th_current_task instead.

The second program: Consider the OpenMP program that contains parallel region
R1 which encloses an explicit task T. Assume that thread should create another
parallel region R2 during the execution of the T. The __kmpc_fork_call is
responsible to create R2 and set enter frame of T whose information is present
inside the master_th->th.th_current_task.
Old implementation tries to set the frame of
parent_team->t.t_implicit_task_taskdata[tid] which corresponds to the implicit
task of the R1, instead of T.

Differential Revision: https://reviews.llvm.org/D112419
2021-10-25 18:21:19 +02:00
Joachim Protze 7368227965 [OpenMP][Tests] Test omp_get_wtime for invariants
As discussed in D108488, testing for invariants of omp_get_wtime would be more
reliable than testing for duration of sleep, as return from sleep might be
delayed due to system load.

Alternatively/in addition, we could compare the time measured by omp_get_wtime
 to time measured with C++11 chrono (for portability?).

Differential Revision: https://reviews.llvm.org/D112458
2021-10-25 18:20:59 +02:00
Joachim Protze 3f229f42b7 [OpenMP][Tests][NFC] Actually check for test outcome
The CHECK: line in the test had no effect, because the test does not
pipe to FileCheck. Since the test only checks for a single value,
encode the result in the return value of the test.
2021-10-25 18:20:12 +02:00
Joachim Protze 047890bc3f [OpenMP][Tests][NFC] Mark tests trying to link COI as unsupported
For some tests with target-related functionality icc 18/19 tries to link
libioffload_target.so.5, which fails for missing COI symbols.
2021-10-25 18:20:12 +02:00
Joachim Protze d7fdd236d5 [OpenMP][Tests][NFC] Replace atomic increment by reduction
Also mark the test as unsupported by intel-21, because the test does
not terminate
2021-10-25 18:20:12 +02:00
Joachim Protze 38f78dd2e2 [OpenMP][Tools][NFC] Fix C99-style declaration of iteration variables
Where possible change to declare the variable before the loop.
Where not possible, specifically request -std=c99 (could be limited to
specific compilers like icc).
2021-10-25 18:20:12 +02:00
Joachim Protze d29a7d23ec [OpenMP][Tools][NFC] Pass intel license ENV to lit 2021-10-25 18:20:11 +02:00
Kazu Hirata d8e4170b0a Ensure newlines at the end of files (NFC) 2021-10-23 08:45:29 -07:00
Jon Chesterfield bf6f955f39 [libomptarget] Run GPU offloading tests on both new and old runtime
Implemented by patching python config instead of modifying all
the tests so that -generic and XFAIL work as usual. Expectation is for
this to be reverted once the old runtime is deleted.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D112225
2021-10-22 23:28:44 +01:00