Commit Graph

1912 Commits

Author SHA1 Message Date
Vignesh Balasubramanian 939154125b [OpenMP] [OMPD] OPENMP_INSTALL_LIBDIR is set for the install dir
OPENMP_INSTALL_LIBDIR is set to the installation path of shared and static
libompd.This should avoid the mixing of 32 and 64 bit on same path in
multi-lib set-up.

Reviewed By: @mceier
Differential Revision: https://reviews.llvm.org/D109352
2021-09-13 10:25:50 +05:30
Joseph Huber 7eb899cbcd [OpenMP] Add more verbose remarks for runtime folding
We peform runtime folding, but do not currently emit remarks when it is
performed. This is because it comes from the runtime library and is
beyond the users control. However, people may still wish to view  this
and similar information easily, so we can enable this behaviour using a
special flag to enable verbose remarks.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109627
2021-09-10 17:36:06 -04:00
Ye Luo 2187cbf56f [OpenMP][libomptarget] Add __tgt_target_return_t enum for __tgt_target_XXX return int
The defintion of OFFLOAD_SUCCESS and OFFLOAD_FAIL used in plugin APIs and libomptarget public APIs are not consistent.
Create __tgt_target_return_t for libomptarget public APIs.

Differential Revision: https://reviews.llvm.org/D109304
2021-09-10 16:11:08 -05:00
Jon Chesterfield f244af5c9f [openmp][amdgpu] Update SupportAndFAQ docs 2021-09-10 18:35:29 +01:00
Johannes Doerfert 9f844aeeb4 [OpenMP][Docs] Remove old/outdated webpage
This should have happened a long time ago, now that openmp.llvm.org
redirects to openmp.llvm.org/docs we completely switched over to the
sphinx documentation page instead.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D108588
2021-09-10 12:11:05 -05:00
Jon Chesterfield 6760234e8d [libomptarget][amdgpu] Precisely manage hsa lifetime
The hsa library must be initialized before any calls into it and
destructed after the last call into it. There have been a number of bugs in
this area related to member variables which would like to use raii to manage
resources acquired from hsa.

This patch moves the init/shutdown of hsa into a class, such that when used as
the first member variable (could be a base), the lifetime of other member
variables are reliably scoped within it. This will allow other classes to use
raii reliably when used as member variables within the global.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D109512
2021-09-09 17:28:11 +01:00
Jon Chesterfield 2a581710c1 [openmp] No longer use LIBRARY_PATH to find devicertl
Given D109057, change test runner to use the libomptarget-x-bc-path
argument instead of the LIBRARY_PATH environment variable to find the device
library.

Also drop the use of LIBRARY_PATH environment variable as it is far
too easy to pull in the device library from an unrelated toolchain by accident
with the current setup. No loss in flexibility to developers as the clang
commandline used here is still available.

Reviewed By: jdoerfert, tianshilei1992

Differential Revision: https://reviews.llvm.org/D109061
2021-09-09 17:16:41 +01:00
Jon Chesterfield d642156f8f [libomptarget][nfc] Hoist hsa_init into rtl.cpp 2021-09-09 16:09:34 +01:00
Hansang Bae 3976035d68 [OpenMP] Fix line truncation in omp_lib.h
Fixed code that exceeds 72-column.

Differential Revision: https://reviews.llvm.org/D109469
2021-09-09 09:33:45 -05:00
AndreyChurbanov d40108e0af [OpenMP] libomp: runtime part of omp_all_memory task dependence implementation.
New omp_all_memory task dependence type is implemented.
Library recognizes the new type via either
(dependence_address == NULL && dependence_flag == 0x80)
or
(dependence_address == SIZE_MAX).
A task with new dependence type depends on each preceding task
with any dependence type (kind of a dependence barrier).

Differential Revision: https://reviews.llvm.org/D108574
2021-09-08 16:55:32 +03:00
Ye Luo 2cfe1a09d1 [OpenMP][libomptarget][NFC] Change checkDeviceAndCtors return type to bool.
What is exactly needed is only a boolean. Pulling OFFLOAD_SUCCESS/FAIL only adds confusion.

Differential Revision: https://reviews.llvm.org/D109303
2021-09-07 13:59:27 -05:00
Hansang Bae 224f51d879 [OpenMP] Add interface for 5.1 scope construct
The new interface only marks begin/end of a scope construct for
corresponding OMPT events, and we can use existing interfaces for
reduction operations.

Differential Revision: https://reviews.llvm.org/D108062
2021-09-07 11:22:21 -05:00
Nawrin Sultana c24da72fa4 [OpenMP] Change monotonicity of dynamic schedule
This patch changes the default monotonicity of dynamic schedule from
monotonic to non-monotonic when no modifier is specified.

Differential Revision: https://reviews.llvm.org/D109026
2021-09-07 08:18:46 -05:00
Ye Luo c3aecf87d5 [OpenMP][libomptarget] Change device vector elements to unique_ptr type
Using std::vector<DeviceTy> requires implementing copy constructor and copied assign operator for DeviceTy.
Indeed DeviceTy should never be copied. After changing to std::vector<std::unique_ptr<DeviceTy>>,
All the unsafe copy constructor and copy assign operator implementations can be removed.
Compilers mark them deleted due to mutex or underlying objects and this is the desired behavior.

Differential Revision: https://reviews.llvm.org/D109276
2021-09-06 22:28:49 -05:00
Ye Luo 8e5c1b039e [OpenMP][libomptarget] Change synchronize_ty return type to int32_t
Plugins always return int32_t. Stay consistent with other functions which return error status.

Differential Revision: https://reviews.llvm.org/D109341
2021-09-06 21:38:54 -05:00
Ron Lieberman fdac5adee6 [openmp] NFC add bitcode comment 2021-09-02 18:21:39 -05:00
Jon Chesterfield 201e466eba [libomptarget][amdgpu] Add gfx90a to build list 2021-09-02 18:11:02 +01:00
Jon Chesterfield 3153bdd547 [libomptarget][amdgpu] Drop env variables
Use the same debug print as the rest of libomptarget plugins with
the same environment control. Also drop the max queue size debugging hook as
I don't believe it is still in use, can bring it back near the rest of the env
handling in rtl.cpp if someone objects.

That makes most of rt.h and all of utils.cpp unused. Clean that up and simplify
control flow in a couple of places.

Behaviour change is that debug prints that used to use the old environment
variable now use the new one and print in slightly different format, and the
removal of the max queue size variable.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D108784
2021-09-02 11:02:39 +01:00
Ye Luo 289a1089cd [libomptarget] Move HostDataToTargetTy states into StatesTy
Use unique_ptr to achieve the effect of mutable.

Remove mutable keyword of DynRefCount and HoldRefCount
Remove std::shared_ptr from UpdateMtx

Reviewed By: tianshilei1992, grokos

Differential Revision: https://reviews.llvm.org/D109007
2021-09-01 23:36:05 -05:00
Fangrui Song 4d5220faf9 [OpenMP] Fix -Wunused-but-set-parameter in -DLLVM_ENABLE_ASSERTIONS=off builds. NFC 2021-09-01 17:55:13 -07:00
Joel E. Denny 1f9e437065 [OpenMP][AMDGPU] Remove unneeded XFAILs 2021-09-01 18:00:25 -04:00
Joel E. Denny 786a140650 [OpenMP] Use IsHostPtr where needed in rest of omptarget.cpp
As started in D107925, this patch replaces the remaining occurrences
of `UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin` in
`omptarget.cpp` with `IsHostPtr`.  The former condition is broken in
the rare case that the device and host happen to use the same address
for their mapped allocations.  I don't know how to write a test that's
likely to reveal this case.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D107928
2021-09-01 17:31:42 -04:00
Joel E. Denny d11bab0b73 [OpenMP] Use IsHostPtr where needed for targetDataBegin
As discussed in D105990, without this patch, `targetDataBegin`
determines whether to transfer data (as opposed to assuming it's in
shared memory) using the condition `!UseUSM || HasCloseModifier`.
However, this condition is broken if use of discrete memory was forced
by `omp_target_associate_ptr`.  This patch extends
`unified_shared_memory/associate_ptr.c` to reveal this case, and it
fixes it using `!IsHostPtr` in `DeviceTy::getTargetPointer` to replace
this condition.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D107927
2021-09-01 17:31:42 -04:00
Joel E. Denny fa6c275505 [OpenMP][NFC] Eliminate CopyMember from targetDataEnd
This patch is based on comments in D105990.  It is NFC according to
the following observations:

1. `CopyMember` is computed as `!IsHostPtr && IsLast`.
2. `DelEntry` is true only if `IsLast` is true.

We apply those observations in order:

```
if ((DelEntry || Always || CopyMember) && !IsHostPtr)

if ((DelEntry || Always || IsLast) && !IsHostPtr)

if ((Always || IsLast) && !IsHostPtr)
```

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D107926
2021-09-01 17:31:42 -04:00
Joel E. Denny 8e4836b2a2 [OpenMP] Use IsHostPtr where needed for targetDataEnd
As discussed in D105990, without this patch, `targetDataEnd`
determines whether to transfer data or delete a device mapping (as
opposed to assuming it's in shared memory) using two different
conditions, each of which is broken for some cases:

1. `!(UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin)`: The
   broken case is rare: the device and host might happen to use the
   same address for their mapped allocations.  I don't know how to
   write a test that's likely to reveal this case, but this patch does
   fix it, as discussed below.
2. `!UNIFIED_SHARED_MEMORY || HasCloseModifier`: There are at least
   two broken cases:
    1. The `close` modifier might have been specified on an `omp
      target enter data` but not the corresponding `omp target exit
      data`, which thus might falsely assume a mapping is in shared
      memory.  The test `unified_shared_memory/close_enter_exit.c`
      already has a missing deletion as a result, and this patch adds
      a check for that.  This patch also adds the new test
      `close_member.c` to reveal a missing transfer and deletion.
    2. Use of discrete memory might have been forced by
      `omp_target_associate_ptr`, as in the test
      `unified_shared_memory/api.c`.  In the current `targetDataEnd`
      implementation, this condition turns out not be used for this
      case: because the reference count is infinite, a transfer is
      possible only with an `always` modifier, and this condition is
      never used in that case.  To ensure it's never used for that
      case in the future, this patch adds the test
      `unified_shared_memory/associate_ptr.c`.

Fortunately, `DeviceTy::getTgtPtrBegin` already has a solution: it
reports whether the allocation was found in shared memory via the
variable `IsHostPtr`.

After this patch, `HasCloseModifier` is no longer used in
`targetDataEnd`, and I wonder if the `close` modifier is ever useful
on an `omp target data end`.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D107925
2021-09-01 17:31:42 -04:00
Jon Chesterfield cef1199686 Revert "[openmp] No longer use LIBRARY_PATH to find devicertl"
This reverts commit 7a228f872f.
Failing test case under CI
2021-09-01 20:44:12 +01:00
Jon Chesterfield 7a228f872f [openmp] No longer use LIBRARY_PATH to find devicertl
Given D109057, change test runner to use the libomptarget-x-bc-path
argument instead of the LIBRARY_PATH environment variable to find the device
library.

Also drop the use of LIBRARY_PATH environment variable as it is far
too easy to pull in the device library from an unrelated toolchain by accident
with the current setup. No loss in flexibility to developers as the clang
commandline used here is still available.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109061
2021-09-01 20:24:34 +01:00
Jon Chesterfield 718e5a9883 [libomptarget] Set runpath on libomptarget, use that to drop LD_LIBRARY_PATH from test runner
Using rpath instead of LD_LIBRARY_PATH to find libomp.so and
libomptarget.so lets one rerun the already built test executables without
setting environment variables and removes the risk of the test runner picking
up different libraries to the developer debugging the failure.

rpath usually means runpath, which is not transitive, so set runpath on
libomptarget itself so that it can find the plugins located next to it,
spelled $ORIGIN. This provides sufficient functionality to drop D102043

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D109071
2021-09-01 18:47:56 +01:00
Jon Chesterfield f8bcbb82a7 [libomptarget] Normalise a cmake debug string, checking it triggers CI 2021-09-01 14:24:28 +01:00
Vignesh Balasubramanian b9a27908f9 [OpenMP][OMPD] Implementation of OMPD debugging library - libompd.
This is a continuation of the review: https://reviews.llvm.org/D100181
Creates a new directory "libompd" under openmp.

"TargetValue" provides operational access to the OpenMP runtime memory
 for OMPD APIs.
With TargetValue, using "pointer" a user can do multiple operations
from casting, dereferencing to accessing an element for structure.
The member functions are designed to concatenate the operations that
are needed to access values from structures.

e.g., _a[6]->_b._c would read like :
TValue(ctx, "_a").cast("A",2)
	.getArrayElement(6).access("_b").cast("B").access("_c")
For example:
If you have a pointer "ThreadHandle" of a running program then you can
access/retrieve "threadID" from the memory using TargetValue as below.

  TValue(context, thread_handle->th) /*__kmp_threads[t]->th*/
    .cast("kmp_base_info_t")
    .access("th_info") /*__kmp_threads[t]->th.th_info*/
    .cast("kmp_desc_t")
    .access("ds") /*__kmp_threads[t]->th.th_info.ds*/
    .cast("kmp_desc_base_t")
    .access("ds_thread") /*__kmp_threads[t]->th.th_info.ds.ds_thread*/
                .cast("kmp_thread_t")
.getRawValue(thread_id, 1);

Reviewed By: @hbae
Differential Revision: https://reviews.llvm.org/D100182
2021-09-01 14:50:16 +05:30
Joel E. Denny 1688b4cf8e [OpenMP][AMDGPU] XFAIL test where kernels call printf 2021-08-31 22:11:28 -04:00
Joel E. Denny ec1ebcd302 [OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in runtime (2/2)
This patch implements OpenMP runtime support for an original OpenMP
extension we have developed to support OpenACC: the `ompx_hold` map
type modifier.  The previous patch in this series, D106509, implements
Clang support and documents the new functionality in detail.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D106510
2021-08-31 16:13:49 -04:00
Joel E. Denny 83ddfa0d22 [OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in Clang (1/2)
This patch implements Clang support for an original OpenMP extension
we have developed to support OpenACC: the `ompx_hold` map type
modifier.  The next patch in this series, D106510, implements OpenMP
runtime support.

Consider the following example:

```
 #pragma omp target data map(ompx_hold, tofrom: x) // holds onto mapping of x
 {
   foo(); // might have map(delete: x)
   #pragma omp target map(present, alloc: x) // x is guaranteed to be present
   printf("%d\n", x);
 }
```

The `ompx_hold` map type modifier above specifies that the `target
data` directive holds onto the mapping for `x` throughout the
associated region regardless of any `target exit data` directives
executed during the call to `foo`.  Thus, the presence assertion for
`x` at the enclosed `target` construct cannot fail.  (As usual, the
standard OpenMP reference count for `x` must also reach zero before
the data is unmapped.)

Justification for inclusion in Clang and LLVM's OpenMP runtime:

* The `ompx_hold` modifier supports OpenACC functionality (structured
  reference count) that cannot be achieved in standard OpenMP, as of
  5.1.
* The runtime implementation for `ompx_hold` (next patch) will thus be
  used by Flang's OpenACC support.
* The Clang implementation for `ompx_hold` (this patch) as well as the
  runtime implementation are required for the Clang OpenACC support
  being developed as part of the ECP Clacc project, which translates
  OpenACC to OpenMP at the directive AST level.  These patches are the
  first step in upstreaming OpenACC functionality from Clacc.
* The Clang implementation for `ompx_hold` is also used by the tests
  in the runtime implementation.  That syntactic support makes the
  tests more readable than low-level runtime calls can.  Moreover,
  upstream Flang and Clang do not yet support OpenACC syntax
  sufficiently for writing the tests.
* More generally, the Clang implementation enables a clean separation
  of concerns between OpenACC and OpenMP development in LLVM.  That
  is, LLVM's OpenMP developers can discuss, modify, and debug LLVM's
  extended OpenMP implementation and test suite without directly
  considering OpenACC's language and execution model, which can be
  handled by LLVM's OpenACC developers.
* OpenMP users might find the `ompx_hold` modifier useful, as in the
  above example.

See new documentation introduced by this patch in `openmp/docs` for
more detail on the functionality of this extension and its
relationship with OpenACC.  For example, it explains how the runtime
must support two reference counts, as specified by OpenACC.

Clang recognizes `ompx_hold` unless `-fno-openmp-extensions`, a new
command-line option introduced by this patch, is specified.

Reviewed By: ABataev, jdoerfert, protze.joachim, grokos

Differential Revision: https://reviews.llvm.org/D106509
2021-08-31 16:13:49 -04:00
Shilei Tian 8442967fe3 [OpenMP] Fix task wait doesn't work as expected in serialized team
As discussed in D107121, task wait doesn't work when a regular task T depends on
a detached task or a hidden helper task T' in a serialized team. The root cause is,
since the team is serialized, the last task will not be tracked by
`td_incomplete_child_tasks`. When T' is finished, it first releases its
dependences, and then decrements its parent counter. So far so good. For the thread
that is running task wait, if at the moment it is still spinning and trying to
execute tasks, it is fine because it can detect the new task and execute it.
However, if it happends to finish the function `flag.execute_tasks(...)`, it will
be broken because `td_incomplete_child_tasks` is 0 now.

In this patch, we update the rule to track children tasks a little bit. If the
task team encounters a proxy task or a hidden helper task, all following tasks
will be tracked.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D107496
2021-08-31 12:15:46 -04:00
Joachim Protze 5ea1c37118 [libomptarget][amdcgn] Only add opt/llvm-link dependency if TARGET is available
In some build configurations, the target we depend on is not available for declaring the build dependency.
We only need to declare the build dependency, if the build target is available in the same build.

Fixes the issue raised in https://reviews.llvm.org/D107156#2969862
This patch should go into release/13 together with D108404

Differential Revision: https://reviews.llvm.org/D108868
2021-08-30 17:32:11 +02:00
Shilei Tian e8fdacfd81 [OpenMP][NVPTX] Fixed missing variables for CUDA free compilation in NVPTX plugin
`CU_EVENT_DEFAULT` is defined in CUDA header. It should be added to
`openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h` for CUDA free build.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108878
2021-08-28 18:08:10 -04:00
Shilei Tian 29df4ab3f3 [OpenMP][Offloading] Add support for event related interfaces
This patch adds the support form event related interfaces, which will be used
later to fix data race. See D104418 for more details.

Reviewed By: jdoerfert, ye-luo

Differential Revision: https://reviews.llvm.org/D108528
2021-08-28 16:24:14 -04:00
George Rokos a2bd44089e [libomptarget][NFC] Fixed tests which checked for obsolete string "getOrAllocTgtPtr" 2021-08-28 07:35:42 -07:00
Jon Chesterfield 78f92c3810 [openmp][amdgpu] Initial gfx10 offloading implementation
Lets wavefront size be 32 for amdgpu openmp, as well as 64.

Fixes up as little as possible to pass that through the libraries. This change
is end to end, as opposed to updating clang/devicertl/plugin separately. It can
be broken up for review/commit if preferred. Posting as-is so that others with
a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are
probably bugs remaining as well as the todo: for letting grid values vary more.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108708
2021-08-27 12:34:03 +01:00
George Rokos 3819aae6dd [libomptarget][NFC] Replaced obsolete name "getOrAllocTgtPtr" with new "getTargetPointer" in debug messages. 2021-08-26 18:01:18 -07:00
Jon Chesterfield 3d85342982 [libomptarget][amdgpu][nfc] Rename variables, delete dead code 2021-08-26 19:58:38 +01:00
Jon Chesterfield 68ab93f4d7 [libomptarget][amdgpu][nfc] Rename source files 2021-08-26 18:29:44 +01:00
Jon Chesterfield a5f4074d85 [libomptarget][amdgpu] Macro for accessing GPU variables from plugin
Lets the amdgpu plugin write to omptarget_device_environment
to enable debugging. Intend to use in the near future to record the
wavesize that a given deviceRTL was compiled with for running on hardware
that supports 32 or 64.

Patch sets all the attributes that are useful. Notably .data means the variable
is set by writing to host memory before copying to the GPU instead of launching
a kernel to update the image. Can simplify the plugin slightly to drop the
code for patching after load if this is used consistently.

NFC on nvptx, cuda plugin seems to work fine without any annotations.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108698
2021-08-26 17:28:18 +01:00
Jon Chesterfield ba0af885e7 [libomptarget][amdgpu][nfc] Make grid value access match devicertl 2021-08-25 15:11:19 +01:00
Jon Chesterfield 9b2c6c07b5 [libomptarget][amdgpu] Refactor debug printing
Move most debug printing in rtl.cpp behind DP() macro
Adjust the print output for gpu arch mismatch when the architectures match
Convert an assert into graceful failure

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108562
2021-08-25 14:57:51 +01:00
Jon Chesterfield ba8547775b [libomptarget][amdgpu] Fix debug build from D104696 2021-08-25 01:27:51 +01:00
Michael Kruse 1275ee3041 [OpenMP][amdgcn] Don't use in-tree clang if not available.
The use of `$<TARGET_FILE:clang>` was adapted too broadly from D101265.

Fixes llvm.org/PR51579

Also see discussion in D108534.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D108640
2021-08-24 12:50:49 -05:00
Pushpinder Singh 9b8b7c1180 [AMDGPU][Libomptarget] Delete g_atl_machine global
With uses of g_atl_machine gone, a significant portion of dead
code has been removed.

This patch depends on D104691 and D104695.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104696
2021-08-24 07:59:40 +00:00
Jon Chesterfield d26000e4cc [openmp][devicertl] Freestanding nvptx via stub printf
Compiled nvptx devicertl as freestanding, breaking the
dependency on host glibc and gcc-multilibs. Thus build it by default.

Comes at the cost of #defining out printf. Tried mapping it onto
__builtin_printf but that gets transformed back to printf instead
of hitting the cuda/openmp lowering transform.

Printf could be preserved by one of:
- dropping all the standard headers and ffreestanding
- providing a header only printf implementation
- changing the compiler handling of printf

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D108349
2021-08-23 23:07:47 +01:00
Jon Chesterfield 842f875c8b [openmp] Use llvm GridValues from devicertl
Add include path to the cmakefiles and set the target_impl enums
from the llvm constants instead of copying the values.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108391
2021-08-23 20:25:24 +01:00