Commit Graph

2123 Commits

Author SHA1 Message Date
Giorgis Georgakoudis 1d66649adf [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3)  forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert, jhuber6

Differential Revision: https://reviews.llvm.org/D102107
2021-09-21 10:50:04 -07:00
Shilei Tian 49e976c934 [OpenMP][NVPTX] Fix a warning that data argument not used by format string
Reviewed By: jhuber6, grokos

Differential Revision: https://reviews.llvm.org/D110104
2021-09-20 17:22:14 -04:00
Peyton, Jonathan L 1e45cd75df [OpenMP][host runtime] Fix indirect lock table race condition
The indirect lock table can exhibit a race condition during initializing
and setting/unsetting locks. This occurs if the lock table is
resized by one thread (during an omp_init_lock) and accessed (during an
omp_set|unset_lock) by another thread.

The test runtime/test/lock/omp_init_lock.c test exposed this issue and
will fail if run enough times.

This patch restructures the lock table so pointer/iterator validity is
always kept. Instead of reallocating a single table to a larger size, the
lock table begins preallocated to accommodate 8K locks. Each row of the
table is allocated as needed with each row allowing 1K locks. If the 8K
limit is reached for the initial table, then another table, capable of
holding double the number of locks, is allocated and linked
as the next table. The indices stored in the user's locks take this
linked structure into account when finding the lock within the table.

Differential Revision: https://reviews.llvm.org/D109725
2021-09-20 13:01:58 -05:00
Joseph Huber f1c821fa85 [OpenMP] Add support for dynamic shared memory in new RTL
This patch adds support for using dynamic shared memory in the new
device runtime. The new function `__kmpc_get_dynamic_shared` will return a
pointer to the buffer of dynamic shared memory. Currently the amount of memory
allocated is set by an environment variable.

In the future this amount will be added to the amount used for the smart stack
which will be configured in a similar way.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110006
2021-09-17 21:25:36 -04:00
Joseph Huber ec02c34b6d [OpenMP] Add additional fields to device environment
This patch adds fields for the device number and number of devices into
the device environment struct and debugging values.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110004
2021-09-17 21:25:32 -04:00
Joseph Huber b266bcb135 [OpenMP] Implement __assert_fail in the new device runtime
This patch implements the `__assert_fail` function in the new device
runtime. This allows users and developers to use the standars assert
function inside of the device.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D109886
2021-09-17 21:25:28 -04:00
Shilei Tian 81a1a91c62 [NFC] clang-format -i /openmp/libomptarget/deviceRTLs/interface.h 2021-09-17 12:55:02 -04:00
AndreyChurbanov 59b877d001 [OpenMP] NFC: add type casts to silence gcc warnings 2021-09-17 19:49:40 +03:00
AndreyChurbanov 7f1a6d891e [OpenMP] libomp: Update third-party sources of ittnotify client code.
The third-party ittnotify sources updated from https://github.com/intel/ittapi.
Changes applied:
- llvm license aded to all files; initial BSD license saved in LICENSE.txt;
- clang-formatted;
- renamed *.c to *.cpp, similar to what we did with all our sources;
- added #include "kmp_config.h" with definition of INTEL_ITTNOTIFY_PREFIX macro
  into ittnotify_static.cpp.

Differential Revision: https://reviews.llvm.org/D109333
2021-09-17 19:38:34 +03:00
Hansang Bae ae2a5facce [OpenMP][libomptarget] Minor fix in x86_64 plugin
Call to remove() was passing invalid address for the file name.

Differential Revision: https://reviews.llvm.org/D109846
2021-09-15 15:57:06 -05:00
Peyton, Jonathan L 258e27aae1 [OpenMP] Add support for GOMP depobj
GOMP depobjs are represented as a two intptr_t array. The first
element is the base address of the dependency and the second element
is the flag indicating the type the depobj represents.

Differential Revision: https://reviews.llvm.org/D108790
2021-09-15 12:47:08 -05:00
Vignesh Balasubramanian 939154125b [OpenMP] [OMPD] OPENMP_INSTALL_LIBDIR is set for the install dir
OPENMP_INSTALL_LIBDIR is set to the installation path of shared and static
libompd.This should avoid the mixing of 32 and 64 bit on same path in
multi-lib set-up.

Reviewed By: @mceier
Differential Revision: https://reviews.llvm.org/D109352
2021-09-13 10:25:50 +05:30
Joseph Huber 7eb899cbcd [OpenMP] Add more verbose remarks for runtime folding
We peform runtime folding, but do not currently emit remarks when it is
performed. This is because it comes from the runtime library and is
beyond the users control. However, people may still wish to view  this
and similar information easily, so we can enable this behaviour using a
special flag to enable verbose remarks.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109627
2021-09-10 17:36:06 -04:00
Ye Luo 2187cbf56f [OpenMP][libomptarget] Add __tgt_target_return_t enum for __tgt_target_XXX return int
The defintion of OFFLOAD_SUCCESS and OFFLOAD_FAIL used in plugin APIs and libomptarget public APIs are not consistent.
Create __tgt_target_return_t for libomptarget public APIs.

Differential Revision: https://reviews.llvm.org/D109304
2021-09-10 16:11:08 -05:00
Jon Chesterfield f244af5c9f [openmp][amdgpu] Update SupportAndFAQ docs 2021-09-10 18:35:29 +01:00
Johannes Doerfert 9f844aeeb4 [OpenMP][Docs] Remove old/outdated webpage
This should have happened a long time ago, now that openmp.llvm.org
redirects to openmp.llvm.org/docs we completely switched over to the
sphinx documentation page instead.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D108588
2021-09-10 12:11:05 -05:00
Jon Chesterfield 6760234e8d [libomptarget][amdgpu] Precisely manage hsa lifetime
The hsa library must be initialized before any calls into it and
destructed after the last call into it. There have been a number of bugs in
this area related to member variables which would like to use raii to manage
resources acquired from hsa.

This patch moves the init/shutdown of hsa into a class, such that when used as
the first member variable (could be a base), the lifetime of other member
variables are reliably scoped within it. This will allow other classes to use
raii reliably when used as member variables within the global.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D109512
2021-09-09 17:28:11 +01:00
Jon Chesterfield 2a581710c1 [openmp] No longer use LIBRARY_PATH to find devicertl
Given D109057, change test runner to use the libomptarget-x-bc-path
argument instead of the LIBRARY_PATH environment variable to find the device
library.

Also drop the use of LIBRARY_PATH environment variable as it is far
too easy to pull in the device library from an unrelated toolchain by accident
with the current setup. No loss in flexibility to developers as the clang
commandline used here is still available.

Reviewed By: jdoerfert, tianshilei1992

Differential Revision: https://reviews.llvm.org/D109061
2021-09-09 17:16:41 +01:00
Jon Chesterfield d642156f8f [libomptarget][nfc] Hoist hsa_init into rtl.cpp 2021-09-09 16:09:34 +01:00
Hansang Bae 3976035d68 [OpenMP] Fix line truncation in omp_lib.h
Fixed code that exceeds 72-column.

Differential Revision: https://reviews.llvm.org/D109469
2021-09-09 09:33:45 -05:00
AndreyChurbanov d40108e0af [OpenMP] libomp: runtime part of omp_all_memory task dependence implementation.
New omp_all_memory task dependence type is implemented.
Library recognizes the new type via either
(dependence_address == NULL && dependence_flag == 0x80)
or
(dependence_address == SIZE_MAX).
A task with new dependence type depends on each preceding task
with any dependence type (kind of a dependence barrier).

Differential Revision: https://reviews.llvm.org/D108574
2021-09-08 16:55:32 +03:00
Ye Luo 2cfe1a09d1 [OpenMP][libomptarget][NFC] Change checkDeviceAndCtors return type to bool.
What is exactly needed is only a boolean. Pulling OFFLOAD_SUCCESS/FAIL only adds confusion.

Differential Revision: https://reviews.llvm.org/D109303
2021-09-07 13:59:27 -05:00
Hansang Bae 224f51d879 [OpenMP] Add interface for 5.1 scope construct
The new interface only marks begin/end of a scope construct for
corresponding OMPT events, and we can use existing interfaces for
reduction operations.

Differential Revision: https://reviews.llvm.org/D108062
2021-09-07 11:22:21 -05:00
Nawrin Sultana c24da72fa4 [OpenMP] Change monotonicity of dynamic schedule
This patch changes the default monotonicity of dynamic schedule from
monotonic to non-monotonic when no modifier is specified.

Differential Revision: https://reviews.llvm.org/D109026
2021-09-07 08:18:46 -05:00
Ye Luo c3aecf87d5 [OpenMP][libomptarget] Change device vector elements to unique_ptr type
Using std::vector<DeviceTy> requires implementing copy constructor and copied assign operator for DeviceTy.
Indeed DeviceTy should never be copied. After changing to std::vector<std::unique_ptr<DeviceTy>>,
All the unsafe copy constructor and copy assign operator implementations can be removed.
Compilers mark them deleted due to mutex or underlying objects and this is the desired behavior.

Differential Revision: https://reviews.llvm.org/D109276
2021-09-06 22:28:49 -05:00
Ye Luo 8e5c1b039e [OpenMP][libomptarget] Change synchronize_ty return type to int32_t
Plugins always return int32_t. Stay consistent with other functions which return error status.

Differential Revision: https://reviews.llvm.org/D109341
2021-09-06 21:38:54 -05:00
Ron Lieberman fdac5adee6 [openmp] NFC add bitcode comment 2021-09-02 18:21:39 -05:00
Jon Chesterfield 201e466eba [libomptarget][amdgpu] Add gfx90a to build list 2021-09-02 18:11:02 +01:00
Jon Chesterfield 3153bdd547 [libomptarget][amdgpu] Drop env variables
Use the same debug print as the rest of libomptarget plugins with
the same environment control. Also drop the max queue size debugging hook as
I don't believe it is still in use, can bring it back near the rest of the env
handling in rtl.cpp if someone objects.

That makes most of rt.h and all of utils.cpp unused. Clean that up and simplify
control flow in a couple of places.

Behaviour change is that debug prints that used to use the old environment
variable now use the new one and print in slightly different format, and the
removal of the max queue size variable.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D108784
2021-09-02 11:02:39 +01:00
Ye Luo 289a1089cd [libomptarget] Move HostDataToTargetTy states into StatesTy
Use unique_ptr to achieve the effect of mutable.

Remove mutable keyword of DynRefCount and HoldRefCount
Remove std::shared_ptr from UpdateMtx

Reviewed By: tianshilei1992, grokos

Differential Revision: https://reviews.llvm.org/D109007
2021-09-01 23:36:05 -05:00
Fangrui Song 4d5220faf9 [OpenMP] Fix -Wunused-but-set-parameter in -DLLVM_ENABLE_ASSERTIONS=off builds. NFC 2021-09-01 17:55:13 -07:00
Joel E. Denny 1f9e437065 [OpenMP][AMDGPU] Remove unneeded XFAILs 2021-09-01 18:00:25 -04:00
Joel E. Denny 786a140650 [OpenMP] Use IsHostPtr where needed in rest of omptarget.cpp
As started in D107925, this patch replaces the remaining occurrences
of `UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin` in
`omptarget.cpp` with `IsHostPtr`.  The former condition is broken in
the rare case that the device and host happen to use the same address
for their mapped allocations.  I don't know how to write a test that's
likely to reveal this case.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D107928
2021-09-01 17:31:42 -04:00
Joel E. Denny d11bab0b73 [OpenMP] Use IsHostPtr where needed for targetDataBegin
As discussed in D105990, without this patch, `targetDataBegin`
determines whether to transfer data (as opposed to assuming it's in
shared memory) using the condition `!UseUSM || HasCloseModifier`.
However, this condition is broken if use of discrete memory was forced
by `omp_target_associate_ptr`.  This patch extends
`unified_shared_memory/associate_ptr.c` to reveal this case, and it
fixes it using `!IsHostPtr` in `DeviceTy::getTargetPointer` to replace
this condition.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D107927
2021-09-01 17:31:42 -04:00
Joel E. Denny fa6c275505 [OpenMP][NFC] Eliminate CopyMember from targetDataEnd
This patch is based on comments in D105990.  It is NFC according to
the following observations:

1. `CopyMember` is computed as `!IsHostPtr && IsLast`.
2. `DelEntry` is true only if `IsLast` is true.

We apply those observations in order:

```
if ((DelEntry || Always || CopyMember) && !IsHostPtr)

if ((DelEntry || Always || IsLast) && !IsHostPtr)

if ((Always || IsLast) && !IsHostPtr)
```

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D107926
2021-09-01 17:31:42 -04:00
Joel E. Denny 8e4836b2a2 [OpenMP] Use IsHostPtr where needed for targetDataEnd
As discussed in D105990, without this patch, `targetDataEnd`
determines whether to transfer data or delete a device mapping (as
opposed to assuming it's in shared memory) using two different
conditions, each of which is broken for some cases:

1. `!(UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin)`: The
   broken case is rare: the device and host might happen to use the
   same address for their mapped allocations.  I don't know how to
   write a test that's likely to reveal this case, but this patch does
   fix it, as discussed below.
2. `!UNIFIED_SHARED_MEMORY || HasCloseModifier`: There are at least
   two broken cases:
    1. The `close` modifier might have been specified on an `omp
      target enter data` but not the corresponding `omp target exit
      data`, which thus might falsely assume a mapping is in shared
      memory.  The test `unified_shared_memory/close_enter_exit.c`
      already has a missing deletion as a result, and this patch adds
      a check for that.  This patch also adds the new test
      `close_member.c` to reveal a missing transfer and deletion.
    2. Use of discrete memory might have been forced by
      `omp_target_associate_ptr`, as in the test
      `unified_shared_memory/api.c`.  In the current `targetDataEnd`
      implementation, this condition turns out not be used for this
      case: because the reference count is infinite, a transfer is
      possible only with an `always` modifier, and this condition is
      never used in that case.  To ensure it's never used for that
      case in the future, this patch adds the test
      `unified_shared_memory/associate_ptr.c`.

Fortunately, `DeviceTy::getTgtPtrBegin` already has a solution: it
reports whether the allocation was found in shared memory via the
variable `IsHostPtr`.

After this patch, `HasCloseModifier` is no longer used in
`targetDataEnd`, and I wonder if the `close` modifier is ever useful
on an `omp target data end`.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D107925
2021-09-01 17:31:42 -04:00
Jon Chesterfield cef1199686 Revert "[openmp] No longer use LIBRARY_PATH to find devicertl"
This reverts commit 7a228f872f.
Failing test case under CI
2021-09-01 20:44:12 +01:00
Jon Chesterfield 7a228f872f [openmp] No longer use LIBRARY_PATH to find devicertl
Given D109057, change test runner to use the libomptarget-x-bc-path
argument instead of the LIBRARY_PATH environment variable to find the device
library.

Also drop the use of LIBRARY_PATH environment variable as it is far
too easy to pull in the device library from an unrelated toolchain by accident
with the current setup. No loss in flexibility to developers as the clang
commandline used here is still available.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109061
2021-09-01 20:24:34 +01:00
Jon Chesterfield 718e5a9883 [libomptarget] Set runpath on libomptarget, use that to drop LD_LIBRARY_PATH from test runner
Using rpath instead of LD_LIBRARY_PATH to find libomp.so and
libomptarget.so lets one rerun the already built test executables without
setting environment variables and removes the risk of the test runner picking
up different libraries to the developer debugging the failure.

rpath usually means runpath, which is not transitive, so set runpath on
libomptarget itself so that it can find the plugins located next to it,
spelled $ORIGIN. This provides sufficient functionality to drop D102043

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D109071
2021-09-01 18:47:56 +01:00
Jon Chesterfield f8bcbb82a7 [libomptarget] Normalise a cmake debug string, checking it triggers CI 2021-09-01 14:24:28 +01:00
Vignesh Balasubramanian b9a27908f9 [OpenMP][OMPD] Implementation of OMPD debugging library - libompd.
This is a continuation of the review: https://reviews.llvm.org/D100181
Creates a new directory "libompd" under openmp.

"TargetValue" provides operational access to the OpenMP runtime memory
 for OMPD APIs.
With TargetValue, using "pointer" a user can do multiple operations
from casting, dereferencing to accessing an element for structure.
The member functions are designed to concatenate the operations that
are needed to access values from structures.

e.g., _a[6]->_b._c would read like :
TValue(ctx, "_a").cast("A",2)
	.getArrayElement(6).access("_b").cast("B").access("_c")
For example:
If you have a pointer "ThreadHandle" of a running program then you can
access/retrieve "threadID" from the memory using TargetValue as below.

  TValue(context, thread_handle->th) /*__kmp_threads[t]->th*/
    .cast("kmp_base_info_t")
    .access("th_info") /*__kmp_threads[t]->th.th_info*/
    .cast("kmp_desc_t")
    .access("ds") /*__kmp_threads[t]->th.th_info.ds*/
    .cast("kmp_desc_base_t")
    .access("ds_thread") /*__kmp_threads[t]->th.th_info.ds.ds_thread*/
                .cast("kmp_thread_t")
.getRawValue(thread_id, 1);

Reviewed By: @hbae
Differential Revision: https://reviews.llvm.org/D100182
2021-09-01 14:50:16 +05:30
Joel E. Denny 1688b4cf8e [OpenMP][AMDGPU] XFAIL test where kernels call printf 2021-08-31 22:11:28 -04:00
Joel E. Denny ec1ebcd302 [OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in runtime (2/2)
This patch implements OpenMP runtime support for an original OpenMP
extension we have developed to support OpenACC: the `ompx_hold` map
type modifier.  The previous patch in this series, D106509, implements
Clang support and documents the new functionality in detail.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D106510
2021-08-31 16:13:49 -04:00
Joel E. Denny 83ddfa0d22 [OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in Clang (1/2)
This patch implements Clang support for an original OpenMP extension
we have developed to support OpenACC: the `ompx_hold` map type
modifier.  The next patch in this series, D106510, implements OpenMP
runtime support.

Consider the following example:

```
 #pragma omp target data map(ompx_hold, tofrom: x) // holds onto mapping of x
 {
   foo(); // might have map(delete: x)
   #pragma omp target map(present, alloc: x) // x is guaranteed to be present
   printf("%d\n", x);
 }
```

The `ompx_hold` map type modifier above specifies that the `target
data` directive holds onto the mapping for `x` throughout the
associated region regardless of any `target exit data` directives
executed during the call to `foo`.  Thus, the presence assertion for
`x` at the enclosed `target` construct cannot fail.  (As usual, the
standard OpenMP reference count for `x` must also reach zero before
the data is unmapped.)

Justification for inclusion in Clang and LLVM's OpenMP runtime:

* The `ompx_hold` modifier supports OpenACC functionality (structured
  reference count) that cannot be achieved in standard OpenMP, as of
  5.1.
* The runtime implementation for `ompx_hold` (next patch) will thus be
  used by Flang's OpenACC support.
* The Clang implementation for `ompx_hold` (this patch) as well as the
  runtime implementation are required for the Clang OpenACC support
  being developed as part of the ECP Clacc project, which translates
  OpenACC to OpenMP at the directive AST level.  These patches are the
  first step in upstreaming OpenACC functionality from Clacc.
* The Clang implementation for `ompx_hold` is also used by the tests
  in the runtime implementation.  That syntactic support makes the
  tests more readable than low-level runtime calls can.  Moreover,
  upstream Flang and Clang do not yet support OpenACC syntax
  sufficiently for writing the tests.
* More generally, the Clang implementation enables a clean separation
  of concerns between OpenACC and OpenMP development in LLVM.  That
  is, LLVM's OpenMP developers can discuss, modify, and debug LLVM's
  extended OpenMP implementation and test suite without directly
  considering OpenACC's language and execution model, which can be
  handled by LLVM's OpenACC developers.
* OpenMP users might find the `ompx_hold` modifier useful, as in the
  above example.

See new documentation introduced by this patch in `openmp/docs` for
more detail on the functionality of this extension and its
relationship with OpenACC.  For example, it explains how the runtime
must support two reference counts, as specified by OpenACC.

Clang recognizes `ompx_hold` unless `-fno-openmp-extensions`, a new
command-line option introduced by this patch, is specified.

Reviewed By: ABataev, jdoerfert, protze.joachim, grokos

Differential Revision: https://reviews.llvm.org/D106509
2021-08-31 16:13:49 -04:00
Shilei Tian 8442967fe3 [OpenMP] Fix task wait doesn't work as expected in serialized team
As discussed in D107121, task wait doesn't work when a regular task T depends on
a detached task or a hidden helper task T' in a serialized team. The root cause is,
since the team is serialized, the last task will not be tracked by
`td_incomplete_child_tasks`. When T' is finished, it first releases its
dependences, and then decrements its parent counter. So far so good. For the thread
that is running task wait, if at the moment it is still spinning and trying to
execute tasks, it is fine because it can detect the new task and execute it.
However, if it happends to finish the function `flag.execute_tasks(...)`, it will
be broken because `td_incomplete_child_tasks` is 0 now.

In this patch, we update the rule to track children tasks a little bit. If the
task team encounters a proxy task or a hidden helper task, all following tasks
will be tracked.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D107496
2021-08-31 12:15:46 -04:00
Joachim Protze 5ea1c37118 [libomptarget][amdcgn] Only add opt/llvm-link dependency if TARGET is available
In some build configurations, the target we depend on is not available for declaring the build dependency.
We only need to declare the build dependency, if the build target is available in the same build.

Fixes the issue raised in https://reviews.llvm.org/D107156#2969862
This patch should go into release/13 together with D108404

Differential Revision: https://reviews.llvm.org/D108868
2021-08-30 17:32:11 +02:00
Shilei Tian e8fdacfd81 [OpenMP][NVPTX] Fixed missing variables for CUDA free compilation in NVPTX plugin
`CU_EVENT_DEFAULT` is defined in CUDA header. It should be added to
`openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h` for CUDA free build.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108878
2021-08-28 18:08:10 -04:00
Shilei Tian 29df4ab3f3 [OpenMP][Offloading] Add support for event related interfaces
This patch adds the support form event related interfaces, which will be used
later to fix data race. See D104418 for more details.

Reviewed By: jdoerfert, ye-luo

Differential Revision: https://reviews.llvm.org/D108528
2021-08-28 16:24:14 -04:00
George Rokos a2bd44089e [libomptarget][NFC] Fixed tests which checked for obsolete string "getOrAllocTgtPtr" 2021-08-28 07:35:42 -07:00
Jon Chesterfield 78f92c3810 [openmp][amdgpu] Initial gfx10 offloading implementation
Lets wavefront size be 32 for amdgpu openmp, as well as 64.

Fixes up as little as possible to pass that through the libraries. This change
is end to end, as opposed to updating clang/devicertl/plugin separately. It can
be broken up for review/commit if preferred. Posting as-is so that others with
a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are
probably bugs remaining as well as the todo: for letting grid values vary more.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108708
2021-08-27 12:34:03 +01:00
George Rokos 3819aae6dd [libomptarget][NFC] Replaced obsolete name "getOrAllocTgtPtr" with new "getTargetPointer" in debug messages. 2021-08-26 18:01:18 -07:00
Jon Chesterfield 3d85342982 [libomptarget][amdgpu][nfc] Rename variables, delete dead code 2021-08-26 19:58:38 +01:00
Jon Chesterfield 68ab93f4d7 [libomptarget][amdgpu][nfc] Rename source files 2021-08-26 18:29:44 +01:00
Jon Chesterfield a5f4074d85 [libomptarget][amdgpu] Macro for accessing GPU variables from plugin
Lets the amdgpu plugin write to omptarget_device_environment
to enable debugging. Intend to use in the near future to record the
wavesize that a given deviceRTL was compiled with for running on hardware
that supports 32 or 64.

Patch sets all the attributes that are useful. Notably .data means the variable
is set by writing to host memory before copying to the GPU instead of launching
a kernel to update the image. Can simplify the plugin slightly to drop the
code for patching after load if this is used consistently.

NFC on nvptx, cuda plugin seems to work fine without any annotations.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108698
2021-08-26 17:28:18 +01:00
Jon Chesterfield ba0af885e7 [libomptarget][amdgpu][nfc] Make grid value access match devicertl 2021-08-25 15:11:19 +01:00
Jon Chesterfield 9b2c6c07b5 [libomptarget][amdgpu] Refactor debug printing
Move most debug printing in rtl.cpp behind DP() macro
Adjust the print output for gpu arch mismatch when the architectures match
Convert an assert into graceful failure

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108562
2021-08-25 14:57:51 +01:00
Jon Chesterfield ba8547775b [libomptarget][amdgpu] Fix debug build from D104696 2021-08-25 01:27:51 +01:00
Michael Kruse 1275ee3041 [OpenMP][amdgcn] Don't use in-tree clang if not available.
The use of `$<TARGET_FILE:clang>` was adapted too broadly from D101265.

Fixes llvm.org/PR51579

Also see discussion in D108534.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D108640
2021-08-24 12:50:49 -05:00
Pushpinder Singh 9b8b7c1180 [AMDGPU][Libomptarget] Delete g_atl_machine global
With uses of g_atl_machine gone, a significant portion of dead
code has been removed.

This patch depends on D104691 and D104695.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104696
2021-08-24 07:59:40 +00:00
Jon Chesterfield d26000e4cc [openmp][devicertl] Freestanding nvptx via stub printf
Compiled nvptx devicertl as freestanding, breaking the
dependency on host glibc and gcc-multilibs. Thus build it by default.

Comes at the cost of #defining out printf. Tried mapping it onto
__builtin_printf but that gets transformed back to printf instead
of hitting the cuda/openmp lowering transform.

Printf could be preserved by one of:
- dropping all the standard headers and ffreestanding
- providing a header only printf implementation
- changing the compiler handling of printf

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D108349
2021-08-23 23:07:47 +01:00
Jon Chesterfield 842f875c8b [openmp] Use llvm GridValues from devicertl
Add include path to the cmakefiles and set the target_impl enums
from the llvm constants instead of copying the values.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108391
2021-08-23 20:25:24 +01:00
Peyton, Jonathan L d39d3a327b [OpenMP][test] fix omp_get_wtime.c test to be more accommodating
The omp_get_wtime.c test fails intermittently if the recorded times are
off by too much which can happen when many tests are run in parallel.

Instead of failing if one timing is a little off, take average of 100
timings minus the 10 worst.

Differential Revision: https://reviews.llvm.org/D108488
2021-08-23 08:13:42 -05:00
Vignesh Balasubramanian 589519b9ab [OpenMP][OMPD]Code movement required for OMPD
These changes don't come under OMPD guard as it is a movement of existing code to capture parallel behavior correctly.
"Runtime Entry Points for OMPD" like "ompd_bp_parallel_begin" and "ompd_bp_parallel_begin" should be placed at the correct execution point for the debugging tool to access proper handles/data.
Without the below changes, in certain cases, debugging tool will pick the wrong parallel and task handle.

Reviewed By: @hbae
Differential Revision: https://reviews.llvm.org/D100366
2021-08-20 14:36:22 +05:30
Shilei Tian 1d8d43ae61 [OpenMP] Use `__kmpc_give_task` in `__kmp_push_task` when encountering a hidden helper task
This patch replaces the current implementation, overwrites `gtid` and `thread`,
with `__kmpc_give_task`.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D106977
2021-08-19 20:49:29 -04:00
Joachim Protze 4bb36df144 [libomptarget][amdcgn] Add build dependency for llvm-link and opt
D107156 and D107320 are not sufficient when OpenMP is built as llvm runtime
(LLVM_ENABLE_RUNTIMES=openmp) because dependencies only work within the same
cmake instance.

We could limit the dependency to cases where libomptarget/plugins are really
built. But compared to the whole llvm project, building openmp runtime is
negligible and postponing the build of OpenMP runtime after the dependencies
are ready seems reasonable.

The direct dependency introduced in D107156 and D107320 is necessary for the
case where OpenMP is built as llvm project (LLVM_ENABLE_PROJECTS=openmp).

Differential Revision: https://reviews.llvm.org/D108404
2021-08-20 01:57:58 +02:00
Jennifer Yu c274b19866 Add implicit map for a list item appears in a reduction clause.
A new rule is added in 5.0:
If a list item appears in a reduction, lastprivate or linear clause
on a combined target construct then it is treated as if it also appears
in a map clause with a map-type of tofrom.

Currently map clauses for all capture variables are added implicitly.
But missing for list item of expression for array elements or array
sections.

The change is to add implicit map clause for array of elements used in
reduction clause. Skip adding map clause if the expression is not
mappable.
Noted: For linear and lastprivate, since only variable name is
accepted, the map has been added though capture variables.

To do so:
During the mappable checking, if error, ignore diagnose and skip
adding implicit map clause.

The changes:
1> Add code to generate implicit map in ActOnOpenMPExecutableDirective,
   for omp 5.0 and up.
2> Add extra default parameter NoDiagnose in ActOnOpenMPMapClause:
Use that to skip error as well as skip adding implicit map during the
mappable checking.

Note: there are only tow places need to be check for NoDiagnose. Rest
of them either the check is for < omp 5.0 or the error already generated for
reduction clause.

Differential Revision: https://reviews.llvm.org/D108132
2021-08-19 12:53:47 -07:00
Jon Chesterfield ad0f6e1d98 [openmp] Disable the tests that block CI for amdgpu and host offloading. 2021-08-19 20:43:30 +01:00
Jon Chesterfield 6c75ce1b8b [libomptarget][nfc] Move lanemask_t type into target_impl.h 2021-08-19 18:50:03 +01:00
Jon Chesterfield 77579b99e9 [openmp][nfc] Replace OMPGridValues array with struct
[nfc] Replaces enum indices into an array with a struct. Named the
fields to match the enum, leaves memory layout and initialization unchanged.

Motivation is to later safely remove dead fields and replace redundant ones
with (compile time) computation. It should also be possible to factor some
common fields into a base and introduce a gfx10 amdgpu instance with less
duplication than the arrays of integers require.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108339
2021-08-19 13:25:42 +01:00
Jon Chesterfield f420939b82 [libomptarget] Apply D106710 to amdgcn devicertl 2021-08-19 01:34:33 +01:00
Jon Chesterfield c480792b6a [libomptarget][nfc][devicertl] Delete unused enums 2021-08-19 00:14:34 +01:00
Jon Chesterfield 21d91a8ef3 [libomptarget][devicertl] Replace lanemask with uint64 at interface
Use uint64_t for lanemask on all GPU architectures at the interface
with clang. Updates tests. The deviceRTL is always linked as IR so the zext
and trunc introduced for wave32 architectures will fold after inlining.

Simplification partly motivated by amdgpu gfx10 which will be wave32 and
is awkward to express in the current arch-dependant typedef interface.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108317
2021-08-18 20:47:33 +01:00
Joseph Huber edb8acdc6e [Libomptarget] Correctly default to Generic if exec_mode is not present
Currently, the runtime returns an error when the `exec_mode` global is
not present. The expected behvaiour is that the region will default to
Generic. This prevents global constructors from being called because
they do not contain execution mode globals.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108255
2021-08-18 11:24:28 -04:00
Martin Storsjö f5616a981c [OpenMP] Fix the usage of sscanf on MinGW
KMP_SSCANF only evaluates to sscanf_s within
    #if KMP_OS_WINDOWS && KMP_MSVC_COMPAT
so we need to pass the sscanf_s specific parameters within a similar
condition.

Differential Revision: https://reviews.llvm.org/D108196
2021-08-17 21:36:09 +03:00
Peyton, Jonathan L b4a1f441d9 [OpenMP] Add a few small fixes
* Add comment to help ensure new construct data are added in two places
* Check for division by zero in the loop worksharing code
* Check for syntax errors in parrange parsing

Differential Revision: https://reviews.llvm.org/D105929
2021-08-16 10:02:49 -05:00
Peyton, Jonathan L 6eeb4c1f32 [OpenMP] Fix incorrect parameters to sscanf_s call
On Windows, the documentation states that when using sscanf_s,
each %c and %s specifier must also have additional size parameter.
This patch adds the size parameter in the one place where %c is
used.

Differential Revision: https://reviews.llvm.org/D105931
2021-08-16 09:59:21 -05:00
AndreyChurbanov 52cac541d4 [OpenMP] libomp: cleanup: minor fixes to silence static analyzer.
Added couple more checks to silence KlocWork static code analyzer.

Differential Revision: https://reviews.llvm.org/D107348
2021-08-16 13:39:23 +03:00
AndreyChurbanov f94da67f49 [OpenMP][NFC] libomp: reduced timeouts in the test from 50 to 2 sec. 2021-08-11 17:58:52 +03:00
George Rokos df06ec3057 [libomptarget][NFC] Fix compilation issue with GCC
Removed redundant assignment from condition which causes gcc to emit the following error:

error: operation on ‘MoveData’ may be undefined [-Werror=sequence-point]
2021-08-10 09:43:43 -07:00
Joel E. Denny 2ced1f338a [OpenMP][NFC] Simplify targetDataEnd conditions for CopyMember
targetDataEnd and targetDataBegin compute CopyMember/copy differently,
and I don't see why they should.  This patch eliminates one of those
differences by making a simplifying NFC change to targetDataEnd.

The change is NFC as follows.  The change only affects the case when
`!UNIFIED_SHARED_MEMORY || HasCloseModifier`.  In that case, the
following points are always true:

* The value of CopyMember is relevant later only if DelEntry = false.
* DelEntry = false only if one of the following is true:
    * IsLast = false.  In this case, it's always true that CopyMember
      = false = IsLast.
    * `MEMBER_OF && !PTR_AND_OBJ` is true.  In this case, CopyMember =
      IsLast.
* Thus, if CopyMember is relevant, CopyMember = IsLast.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D105990
2021-08-10 12:29:55 -04:00
Pirama Arumuga Nainar 49fabd9d76 [openmp] Do not use shared memory on Android
Android provides ashmem/ASharedMemory support on newer releases, which
we can use if requested by openmp users on Android.

Also refactor the preprocessor check for using shared memory to
kmp_config.h.cmake.

Differential Revision: https://reviews.llvm.org/D107181
2021-08-09 09:41:32 -07:00
Dimitry Andric 400cd6d2f0 [libomptarget][amdgpu] use --allow-shlib-undefined to link on FreeBSD
On FreeBSD, the `environ` symbol is undefined at link time for shared
libraries, but resolved by the dynamic linker at runtime. Therefore,
allow the symbol to be undefined when creating a shared library, by
using the `--allow-shlib-undefined` linker flag, instead of `-z defs`
(a.k.a `--no-undefined`).

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D107698
2021-08-08 13:52:44 +02:00
Ye Luo 262289c103 [OpenMP] mark target task untied
OpenMP specification Tasking Terminology
target task :A mergeable and untied task that ...

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D107686
2021-08-07 12:31:20 -04:00
Dimitry Andric 71ae2e0221 [libomptarget][amdgpu] don't declare Elf_Note on FreeBSD
On FreeBSD, the system `<libelf.h>` already declares `struct Elf_Note`
indirectly (via `<sys/elf_common.h>`). This results in compile errors
when building the libomptarget amdgpu plugin. Avoid redeclaring `struct
Elf_Note` on FreeBSD to fix the errors.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D107661
2021-08-06 21:45:26 +02:00
Shilei Tian 28939b6ae5 [NFC] Clean up and clang-format openmp/libomptarget/plugins/cuda/src/rtl.cpp 2021-08-05 22:32:28 -04:00
Shilei Tian 680c71b127 [OpenMP] Clean up for hidden helper task
This patch makes some clean up for code of hidden helper task.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D107008
2021-08-04 12:36:44 -04:00
Shilei Tian 9f5d6ea52e [OpenMP] Fix performance regression reported in bug #51235
This patch fixes the "performance regression" reported in https://bugs.llvm.org/show_bug.cgi?id=51235. In fact it has nothing to do with performance. The root cause is, the stolen task is not allowed to execute by another thread because by default it is tied task. Since hidden helper task will always be executed by hidden helper threads, it should be untied.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D107121
2021-08-04 12:34:49 -04:00
Lechen Yu 3bc8ce5dd7 [openmp] Add OMPT initialization in libomptarget
When loading libomptarget, the init function in libomptarget/src/rtl.cpp
will search for the libomptarget_start_tool function using libdl.
libomptarget_start_tool will pass those OMPT callbacks related to target
constructs to libomptarget

Differential Revision: https://reviews.llvm.org/D99803
2021-08-04 18:00:11 +02:00
AndreyChurbanov 8e29b4b323 [OpenMP] libomp: taskwait depend implementation fixed.
Fix for https://bugs.llvm.org/show_bug.cgi?id=49723.
Eliminated references from task dependency hash to node allocated on stack,
thus eliminated accesses to stale memory. So the node now never freed.
Uncommented assertion which triggered when stale memory accessed.
Removed unneeded ref count increment for stack allocated node.

Differential Revision: https://reviews.llvm.org/D106705
2021-08-03 15:45:20 +03:00
Jon Chesterfield 567c8c7bfd [libomptarget][nfc] Only set cuda-path for nvptx tests
Remove --cuda-path=CUDA_TOOLKIT_ROOT_DIR-NOTFOUND
from the invocation of non-nvptx test cases. Better signal
to noise ratio on other architectures.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D107074
2021-07-30 23:01:09 +01:00
Jose M Monsalve Diaz 5424ceeda0 [OpenMP] Fixing llvm-omp-device-info compilation with runtimes
When using `-DLLVM_ENABLED_RUNTIMES` instead of `-DLLVM_ENABLED_PROJECTS`
the `llvm-omp-device-info` tool is not compiled or installed.
In general, no llvm tool would be build on runtimes, because the
-DLLVM_BUILD_TOOLS flag is removed by the way runtimes compilation calls
cmake again.

This patch is simple. Just forward the value of this flag to the
runtime cmake command.

I'm also removing an unnecessary comment in the compilation of the tool

Differential Revision: https://reviews.llvm.org/D107177
2021-07-30 13:09:08 -05:00
Shilei Tian 36d53af4a9 [OpenMP][Offloading] Remove task wait in nowait interfaces
All `nowait` series of interfaces in `libomptarget` accept four more arguments (`int32_t depNum, void *depList, int32_t noAliasDepNum, void *noAliasDepList`) compared with their counterparts w/o `nowait`. These extra arguments were expected for dependence resolution, potentially lowered to device side. Current implementation calls `libomp` function `__kmpc_omp_taskwait`. However, the front end simply ignores them, that these four arguments are not emitted at all. As a consequence, the `depNum` and `noAliasDepNum` are garbage, which could lead to unnecessary task wait.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D107164
2021-07-30 11:39:46 -04:00
AndreyChurbanov 8b81524c6d [OpenMP][NFC] libomp: silence warnings on unused variables.
Put declarations/definitions of unused variables under corresponding macros
to silence clang build warnings.

Differential Revision: https://reviews.llvm.org/D106608
2021-07-30 17:04:42 +03:00
Joachim Protze 4ffa1478fd [libomptarget][amdcgn] Add build dependency for opt
This patch should fix the build we observe when building LLVM from scratch.

Differential Revision: https://reviews.llvm.org/D107156
2021-07-30 15:45:13 +02:00
Terry Wilmarth d8e4cb9121 [OpenMP] libomp: Add new experimental barrier: two-level distributed barrier
Two-level distributed barrier is a new experimental barrier designed
for Intel hardware that has better performance in some cases than the
default hyper barrier.

This barrier is designed to handle fine granularity parallelism where
barriers are used frequently with little compute and memory access
between barriers. There is no need to use it for codes with few
barriers and large granularity compute, or memory intensive
applications, as little difference will be seen between this barrier
and the default hyper barrier. This barrier is designed to work
optimally with a fixed number of threads, and has a significant setup
time, so should NOT be used in situations where the number of threads
in a team is varied frequently.

The two-level distributed barrier is off by default -- hyper barrier
is used by default. To use this barrier, you must set all barrier
patterns to use this type, because it will not work with other barrier
patterns. Thus, to turn it on, the following settings are required:

KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
KMP_PLAIN_BARRIER_PATTERN=dist,dist
KMP_REDUCTION_BARRIER_PATTERN=dist,dist

Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER,
and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed
barrier.

Patch fixed for ITTNotify disabled builds and non-x86 builds

Co-authored-by: Jonathan Peyton <jonathan.l.peyton@intel.com>
Co-authored-by: Vladislav Vinogradov <vlad.vinogradov@intel.com>

Differential Revision: https://reviews.llvm.org/D103121
2021-07-29 14:09:26 -05:00
Joachim Protze 4acc2f29a2 [OpenMP][Tools][Tests][NFC] Address flaky archer tests
Adding more concurrent threads significantly increases the
chance that the data race can be observed during testing.
2021-07-29 17:56:44 +02:00
Jon Chesterfield a90da62adb [libomptarget][amdgpu] Update printed plugin name 2021-07-29 14:46:42 +01:00
Jose M Monsalve Diaz 88e66fa60a [OpenMP] Fixing missing variables when CUDA SDK not in system
This patch fixes the error reported in D106751. When there is no CUDA SDK
installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE`
variables.

Using @zsrkmyn sugested fix

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106933
2021-07-27 23:46:15 -05:00
Jose M Monsalve Diaz 313c523995 [OpenMP][Tool] Introducing the `llvm-omp-device-info` tool
This patch introduces the `llvm-omp-device-info` tool, which uses the
omptarget library and interface to query the device info from all the
available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo`

Since omptarget usually requires a description structure with executable
kernels, I split the initialization of the RTLs and Devices to be able to
initialize all possible devices and query each of them.

This revision relies on the patch that introduces the print device info.

A limitation is that the order in which the devices are initialized, and the
corresponding device ID is not necesarily the one seen by OpenMP.

The changes are as follows:
1. Separate the RTL initialization that was performed in `RegisterLib` to its own `initRTLonce` function
2. Create an `initAllRTLs` method that initializes all available RTLs at runtime
3. Created the `llvm-deviceinfo.cpp` tool that uses `omptarget` to query each device and prints its information.

Example Output:
```
Device (0):
    print_device_info not implemented

Device (1):
    print_device_info not implemented

Device (2):
    print_device_info not implemented

Device (3):
    print_device_info not implemented

Device (4):
    CUDA Driver Version:                11000
    CUDA Device Number:                 0
    Device Name:                        Quadro P1000
    Global Memory Size:                 4236312576 bytes
    Number of Multiprocessors:          5
    Concurrent Copy and Execution:      Yes
    Total Constant Memory:              65536 bytes
    Max Shared Memory per Block:        49152 bytes
    Registers per Block:                65536
    Warp Size:                          32 Threads
    Maximum Threads per Block:          1024
    Maximum Block Dimensions:           1024, 1024, 64
    Maximum Grid Dimensions:            2147483647 x 65535 x 65535
    Maximum Memory Pitch:               2147483647 bytes
    Texture Alignment:                  512 bytes
    Clock Rate:                         1480500 kHz
    Execution Timeout:                  Yes
    Integrated Device:                  No
    Can Map Host Memory:                Yes
    Compute Mode:                       DEFAULT
    Concurrent Kernels:                 Yes
    ECC Enabled:                        No
    Memory Clock Rate:                  2505000 kHz
    Memory Bus Width:                   128 bits
    L2 Cache Size:                      1048576 bytes
    Max Threads Per SMP:                2048
    Async Engines:                      Yes (2)
    Unified Addressing:                 Yes
    Managed Memory:                     Yes
    Concurrent Managed Memory:          Yes
    Preemption Supported:               Yes
    Cooperative Launch:                 Yes
    Multi-Device Boars:                 No
    Compute Capabilities:               61
```

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D106752
2021-07-27 22:38:35 -04:00
Jose M Monsalve Diaz d2f85d0910 [OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget`
This patch introduces a function in the device's plugin to print the
device information. This patch relates to another patch that introduces
a CLI tool to obtain the device information from the omplibrary directly.
It is inspired by PGI's pgaccelinfo.

The modifications are as follows:
1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL.
2. Introduce the `bool __tgt_print_device_info(devID)` function into `omptarget` interface. Returns false if the RTL is not implemented
3. Added `bool printDeviceInfo(RTLDevID)` to the `DeviceTy`
4. Implement the `__tgt_rtl_print_device_info` for CUDA. Added additional CUDA Runtime calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106751
2021-07-27 21:47:57 -04:00
Jose M Monsalve Diaz 5ab6aedda9 [OpenMP] Folding threadLimit and numThreads when single value in kernels
The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block`
and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.

In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and
`NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions.
The code checks all the kernels, and if their attributes match, the functions are folded.

In the future we will explore specializing for multiple values of NumThreads and NumTeams.

Depends on D106390

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D106033
2021-07-27 21:47:12 -04:00
Johannes Doerfert ed7ec860f0 [OpenMP] Improve alignment handling in the new device runtime 2021-07-27 17:50:27 -05:00
Joseph Huber e3ee76245e [Libomptarget] Revert new variable sharing to use the old method
The new method of sharing variables introduces a `__kmpc_alloc_shared` call
that cannot be removed in the middle end because of its non-constant argument
and unconnected free. This patch reverts this to the old method that used a
static amount of shared memory for sharing variables.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106905
2021-07-27 18:14:01 -04:00
Joachim Protze e32e1dae61 [OpenMP][Tests] Fix test compatibility
gcc and clang disagree in how the event handle needs to be handled.
According to OpenMP LC, gcc is right. Will open clang bug report
2021-07-28 00:08:32 +02:00
Joachim Protze 3c76e99291 [OpenMP] Fix deadlock for detachable task with child tasks
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=49066.

For detachable tasks, the assumption breaks that the proxy task cannot have
remaining child tasks when the proxy completes.
In stead of increment/decrement the incomplete task count, a high-order bit
is flipped to mark and wait for the incomplete proxy task.

Differential Revision: https://reviews.llvm.org/D101082
2021-07-28 00:01:35 +02:00
Vignesh Balasubramanian 23eced9ead Convert the error to warning for enabling OMPD in non-Linux platform
OMPD is enabled by default on Linux machines and disabled on others.
However, if explicitly enabled it throws an error and exit while configuring.

It is mentioned in Bug: https://bugs.llvm.org/show_bug.cgi?id=51121

This patch, instead of throwing error, disables OMPD support with a warning message,
so configuration can continue.

Reviewed By: @protze.joachim
Differential Revision: https://reviews.llvm.org/D106682
2021-07-27 17:25:27 +05:30
Johannes Doerfert 67ab875ff5 [OpenMP] Prototype opt-in new GPU device RTL
The "old" OpenMP GPU device runtime (D14254) has served us well for many
years but modernizing it has caused some pain recently. This patch
introduces an alternative which is mostly written from scratch embracing
OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual
interfaces. This new runtime is opt-in through a clang flag (D106793).
The new runtime is currently only build for nvptx and has "-new" in its
name.

The design is tailored towards middle-end optimizations rather than
front-end code generation choices, a trend we already started in the old
runtime a while back. In contrast to the old one, state is organized in
a simple manner rather than a "smart" one. While this can induce costs
it helps optimizations. Our expectation is that the majority of codes
can be optimized and a "simple" design is therefore preferable. The new
runtime does also avoid users to pay for things they do not use,
especially wrt. memory. The unlikely case of nested parallelism is
supported but costly to make the more likely case use less resources.

The worksharing and reduction implementation have been taken from the
old runtime and will be rewritten in the future if necessary.

Documentation and debug features are still mostly missing and will be
added over time.

All external symbols start with `__kmpc` for legacy reasons but should
be renamed once we switch over to a single runtime. All internal symbols
are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name
clashes with user symbols.

Differential Revision: https://reviews.llvm.org/D106803
2021-07-27 00:56:05 -05:00
Shilei Tian e97e0a4fad [AbstractAttributor] Fold __kmpc_parallel_level if possible
Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible.
Note that `__kmpc_parallel_level` doesn't take activeness into consideration,
based on current `deviceRTLs`, its return value can be such as 0, 1, 2, instead
of 0, 129, 130, etc. that also indicate activeness.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106154
2021-07-26 22:46:19 -04:00
Joseph Huber dead50d442 [OpenMP][NFC] Fix a few typos in OpenMP documentation
Summary:
Fixes some typos in the OpenMP documentation.
2021-07-26 16:03:47 -04:00
Jon Chesterfield 2a613a7790 [libomptarget] Build amdgpu plugin without hsa
Default to building the amdgpu plugin to use dlopen when hsa is
not found instead of disabling it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106600
2021-07-26 09:54:51 +01:00
Jon Chesterfield 93fe84d32f [libomptarget][nfc] Squash unused variable warning
Suppress only current warning on openmp-clang-x86_64-linux-debian

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106777
2021-07-26 09:54:31 +01:00
Jon Chesterfield dd0b463dd9 [libomptarget][amdgpu] More robust handling of failure to init HSA
If hsa_init fails, subsequent calls into hsa are not safe. Except for
hsa_init, but we don't retry on failure.

This patch:
- deletes a print that called into hsa to ask why it can't call into hsa
- drops a merge conflict block next to that print
- reliably initializes number of devices to zero
- skips the plugin destructor contents if the constructor failed to init hsa

Tested by making hsa_init return error, and by forcing the dynamic library
use which was then deleted from disk. Before this patch, both segv. After it,
friendly message about offloading being unavailable.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106774
2021-07-25 23:15:58 +01:00
Jon Chesterfield e3251f2ec4 Revert "[libomptarget] Build amdgpu plugin without hsa"
Inaccurate error handling around hsa_init

This reverts commit e30b3b23a4.
2021-07-25 21:03:51 +01:00
Jon Chesterfield e30b3b23a4 [libomptarget] Build amdgpu plugin without hsa
Default to building the amdgpu plugin to use dlopen when hsa is
not found instead of disabling it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106600
2021-07-25 19:33:36 +01:00
Joachim Protze c46ccb8538 [OpenMP][tests][NFC] Update test status for gcc 11 and 12
gcc 11 introduced support for depend clause, but the gomp interface of libomp
does not yet handle the information.
Also remove -fopenmp-version=50, which is no longer needed for clang, but not
supported by gcc.
2021-07-25 18:56:36 +02:00
Shilei Tian f1b8fa55d0 [OpenMP][NVPTX] Disable OpenMPOpt when building deviceRTLs
We build `deviceRTLs` with `-O1` by default, which also triggers OpenMPOpt. When
the info cache is created, some attributes are removed. As a result, although we
mark a few functions `noinline`, they are still inlined when the bitcode library
is generated. This can cause an issue in middle end optimization.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106710
2021-07-25 10:38:27 -04:00
Ye Luo 4079037a3e [OpenMP] always compile with c++14 instead of gnu++14
Fixes PR 51174. c++14 should be a more portable option than gnu++14.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D106632
2021-07-23 17:29:08 -04:00
Shilei Tian c2c43132f6 [OpenMP] Fix bug 50022
Bug 50022 [0] reports target nowait fails in certain case, which is added in this
patch. The root cause of the failure is, when the second task is created, its
parent's `td_incomplete_child_tasks` will not be incremented because there is no
parallel region here thus its team is serialized. Therefore, when the initial
thread is waiting for its unfinished children tasks, it thought there is only
one, the first task, because it is hidden helper task, so it is tracked. The
second task will only be pushed to the queue when the first task is finished.
However, when the first task finishes, it first decrements the counter of its
parent, and then release dependences. Once the counter is decremented, the thread
will move on because its counter is reset, but actually, the second task has not
been executed at all. As a result, since in this case, the main function finishes,
then `libomp` starts to destroy. When the second task is pushed somewhere, all
some of the structures might already have already been destroyed, then anything
could happen.

This patch simply moves `__kmp_release_deps` ahead of decrement of the counter.
In this way, we can make sure that the initial thread is aware of the existence
of another task(s) so it will not move on. In addition, in order to tackle
dependence chain starting with hidden helper thread, when hidden helper task is
encountered, we force the task to release dependences.

Reference:
[0] https://bugs.llvm.org/show_bug.cgi?id=50022

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D106519
2021-07-23 16:54:11 -04:00
Joseph Huber e1dedecaa6 [Libomptarget] Add unroll flag to shared variables loop
Unrolling this loop provides better performance in practice because it is
executed on the device and is likely to be very small.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D106692
2021-07-23 16:45:27 -04:00
Shilei Tian 18ce3d3f2c [OpenMP][Offloading] Fix data race in data mapping by using two locks
This patch tries to partially fix one of the two data race issues reported in
[1] by introducing a per-entry mutex. Additional discussion can also be found in
D104418, which will also be refined to fix another data race problem.

Here is how it works. Like before, `DataMapMtx` is still being used for mapping
table lookup and update. In any case, we will get a table entry. If we need to
make a data transfer (update the data on the device), we need to lock the entry
right before releasing `DataMapMtx`, and the issue of data transfer should be
after releasing `DataMapMtx`, and the entry is unlocked afterwards. This can
guarantee that: 1) issue of data movement is not in critical region, which will
not affect performance too much, and also will not affect other threads that don't
touch the same entry; 2) if another thread accesses the same entry, the state of
data movement is consistent (which requires that a thread must first get the
update lock before getting data movement information).

For a target that doesn't support async data transfer, issue of data movement is
data transfer. This two-lock design can potentially improve concurrency compared
with the design that guards data movement with `DataMapMtx` as well. For a target
that supports async data movement, we could simply attach the event between the
issue of data movement and unlock the entry. For a thread that wants to get the
event, it must first get the lock. This can also get rid of the busy wait until
the event pointer is valid.

Reference:
[1] https://bugs.llvm.org/show_bug.cgi?id=49940

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D104555
2021-07-23 16:10:51 -04:00
Abhinav Gaba f7c92995c0 [OpenMP] Fix CUDA plugin build after 3817ba13ae.
The build was broken on machines that don't have Cuda SDK installed.

See https://reviews.llvm.org/D106627 for the original discussion.
2021-07-23 16:50:00 +08:00
Johannes Doerfert d12ee28e2e [OpenMP] Simplify the ThreadStackTy for globalization fallback
With D106496 we can make the globalization fallback stack much simpler
and this version doesn't seem to experience the spurious failures and
deadlocks we have seen before.

Differential Revision: https://reviews.llvm.org/D106576
2021-07-22 23:57:46 -05:00
Joseph Huber 76c0c0ca86 [OpenMP][NFC] Fix formatting in CUDA plugin 2021-07-22 21:50:40 -04:00
Joseph Huber 3817ba13ae [OpenMP] Add environment variables to change stack / heap size in the CUDA plugin
This patch adds support for two environment variables to configure the device.
``LIBOMPTARGET_STACK_SIZE`` sets the amount of memory in bytes that each thread
has for its stack. ``LIBOMPTARGET_HEAP_SIZE`` sets the amount of heap memory
that can be allocated using malloc / free on the device.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106627
2021-07-22 21:40:02 -04:00
Shilei Tian ea452353c0 [OpenMP] Refined the logic to give a regular task from a hidden helper task
In current implementation, if a regular task depends on a hidden helper task,
and when the hidden helper task is releasing its dependences, it directly calls
`__kmp_omp_task`. This could cause a problem that if `__kmp_push_task` returns
`TASK_NOT_PUSHED`, the task will be executed immediately. However, the hidden
helper threads are assumed to only execute hidden helper tasks. This could cause
problems because when calling `__kmp_omp_task`, the encountering gtid, which is
not the real one of the thread, is passed.

This patch uses `__kmp_give_task`, but because it is a static function, a new
wrapper `__kmpc_give_task` is added.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D106572
2021-07-22 19:21:29 -04:00
Jose M Monsalve Diaz 68d6278a6e [OpenMP] Renaming RT functions `GetNumberOfBlocksInKernel` and `GetNumberOfThreadsInBlock`
These functions should follow the camel case convention. These are really easy to change
and are needed for D106033.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D106390
2021-07-22 18:17:49 -04:00
Jon Chesterfield 9e05c084e5 [libomptarget][amdgpu][nfc] Normalise license headers
Reviewed By: gregrodgers, jdoerfert

Differential Revision: https://reviews.llvm.org/D106581
2021-07-22 20:23:41 +01:00
Jon Chesterfield 14e34a83b0 [libomptarget][amdgpu][nfc] Replace use of gelf.h with libelf.h
AMDGPU can assume Elf64 so doesn't need to abstract over Elf32

Drop a few other unused headers at the same time. Now only llvm elf
and libelf are used by the plugin.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106579
2021-07-22 20:04:13 +01:00
Jon Chesterfield 1a96570621 [libomptarget][amdgpu] Implement dlopen of libhsa
AMDGPU plugin equivalent of D95155, build without HSA installed locally

Compiles a new file, plugins/amdgpu/dynamic_hsa/hsa.cpp, to an object file that
exposes the same symbols that the plugin presently uses from hsa. The object
file contains dlopen of hsa and cached dlsym calls. Also provides header files
corresponding to the subset that is used.

This is behind a feature flag, LIBOMPTARGET_FORCE_DLOPEN_LIBHSA, default off.
That allows developers to build against the dlopen/dlsym implementation, e.g.
while testing this mode.

Enabling by default will cause this plugin to build on a wider variety of
machines than it does at present so may break some CI builds. That risk can
be minimised by reviewing the header dependencies of the library and ensuring
it doesn't use any libraries that are not already used by libomptarget.

Separating the implementation from enabling by default in case the latter needs
to be rolled back after wider CI results.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106559
2021-07-22 16:54:10 +01:00
Jon Chesterfield 6e9cd3e9f1 [libomptarget][nfc] Improve static assert message in dlwrap
Revision of D102858. Raise dlwrap arity argument to template argument
so the correct value is given in the error message. E.g. '2 == 1' instead of
'2 == trait<>::nargs'.

Arity higher than it should be:
Before diff
```
$/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: error:
      static_assert failed due to requirement '2 == trait<cudaError_enum (*)(unsigned int)>::nargs'
      "Arity Error"
DLWRAP_INTERNAL(cuInit, 2);
^~~~~~~~~~~~~~~~~~~~~~~~~~
...
$/include/dlwrap.h:166:3: note: expanded from macro
      'DLWRAP_COMMON'
  static_assert(ARITY == trait<decltype(&SYMBOL)>::nargs, "Arity Error");      \
```

After diff
In file included from $/plugins/cuda/dynamic_cuda/cuda.cpp:16:
```
$/include/dlwrap.h:131:3: error: static_assert failed due to
      requirement '2UL == 1UL' "Arity Error"
  static_assert(Requested == Required, "Arity Error");
  ^             ~~~~~~~~~~~~~~~~~~~~~
$/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: in
      instantiation of function template specialization 'dlwrap::verboseAssert<2UL, 1UL>' requested
      here
DLWRAP_INTERNAL(cuInit, 2);
```

Arity lower than it should be:
Before diff
```
$/plugins/cuda/dynamic_cuda/cuda.cpp:131:10: error: no
      matching function for call to 'dlwrap_cuInit'
  return dlwrap_cuInit(X);
         ^~~~~~~~~~~~~
$/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: candidate
      function not viable: requires 0 arguments, but 1 was provided
DLWRAP_INTERNAL(cuInit, 0);
```

After diff
In file included from $/plugins/cuda/dynamic_cuda/cuda.cpp:16:
```
$/include/dlwrap.h:131:3: error: static_assert failed due to
      requirement '0UL == 1UL' "Arity Error"
  static_assert(Requested == Required, "Arity Error");
  ^             ~~~~~~~~~~~~~~~~~~~~~
$/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: in
      instantiation of function template specialization 'dlwrap::verboseAssert<0UL, 1UL>' requested
      here
DLWRAP_INTERNAL(cuInit, 0);
```

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106543
2021-07-22 15:24:20 +01:00
Joseph Huber a158d3663f [OpenMP] Fix warnings for uninitialized block counts
Summary:
Fixes some warning given for uninitialized block counts if the exection mode is
not recognized. This shouldn't happen in practice because the execution mode is
checked when it's read from the device.
2021-07-22 09:24:07 -04:00
Jon Chesterfield dc1f6f8b92 [libomptarget][amdgpu][nfc] Drop dead signal pool setup
This class is instantiated once in rtl.cpp before hsa_init is
called. The hsa_signal_create call therefore fails leaving the pool empty.

This signal pool is a legacy from ATMI where it was constructed after hsa_init.
Moving the state into the rtl.cpp global class disabled the initial populating
of the pool without noticeably changing performance. Just rechecked with a fix
that allocates the signals after hsa_init and that also doesn't noticeably
change performance.

This patch therefore drops the initialisation. Only change from main is to
drop a DEBUG_PRINT statement that would say the pool initial size is zero.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106515
2021-07-22 10:29:32 +01:00
Joseph Huber 4a66860424 [OpenMP] Add an option to disable function internalization
Function internalization can sometimes occur in situations where we want to
keep the call sites intact. This patch adds an option to disable function
internalization and prevents the device runtime from being internalized while
creating the bitcode library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106438
2021-07-21 21:18:18 -04:00
Joseph Huber 1684012a47 [Libomptarget] Introduce new main thread ID runtime function
This patch introduces `__kmpc_is_generic_main_thread_id` which splits the old
comparison into its own runtime function. The purpose of this is so we can fold
this part independently, so when both this and `is_spmd_mode` are folded the
final function will be folded as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106437
2021-07-21 21:18:14 -04:00
Joseph Huber 7d57639264 [OpenMP] Add new execution mode for SPMD execution with Generic semantics
Qualified kernels can be transformed from generic-mode to SPMD mode using an
optimization in OpenMPOpt. This patch introduces a new execution mode to
indicate kernels that have been transformed from generic-mode to SPMD-mode.
These kernels have SPMD-mode execution, but need generic-mode semantics for
scheduling the blocks and threads. Without this far too few blocks will be
scheduled for a generic region as SPMD mode expects the trip count to be
divided by the number of threads.

Reviewed By: ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D106460
2021-07-21 20:57:28 -04:00
Joseph Huber 754eb1c210 [OpenMP] Change `__kmpc_free_shared` to include the paired allocation size
This patch changes `__kmpc_free_shared` to take an additional argument
corresponding to the associated allocation's size. This makes it easier to
implement the allocator in the runtime.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106496
2021-07-21 20:56:21 -04:00
Giorgis Georgakoudis 5a682d9b91 [OpenMP] Expose libomptarget function to get HW thread id
The patch exposes the libomptarget runtime function that gets the hardware thread id through the kmpc API. This is to be used in SPMDization for checking the thread id to execute regions by a single thread in a block.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106323
2021-07-21 10:26:04 -07:00
Jon Chesterfield a733bbbd17 [libomptarget][amdgpu][nfc] Refactor #includes
Create a hsa_api.h header that includes the ROCr headers in use
Drop some unused headers and _cplusplus macros

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106455
2021-07-21 17:28:07 +01:00
Shilei Tian 55c65884a4 [OpenMP][deviceRTLs] Update return type of function __kmpc_parallel_level
In `deviceRTLs`, the parallel level is stored in a shared variable of type `uint8_t`.
`__kmpc_parallel_level` currently returns a 16-bit interger. This patch first
changes the return type of the function to `uint8_t`, same as the shared variable,
and then corrects function type which was updated in D105955.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106384
2021-07-20 15:45:43 -04:00
Shilei Tian 02dff78983 [NFC][OpenMP] Fix an issue that no CHECK in test cases
This fixes the complaint from FileCheck.

Reviewed By: abhinavgaba, jdoerfert

Differential Revision: https://reviews.llvm.org/D106387
2021-07-20 15:39:18 -04:00
Joseph Huber b917a1d713 [OpenMP] Change AMDGCN to AMDGPU in the Cmake Module
Summary:
Change the name for targeting AMD offloading.
2021-07-20 12:52:53 -04:00
Joseph Huber 6242f9b966 [OpenMP][Documentation] Fix hyperlink location
Fixes the documentation hyperlinks not showing the header.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106374
2021-07-20 12:42:32 -04:00
Tony Tye 038602139d [NFC] Correct documentation error in OpenMP release ReleaseNotes
Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D106330
2021-07-20 02:04:43 +00:00
Shilei Tian 996baa58a4 [OpenMP] Fixed a segmentation fault when using taskloop and target nowait
The synchronization of task loop misses hidden helper tasks, causing segmentation
fault reported in https://bugs.llvm.org/show_bug.cgi?id=50002.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D106220
2021-07-19 21:09:05 -04:00
Joseph Huber 762badb0ab [Libomptarget] Remove volatile from NVPTX work function
Currently the NPVTX work function is marked volatile. This prevents some
optimizations from using this value.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106310
2021-07-19 20:03:25 -04:00
Giorgis Georgakoudis fb0cf01795 Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit e9c7291cb2.

Fix failing tests
2021-07-19 07:54:26 -07:00
Shilei Tian 4504e1134c [OpenMP][CMake] Fix an issue when there is space in the argument LIBOMPTARGET_LIT_ARGS
D106236 added a new CMake argument for `libomptarget` test, but when user's
input contains white spaces, CMake will add escape char to the final lit command,
which leads to an error. This patch converts the user's input `LIBOMPTARGET_LIT_ARGS`
into a local array, and then passes the array to the function.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D106247
2021-07-18 21:54:14 -04:00
Shilei Tian 954711ed8f [OpenMP][Offloading] Add a CMake argument LIBOMPTARGET_LIT_ARGS to control behavior of libomptarget lit test
By default, `lit` uses all threads to invoke tests, which  can easily cause out
of memory on GPUs because most of OpenMP offloading test usually take about 1GB
GPU memory, but a typical GPU only has 4-8GB memory. This patch introduce a
CMake argument `LIBOMPTARGET_LIT_ARGS` to allow users to control the behavior of
`libomptarget` tests, similar to `LLVM_LIT_ARGS`.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D106236
2021-07-18 13:16:10 -04:00
Shilei Tian 4357cfc792 [OpenMP][Offloading] Add -g when compiling deviceRTLs in debug mode
Currently when we compile the project in debug mode, `-g` will not be added to
compilation flag. The bc files generated in different mode are of different size.
When using GPU debuggers like `cuda-gdb`, it is expected to provide more info
with a debug version of bc lib.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D106229
2021-07-18 09:34:54 -04:00
Giorgis Georgakoudis e9c7291cb2 [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3)  forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102107
2021-07-16 23:27:44 -07:00
Joseph Huber 1616407921 [OpenMP] Add remark documentation to the OpenMP webpage
This patch begins adding documentation for each remark emitted by
`openmp-opt`. This builds on the IDs introduced in D105939 so that users
can more easily identify each remark in the webpage.

Depends on D105939.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106018
2021-07-16 14:09:43 -04:00
Shilei Tian 97c8f60bba [NFC][OpenMP][Offloading] Replaced explicit parallel level computation with function `__kmpc_parallel_level`
There are two places in current deviceRTLs where it computes parallel level explicitly,
which is basically the functionality of `__kmpc_parallel_level`. Starting from
D105787, we plan to introduce a series of function call folding based on information
that can be deducted during compilation time. Computation of parallel level is
the next target. This patch makes steps for the optimization.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105955
2021-07-15 22:21:06 -04:00
George Rokos 0c7a4870c5 [libomptarget] Keep the Shadow Pointer Map up-to-date
D105812 introduced a regression where if a PTR_AND_OBJ entry was mapped on the device, then the OBJ was deallocated and then reallocated at a different address, the Shadow Pointer Map would still contain an entry for the PTR but pointing to the old address. This caused test `env/base_ptr_ref_count.c` to fail.

Differential Revision: https://reviews.llvm.org/D105947
2021-07-14 15:19:58 -07:00
Peyton, Jonathan L 424f14f0d2 [OpenMP] Fix one sign-compare warning from GCC 2021-07-13 12:36:12 -05:00
Peyton, Jonathan L 405eefe464 [OpenMP][NFC] Change comment style to eliminate warnings from GCC
Standalone build for OpenMP runtime using GCC is giving -Wcomment
warnings where a backslash newline is encountered in the // style
comment. This switches the // style for /* style to silence the
warnings.
2021-07-13 12:27:08 -05:00
Hansang Bae db635a28e6 [OpenMP] Minor improvement in task allocation
This patch includes a few changes to improve task allocation
performance slightly. These changes are enough to restore performance
drop observed after introducing hidden helper.

Differential Revision: https://reviews.llvm.org/D105715
2021-07-13 09:07:14 -05:00
Roman Lebedev 4709d9d5be
[libomp] ompd_init(): fix heap-buffer-overflow when constructing libompd.so path
There is no guarantee that the space allocated in `libname`
is enough to accomodate the whole `dl_info.dli_fname`,
because it could e.g. have an suffix  - `.5`,
and that highlights another problem - what it should do about suffxies,
and should it do anything to resolve the symlinks before changing the filename?

```
$ LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"  ./src/utilities/rstest/rstest -c /tmp/f49137920.NEF
dl_info.dli_fname "/usr/local/lib/libomp.so.5"
strlen(dl_info.dli_fname) 26
lib_path_length 14
lib_path_length + 12 26
=================================================================
==30949==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000002a at pc 0x000000548648 bp 0x7ffdfa0aa780 sp 0x7ffdfa0a9f40
WRITE of size 27 at 0x60300000002a thread T0
    #0 0x548647 in strcpy (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x548647)
    #1 0x7fb9e3e3d234 in ompd_init() /repositories/llvm-project/openmp/runtime/src/ompd-specific.cpp:102:5
    #2 0x7fb9e3dcb446 in __kmp_do_serial_initialize() /repositories/llvm-project/openmp/runtime/src/kmp_runtime.cpp:6742:3
    #3 0x7fb9e3dcb40b in __kmp_get_global_thread_id_reg /repositories/llvm-project/openmp/runtime/src/kmp_runtime.cpp:251:7
    #4 0x59e035 in main /home/lebedevri/rawspeed/build-Clang-SANITIZE/../src/utilities/rstest/rstest.cpp:491
    #5 0x7fb9e3762d09 in __libc_start_main csu/../csu/libc-start.c:308:16
    #6 0x4df449 in _start (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x4df449)

0x60300000002a is located 0 bytes to the right of 26-byte region [0x603000000010,0x60300000002a)
allocated by thread T0 here:
    #0 0x55cc5d in malloc (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x55cc5d)
    #1 0x7fb9e3e3d224 in ompd_init() /repositories/llvm-project/openmp/runtime/src/ompd-specific.cpp:101:17
    #2 0x7fb9e3762d09 in __libc_start_main csu/../csu/libc-start.c:308:16

SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x548647) in strcpy
Shadow bytes around the buggy address:
  0x0c067fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c067fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c067fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c067fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c067fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c067fff8000: fa fa 00 00 00[02]fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==30949==ABORTING
Aborted
```
2021-07-13 15:36:46 +03:00
George Rokos bb0166dc72 [libomptarget] Update device pointer only if needed
Currently, libomptarget will always perform a host-to-device memory transfer in
order to update the device pointer of a PTR_AND_OBJ entry. This is not always
necessary because the device pointer may have been set to the correct pointee
address already, so we can eliminate the redundant memory transfer.
2021-07-13 04:18:55 -07:00
Jon Chesterfield b6b53ffef4 [libomptarget][devicertl] Remove branches around setting parallelLevel
Simplifies control flow to allow store/load forwarding

This change folds two basic blocks into one, leaving a single store to parallelLevel.
This is a step towards spmd kernels with sufficiently aggressive inlining folding
the loads from parallelLevel and thus discarding the nested parallel handling
when it is unused.

Transform:
```
int threadId = GetThreadIdInBlock();
if (threadId == 0) {
  parallelLevel[0] = expr;
} else if (GetLaneId() == 0) {
  parallelLevel[GetWarpId()] = expr;
}
// =>
if (GetLaneId() == 0) {
  parallelLevel[GetWarpId()] = expr;
}
// because
unsigned GetLaneId() { return GetThreadIdInBlock() & (WARPSIZE - 1);}
// so whenever threadId == 0, GetLaneId() is also 0.
```

That replaces a store in two distinct basic blocks with as single store.

A more aggressive follow up is possible if the threads in the warp/wave
race to write the same value to the same address. This is not done as
part of this change.

```
if (GetLaneId() == 0) {
  parallelLevel[GetWarpId()] = expr;
}
// =>
parallelLevel[GetWarpId()] = expr;
// because
unsigned GetWarpId() { return GetThreadIdInBlock() / WARPSIZE; }
// so GetWarpId will index the same element for every thread in the warp
// and, because expr is lane-invariant in this case, every lane stores the
// same value to this unique address
```

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D105699
2021-07-13 12:06:57 +01:00
Joachim Protze 681055ea69 [OpenMP] Remove TSAN annotations from libomp
The annotations in libomp were never built by default. The annotations are
also superseded by the annotations which the OMPT tool libarcher.so provides.
With respect to libarcher, libomp behaves as if libarcher would be the last
element of OMP_TOOL_LIBARARIES. I.e., if no other OMPT tool gets active,
libarcher will check if an OpenMP application is built with TSan.

Since libarcher gets loaded by default, enabling LIBOMP_TSAN_SUPPORT would
result in redundant annotations for TSan, which slightly differ in details
and coverage (e.g. task dependencies are not handled well by the annotations
in libomp).

This patch removes all TSan annotations from the OpenMP runtime code.

Differential Revision: https://reviews.llvm.org/D103767
2021-07-12 18:49:11 +02:00
Joachim Protze fedbff75f4 [OpenMP][OMPT] Fix compile-time assertion in ompt-multiplex.h
The compile-time assertion is supposed to prevent double-free caused by
unexpected combination of preprocessor defines passed by an OMPT tool.
The current defines are not used, so this patch replaces the check with
macros actually used in ompt-multiplex.h

Reported by: Semih Burak

Differential Revision: https://reviews.llvm.org/D104633
2021-07-12 12:12:09 +02:00
Johannes Doerfert a7b7b5dfe5 [OpenMP] Create and use `__kmpc_is_generic_main_thread`
In order to fold calls based on high-level knowledge and control flow
tracking it helps to expose the information as a runtime call. The
logic: `!SPMD && getTID() == getMasterTID()` was used in various places
and is now encapsulated in `__kmpc_is_generic_main_thread`. As part of
this rewrite we replaced eager computation of arguments with on-demand
computation, especially helpful if the calls can be folded and arguments
don't need to be computed consequently.

Differential Revision: https://reviews.llvm.org/D105768
2021-07-11 19:18:03 -05:00
Johannes Doerfert 1ab1f04a2b [OpenMP] Simplify variable sharing and increase shared memory size
In order to avoid malloc/free, up to NUM_SHARED_VARIABLES_IN_SHARED_MEM
(=64) variables are communicated in dedicated shared memory instead. The
simplification does avoid the need for an "init" and requires "deinit"
only if we ever communicate more than NUM_SHARED_VARIABLES_IN_SHARED_MEM
variables.

Differential Revision: https://reviews.llvm.org/D105767
2021-07-11 19:18:03 -05:00
Johannes Doerfert 0a223827de [OpenMP] Remove checkXXXX device runtime functions
We had multiple functions to determine the execution mode (SPMD/Generic)
and runtime status (initialized/uninitialized) but that just increased
complexity without a real benefit. Especially with D102307 in mind it
is helpful to reduce the dependence on the `ident_t` flags.

Differential Revision: https://reviews.llvm.org/D105586
2021-07-10 18:20:40 -05:00
Johannes Doerfert e2cfbfcc0c [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 17:53:56 -05:00
Nico Weber d3e7491333 Revert Attributor patch series
Broke check-clang, see https://reviews.llvm.org/D102307#2869065
Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`
2021-07-10 16:15:55 -04:00
Johannes Doerfert e603ca0306 [OpenMP] Remove checkXXXX device runtime functions
We had multiple functions to determine the execution mode (SPMD/Generic)
and runtime status (initialized/uninitialized) but that just increased
complexity without a real benefit. Especially with D102307 in mind it
is helpful to reduce the dependence on the `ident_t` flags.

Differential Revision: https://reviews.llvm.org/D105586
2021-07-10 12:32:51 -05:00
Johannes Doerfert 1d5711c3ee [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 12:32:50 -05:00
Joel E. Denny d99f65de2a [OpenMP] Avoid checking parent reference count in targetDataBegin
This patch is an attempt to do for `targetDataBegin` what D104924 does
for `targetDataEnd`:

* Eliminates a lock/unlock of the data mapping table.
* Clarifies the logic that determines whether a struct member's
  host-to-device transfer occurs.  The old logic, which checks the
  parent struct's reference count, is a leftover from back when we had
  a different map interface (as pointed out at
  <https://reviews.llvm.org/D104924#2846972>).

Additionally, it eliminates the `DeviceTy::getMapEntryRefCnt`, which
is no longer used after this patch.

While D104924 does not change the computation of `IsLast`, I found I
needed to change the computation of `IsNew` for this patch.  As far as
I can tell, the change is correct, and this patch does not cause any
additional `openmp` tests to fail.  However, I'm not sure I've thought
of all use cases.  Please advise.

Reviewed By: jdoerfert, jhuber6, protze.joachim, tianshilei1992, grokos, RaviNarayanaswamy

Differential Revision: https://reviews.llvm.org/D105121
2021-07-10 12:15:04 -04:00
Joel E. Denny 1d0456361a [OpenMP] Avoid checking parent reference count in targetDataEnd
The patch has the following benefits:

* Eliminates a lock/unlock of the data mapping table.
* Clarifies the logic that determines whether a struct member's
  device-to-host transfer occurs.  The old logic, which checks the
  parent struct's reference count, is a leftover from back when we had
  a different map interface (as pointed out at
  <https://reviews.llvm.org/D104924#2846972>).

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D104924
2021-07-10 12:15:04 -04:00
Alexey Bataev ab8989ab87 [OPENMP]Fix overlapped mapping for dereferenced pointer members.
If the base is used in a map clause and later we have a memberexpr with
this base, and the member is a pointer, and this pointer is dereferenced
anyhow (subscript, array section, dereference, etc.), such components
should be considered as overlapped, otherwise it may lead to incorrect
size computations, since we try to map a pointee as a part of the whole
struct, which is not true for the pointer members.

Differential Revision: https://reviews.llvm.org/D105562
2021-07-09 12:51:26 -07:00
Michał Górny 2b0d95fb58 [openmp] [test] Add missing <limits> include to capacity_nthreads
Differential Revision: https://reviews.llvm.org/D105474
2021-07-06 20:39:53 +02:00
Jon Chesterfield ddfb074a80 [libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global
[libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global

Folds some duplicates logic into a helper function, passes the new environment
struct into getLaunchVals which no longer reads the DeviceInfo global.

Implemented on top of D105237

Reviewed By: dhruvachak

Differential Revision: https://reviews.llvm.org/D105239
2021-07-06 17:06:38 +01:00
Atmn Patel 21e92612c0 [Libomptarget] Experimental Remote Plugin Fixes
D97883 introduced a compile-time error in the experimental remote offloading
libomptarget plugin, this patch fixes it and resolves a number of
inconsistencies in the plugin as well:

1. Non-functional Asynchronous API
2. Unnecessarily verbose debug printing
3. Misc. code clean ups

This is not intended to make any functional changes to the plugin.

Differential Revision: https://reviews.llvm.org/D105325
2021-07-02 12:38:34 -04:00
Hansang Bae f1b9ce2736 [OpenMP] Fix a few issues with hidden helper task
This patch includes the following changes to address a few issues when
using hidden helper task.

- Assertion is triggered when there are inadvertent calls to hidden
  helper functions on non-Linux OS
- Added deinit code in __kmp_internal_end_library function to fix random
  shutdown crashes
- Moved task data access into the lock-guarded region in __kmp_push_task

Differential Revision: https://reviews.llvm.org/D105308
2021-07-01 17:10:32 -05:00
Shilei Tian 369216ab31 [OpenMP][Offloading] Refined return value of `DeviceTy::getOrAllocTgtPtr`
`DeviceTy::getOrAllocTgtPtr` just returns a target pointer. In addition,
two bool values (`IsNew` and `IsHostPtr`) are passed by reference to make the
change in the function available in callee.

In this patch, a struct, which contains the target pointer, two flags, and an
iterator to the map table entry corresponding to the queried host pointer, will
be returned. In addition to make the logic clearer regarding the two bool values,
this paves the way for the next patch to fix the data race in `bug49334.cpp` by
attaching an event to the map table entry (and that's why we need the iterator).

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D104382
2021-07-01 12:32:03 -04:00
Jon Chesterfield db89414da4 [libomptarget][nfc] Move grid size computation
Change getLaunchVals to return the integers used for launch

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D105237
2021-07-01 12:53:04 +01:00
Dhruva Chakrabarti 98c36f0079 Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size"
This reverts commit 2240b41ee4.
A value of 0 for KernDescVal WG_Size implies it is unknown, so it should be
set to the default. The above change was made without this assumption.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D105250
2021-06-30 17:15:00 -07:00
Jon Chesterfield 4b0926b044 [libomptarget][nfc] Replace out arguments with struct return
A step towards making this function adequately self contained that it
can be tested easily. No functional change intended here, left variable
names unchanged.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D105229
2021-06-30 22:40:07 +01:00
Jon Chesterfield d86b0073cf [libomptarget][amdgpu][nfc] Fix build warnings, drop some headers
Removes stdarg header, drops uses of iostream, fix some format string errors.
Also changes a C style struct to C++ style to avoid a warning from clang/

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D104923
2021-06-30 22:23:36 +01:00
Shilei Tian 24a36ce58b [OpenMP][Offloading] Replace all calls to `isSPMDMode` with `__kmpc_is_spmd_exec_mode`
In our ongoing work, we are using `AbstractAttributor` to deduct execution model
of device functions, and potententially remove unnecessary function calls to
`__kmpc_is_spmd_exec_mode`. In current device runtime, we have mixed use of
`isSPMDMode` and `__kmpc_is_spmd_exec_mode`, but in fact in `__kmpc_is_spmd_exec_mode`
it simply calls `isSPMDMode`. Since all functions starting with `__kmpc` is C
function, which doesn't have things like name mangling. It is more optimization
friendly. In this patch, we simply replaced all calls to `isSPMDMode` with
`__kmpc_is_spmd_exec_mode` to pave the way for the optimization.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D105211
2021-06-30 15:39:57 -04:00
Dhruva Chakrabarti e0b713a035 [libomptarget] [amdgpu] Change default number of teams per computation unit
This patch is related to https://reviews.llvm.org/D98832. Based on discussions there, I decided to separate out the teams default as this patch. This change is to increase the number of teams per computation unit so as to provide more wavefronts for hiding latency. This change improves performance for some programs, including 20-50% for some Stream benchmarks.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D99003
2021-06-29 15:34:35 -07:00
Dhruva Chakrabarti 2240b41ee4 [libomptarget] [amdgpu] Fix default setting of max flat workgroup size
When max flat workgroup size is not specified, it is set to the default
workgroup size. This prevents kernel launch with a workgroup size larger
than the default. The fix is to ignore a size of 0 and treat it as
unspecified.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D105073
2021-06-29 13:47:24 -07:00
Johannes Doerfert 4eb90e893f Revert "[OpenMP] Add Two-level Distributed Barrier"
This reverts commit 25073a4ecf.

This breaks non-x86 OpenMP builds for a while now. Until a solution is
ready to be upstreamed we revert the feature and unblock those builds.
See:
  https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821
and
  https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821

The currently proposed fix (D104788) seems not to be ready yet:
  https://reviews.llvm.org/D104788#2841928
2021-06-29 09:38:27 -05:00
Johannes Doerfert bc8bb3df35 Revert "[omp] Fix build without ITT after D103121 changes"
This reverts commit eab1fd389b.

This commit fixed a problem with 25073a4ecf (D103121) which is the one
we actually need to revert to unblock non-X86 builds of OpenMP. Can be
reapplied, or merged into, D103121 as it goes in again.
2021-06-29 09:38:27 -05:00
Joseph Huber 2190c48fde [OpenMP][Documentation] Add FAQ entry for CMake module
This patch adds documentation for using the CMake find module for OpenMP
target offloading provided by LLVM. It also removes the requirement for
AMD's architecture to be set as this isn't necessary for upstream LLVM.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105051
2021-06-28 17:05:07 -04:00
Joseph Huber c9f3240c9d [OpenMP][Documentation] Add OpenMPOpt optimization section
Add some information about the optimizations currently provided by
OpenMPOpt. Every optimization performed should eventually be listed
here.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105050
2021-06-28 17:05:03 -04:00
Pushpinder Singh 20df2c7052 [AMDGPU][Libomptarget] Collect allocatable memory pools using HSA
The logic is almost similar to that of system.cpp with one change that
instead of adding all the memory pools to a device struct it only
keeps a single pool. The existing approach also always allocated memory on
the first HSA pool found for a GPU.

This depends on D104691. The goal of this series of patches is to remove
_atl_machine global. The next patch will drop g_atl_machine entirely.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104695
2021-06-28 11:28:04 +00:00
Jon Chesterfield f66b8fdc0a [libomptarget][amdgpu] Build openmp for two more targets
[libomptarget][amdgpu] Build openmp for two more targets

The 4800U APU is a gfx902 and the MI100 accelerator is a gfx908.
Both numbers are listed in ROCT topology.c

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D104922
2021-06-25 19:02:03 +01:00
Jon Chesterfield 96f6873dff [OpenMP][NFC] Drop unused headers from amdgpu plugin 2021-06-25 12:08:56 +01:00
AndreyChurbanov b2787945f9 [OpenMP][NFC] libomp: fix wrong debug assertion.
Normalized bounds of chunk of iterations to steal from are inclusive,
so upper bound should not be decremented in expression to check.
Problem was in attempt to steal iterations 0:0, that caused assertion after
wrong decrement. Reported in comment to https://reviews.llvm.org/D103648.

Differential Revision: https://reviews.llvm.org/D104880
2021-06-25 02:02:14 +03:00
Aakanksha Patil 3453f3dd46 [AMDGPU] Add gfx1035 target
Differential Revision: https://reviews.llvm.org/D104804
2021-06-24 14:32:41 -04:00
Joel E. Denny 9fa5e3280d [OpenMP] Fix delete map type in ref count debug messages
For example, without this patch:

```
$ cat test.c
int main() {
  int x;
  #pragma omp target enter data map(alloc: x)
  #pragma omp target enter data map(alloc: x)
  #pragma omp target enter data map(alloc: x)
  #pragma omp target exit data map(delete: x)
  ;
  return 0;
}
$ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c
$ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists\|last'
Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=1, Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (incremented), Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=3 (incremented), Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (decremented)
Libomptarget --> There are 4 bytes allocated at target address 0x00000000013bb040 - is not last
```

`RefCount` is reported as decremented to 2, but it ought to be reset
because of the `delete` map type, and `is not last` is incorrect.

This patch migrates the reset of reference counts from
`DeviceTy::deallocTgtPtr` to `DeviceTy::getTgtPtrBegin`, which then
correctly reports the reset.  Based on the `IsLast` result from
`DeviceTy::getTgtPtrBegin`, `targetDataEnd` then correctly reports `is
last` for any deletion.  `DeviceTy::deallocTgtPtr` is responsible only
for the final reference count decrement and mapping removal.

An obscure side effect of this patch is that a `delete` map type when
the reference count is infinite yields `DelEntry=IsLast=false` in
`targetDataEnd` and so no longer results in a
`DeviceTy::deallocTgtPtr` call.  Without this patch, that call is a
no-op anyway besides some unnecessary locking and mapping table
lookups.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D104560
2021-06-23 09:57:19 -04:00
Joel E. Denny 48421ac441 [OpenMP] Improve ref count debug messages
For example, without this patch:

```
$ cat test.c
int main() {
  int x;
  #pragma omp target enter data map(alloc: x)
  #pragma omp target exit data map(release: x)
  ;
  return 0;
}
$ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c
$ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists'
Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, updated RefCount=1
```

There are two problems in this example:

* `RefCount` is not reported when a mapping is created, but it might
  be 1 or infinite.  In this case, because it's created by `omp target
  enter data`, it's 1.  Seeing that would make later `RefCount`
  messages easier to understand.
* `RefCount` is still 1 at the `omp target exit data`, but it's
  reported as `updated`.  The reason it's still 1 is that, upon
  deletions, the reference count is generally not updated in
  `DeviceTy::getTgtPtrBegin`, where the report is produced.  Instead,
  it's zeroed later in `DeviceTy::deallocTgtPtr`, where it's actually
  removed from the mapping table.

This patch makes the following changes:

* Report the reference count when creating a mapping.
* Where an existing mapping is reported, always report a reference
  count action:
    * `update suppressed` when `UpdateRefCount=false`
    * `incremented`
    * `decremented`
    * `deferred final decrement`, which replaces the misleading
      `updated` in the above example
* Add comments to `DeviceTy::getTgtPtrBegin` to explain why it does
  not zero the reference count.  (Please advise if these comments miss
  the point.)
* For unified shared memory, don't report confusing messages like
  `RefCount=` or `RefCount= updated` given that reference counts are
  irrelevant in this case.  Instead, just report `for unified shared
  memory`.
* Use `INFO` not `DP` consistently for `Mapping exists` messages.
* Fix device table dumps to print `INF` instead of `-1` for an
  infinite reference count.

Reviewed By: jhuber6, grokos

Differential Revision: https://reviews.llvm.org/D104559
2021-06-23 09:57:19 -04:00
Joseph Huber 72d4cd627c [OpenMP] Introduce an CMake find module for OpenMP Target support
This introduces a CMake find module for detecting target offloading support in
a compiler. The goal is to make it easier to incorporate target offloading into
a cmake project.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D104710
2021-06-22 23:01:38 -04:00
Joseph Huber 422adaa879 [OpenMP] Add thread limit environment variable support to plugins
The OpenMP 5.1 standard defines the environment variable
`OMP_TEAMS_THREAD_LIMIT` to limit the number of threads that will be run in a
single block. This patch adds support for this into the AMDGPU and CUDA
plugins.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103923
2021-06-22 16:25:40 -04:00
Shilei Tian 0029059074 [NFC][OpenMP][Offloading] Unified the construction of mapping table entry
This patch unifies construction of mapping table entry to use `emplace`.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D104580
2021-06-22 12:38:47 -04:00
Joseph Huber 244e98ff48 [Libomptarget] Improve device runtime implementation for globalized variables.
Currently the runtime implementation of `__kmpc_alloc_shared` is extremely slow because it allocated memory for each thread individually. This patch adds a small buffer for the threads to share data and will greatly improve performance for builds where all globalization could not be optimized out. If the shared buffer is full, then memory will not only be allocated per-warp rather than per-thread.

Depends on D97680

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104666
2021-06-22 11:52:49 -04:00
Joseph Huber 952a0f2385 [Libomptarget] Introduce new globalization runtime calls
Summary:
This patch introduces the new globalization runtime to be used by D97680. These
runtime calls will replace the __kmpc_data_sharing_push_stack and
__kmpc_data_sharing_pop_stack functions.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102532
2021-06-22 10:05:42 -04:00
AndreyChurbanov 5dd4d0d46f [OpenMP] libomp: fix dynamic loop dispatcher
Restructured dynamic loop dispatcher code.
Fixed use of dispatch buffers for nonmonotonic dynamic (static_steal) schedule:
- eliminated possibility of stealing iterations of the wrong loop when victim
  thread changed its buffer to work on another loop;
- fixed race when victim thread changed its buffer to work in nested parallel;
- eliminated "static" property of the schedule, that is now a single thread can
  execute whole loop.

Differential Revision: https://reviews.llvm.org/D103648
2021-06-22 16:29:01 +03:00
Pushpinder Singh 9d110f9159 [AMDGPU][Libomptarget] Move allow_access_to_all_gpu_agents to rtl.cpp
Moving this method helps eliminate a use of g_atl_machine.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104691
2021-06-22 11:44:52 +00:00
Vladislav Vinogradov eab1fd389b [omp] Fix build without ITT after D103121 changes
Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D104638
2021-06-21 18:17:52 +03:00
Vyacheslav Zakharin aad9e48c5f [NFC][libomptarget] Remove redundant libelf dependency for elf_common.
Differential Revision: https://reviews.llvm.org/D104549
2021-06-21 07:19:55 -07:00
Pushpinder Singh 7a97cd9da7 [AMDGPU][Libomptarget] Remove redundant functions
There does not seem to be any use of these functions. They just
put the value to a local which is never used again.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104512
2021-06-21 06:13:24 +00:00
Shilei Tian ec97866454 [OpenMP] Make bug49334.cpp more reproducible
`bug49334.cpp` cannot detect data race in `libomptarget` efficiently. It
is reported that with `N = 256` and `BS = 16`, the data race can be reproduced
more steadily. The next coming pathces will fix it so this patch is expected to
fail now.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104552
2021-06-18 18:35:41 -04:00
Asher Mancinelli 5c189d30e6 [OpenMP] Update FAQ for enabling cuda offloading
Add an FAQ entry and add a few lines to an existing one. Document
the use of `GCC_INSTALL_PREFIX` for pointing clang to correct
GCC installation for two-stage build.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D104474
2021-06-18 11:55:45 -06:00
Vyacheslav Zakharin 836992ab9a [NFC][libomptarget] Build elf_common with PIC.
Differential Revision: https://reviews.llvm.org/D104545
2021-06-18 09:20:10 -07:00
Vyacheslav Zakharin c5b7c7c8f7 [NFC][libomptarget] Fixed -DLLVM_ENABLE_RUNTIMES="openmp" build.
Differential Revision: https://reviews.llvm.org/D104535
2021-06-18 09:20:10 -07:00
Terry Wilmarth 25073a4ecf [OpenMP] Add Two-level Distributed Barrier
Two-level distributed barrier is a new experimental barrier designed
for Intel hardware that has better performance in some cases than the
default hyper barrier.

This barrier is designed to handle fine granularity parallelism where
barriers are used frequently with little compute and memory access
between barriers.  There is no need to use it for codes with few
barriers and large granularity compute, or memory intensive
applications, as little difference will be seen between this barrier
and the default hyper barrier. This barrier is designed to work
optimally with a fixed number of threads, and has a significant setup
time, so should NOT be used in situations where the number of threads
in a team is varied frequently.

The two-level distributed barrier is off by default -- hyper barrier
is used by default. To use this barrier, you must set all barrier
patterns to use this type, because it will not work with other barrier
patterns.  Thus, to turn it on, the following settings are required:

KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
KMP_PLAIN_BARRIER_PATTERN=dist,dist
KMP_REDUCTION_BARRIER_PATTERN=dist,dist

Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER,
and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed
barrier.

Differential Revision: https://reviews.llvm.org/D103121
2021-06-16 15:34:55 -05:00
Vyacheslav Zakharin b5c4fc0f23 [NFC][libomptarget] Reduce the dependency on libelf
This change-set removes libelf usage from elf_common part of the plugins.
libelf is still used in x86_64 generic plugin code and in some plugins
(e.g. amdgpu) - these will have to be cleaned up in separate checkins.

Differential Revision: https://reviews.llvm.org/D103545
2021-06-16 08:34:23 -07:00
AndreyChurbanov 610fea65e2 [OpenMP] libomp: fixed implementation of OMP 5.1 inoutset task dependence type
Refactored code of dependence processing and added new inoutset dependence type.
Compiler can set dependence flag to 0x8 when call __kmpc_omp_task_with_deps.
All dependence flags library gets so far and corresponding dependence types:
1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET.

Differential Revision: https://reviews.llvm.org/D97085
2021-06-16 14:47:29 +03:00
Joachim Protze d2a7871b5e [OpenMP][NFC] Add back suppression of warning
Commit cff215565e did not fix all unused variables in different builds,
so adding back the suppression for now.
2021-06-16 10:14:59 +02:00
Joachim Protze cff215565e [OpenMP] Remove unused variables from libomp code
Several variables were left unused as a result of different patches removing
their use.

Two variables have some use:
`poll_count` is used by the KMP_BLOCKING macro only under certain conditions.
Adding (void) to tell the compiler to ignore the unused variable.

`padding` is a dummy stack allocation with no intent to be used. Also adding
(void) to make the compiler ignore the unused variable.

Differential Revision: https://reviews.llvm.org/D104303
2021-06-16 09:33:46 +02:00
Peyton, Jonathan L 56da28240f [OpenMP] Add GOMP 5.0 version symbols to API
* Add GOMP versioned pause functions
* Add GOMP versioned affinity format functions

To do the affinity format functions, only attach versioned symbols
to the APPEND Fortran entries (e.g., omp_set_affinity_format_) since
GOMP only exports two symbols (one for Fortran, one for C). Our
affinity format functions have three symbols.
e.g., with omp_set_affinity_format:
1) omp_set_affinity_format (Fortran interface)
2) omp_set_affinity_format_ (Fortran interface)
3) ompc_set_affinity_format (C interface)

Have the GOMP version of the C symbol alias the ompc_* 3) version
instead of the Fortran unappended version 1).

Differential Revision: https://reviews.llvm.org/D103647
2021-06-15 16:25:00 -05:00
Peyton, Jonathan L 92baf414db [OpenMP] Fix affinity determine capable algorithm on Linux
Remove strange checks for syscall() arguments where mask is NULL.
Valgrind reports these as error usages for the syscall.
Instead, just check if CACHE_LINE bytes is long enough. If not, then
search for the size. Also, by limiting the first size detection
attempt to CACHE_LINE bytes, instead of 1MB, we don't use more than one
cache line for the mask size. Before this patch, sometimes the returned
mask size was 640 bytes (10 cache lines) because the initial call to
getaffinity() was limited only by the internal kernel mask size
which can be very large.

Differential Revision: https://reviews.llvm.org/D103637
2021-06-15 16:21:30 -05:00
Peyton, Jonathan L 0ddde4d865 [OpenMP] Lazily assign root affinity
Lazily set affinity for root threads. Previously, the root thread
executing middle initialization would attempt to assign affinity
to other existing root threads. This was not working properly as the
set_system_affinity() function wasn't setting the affinity for the
target thread. Instead, the middle init thread was resetting the
its own affinity using the target thread's affinity mask.

Differential Revision: https://reviews.llvm.org/D103625
2021-06-15 16:21:06 -05:00
Pushpinder Singh cadcaf3f46 [AMDGPU][Libomptarget] Drop dead code related to g_atl_machine
This patch includes some changes which deletes the code accessing
g_atl_machine global. Some accesses related to memory_pools are
still remaining.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103813
2021-06-15 05:21:35 +00:00
Ron Lieberman 91f147792e [libomptarget][amdgpu] Remove stray fprintf in rtl.cpp
remove unintended fprintf in rtl.cpp

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D104003
2021-06-10 01:57:30 +00:00
AndreyChurbanov 9ce2e5e700 Revert "[OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type"
This reverts commit a1f550e052.

Revert in order to fix backwards compatibility breakage
caused by type size change for task dependence flag.
2021-06-09 17:38:38 +03:00
Joachim Protze 639b397931 [OpenMP][Tools] Fix Archer handling of task dependencies
The current handling of dependencies in Archer has two flaws:

- annotation of dependency synchronization is not limited to sibling tasks
- annotation of in/out dependencies is based on the assumption, that dependency
  variables will rarely be byte-sized variables.

This patch introduces a map in the generating task to manage the dependency
variables for the child tasks. The map is only accesses from the generating
task, so no locking is necessary. This also limits the dependency-based
synchronization to sibling tasks.
This patch also introduces proper handling for new dependency types such as
mutexinoutset and inoutset.

Differential Revision: https://reviews.llvm.org/D103608
2021-06-09 13:36:20 +02:00
Joachim Protze 08d8f1a958 [OpenMP][Tools] Cleanup memory pool used in Archer
The main motivation for reusing objects is that it helps to avoid creating and
leaking synchronization clocks in TSan. The reused object will reuse the
synchronization clock in TSan.

Before, new and delete operators were overloaded to get and return memory for
the object from/to the object pool.
This patch replaces the operator overloading with explicit static New/Delete
functions.

Objects for parallel regions and implicit tasks will always be recruited and
returned to the thread-local object pool. Only for explicit task, there is a
chance that an other thread completes the task and will free the object. This
patch optimizes the thread-local New/Delete calls by avoiding locks and only
lock if the pool is empty. Remote threads return the object into a separate
queue.

The chunk size for allocations is now decided based on page size. The objects
will also be aligned to cache lines avoiding false sharing.

This is the first patch in a series to provide better tasking support.

Differential Revision: https://reviews.llvm.org/D103606
2021-06-09 13:36:19 +02:00
Joachim Protze 82e4e50531 [OpenMP][Tools] Fix Archer for MACOS
Archer uses weak symbol overloads of TSan functions to enable loading the tool
even if the application is not built with TSan. For MACOS the tool collects
the function pointer at runtime.
When adding the function entry/exit markers, we missed to add the functions
in the MACOS codepath.
This patch also replaces the repeated function lookup by a single initial
function lookup and fixes the disabling logic in RunningOnValgrind.

Differential Revision: https://reviews.llvm.org/D103607
2021-06-09 13:36:19 +02:00
Brendon Cahoon 294efbbd3e Reland "[AMDGPU] Add gfx1013 target"
This reverts commit 211e584fa2.

Fixed a use-after-free error that caused the sanitizers to fail.
2021-06-08 21:15:35 -04:00
Joseph Huber df965513a9 [OpenMP] Add an information flag for device data transfers
This patch adds an information flag that indicated when data is being copied to
and from the device. This will be helpful for finding redundant or unnecessary
data transfers in applications.

Reviewed By: jdoerfert, grokos

Differential Revision: https://reviews.llvm.org/D103927
2021-06-08 20:23:27 -04:00
Brendon Cahoon 211e584fa2 Revert "[AMDGPU] Add gfx1013 target"
This reverts commit ea10a86984.

A sanitizer buildbot reports an error.
2021-06-08 16:29:41 -04:00
Brendon Cahoon ea10a86984 [AMDGPU] Add gfx1013 target
Differential Revision: https://reviews.llvm.org/D103663
2021-06-08 12:49:49 -04:00
Vignesh Balasubramanian f61602b0d3 [OpenMP][OMPD] Implementation of OMPD debugging library - libompd.
This is the first of seven patches that implements OMPD, a debugging interface to support debugging of OpenMP programs.
It contains support code required in "openmp/runtime" for OMPD implementation.

Reviewed By: @hbae
Differential Revision: https://reviews.llvm.org/D100181
2021-06-08 16:44:22 +05:30
Peyton, Jonathan L d70e1f1276 [OpenMP][runtime] add .clang-tidy file
Use same checks as compiler-rt which removes checks for readability-*
and llvm-header style.

Differential Revision: https://reviews.llvm.org/D103711
2021-06-07 13:56:39 -05:00
AndreyChurbanov a1f550e052 [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type
Refactored code of dependence processing and added new inoutset dependence type.
Compiler can set dependence flag to 0x8 when call __kmpc_omp_task_with_deps.
Size of type of the dependence flag changed from 1 to 4 bytes in clang.
All dependence flags library gets so far and corresponding dependence types:
1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET.

Differential Revision: https://reviews.llvm.org/D97085
2021-06-07 21:42:51 +03:00
Bryan Chan 54f059c900 [OpenMP] Check loc for NULL before dereferencing it
The ident_t * argument in __kmp_get_monotonicity was being used without
a customary NULL check, causing the function to crash in a Debug build.
Release builds were not affected thanks to dead store elimination.
2021-06-07 10:45:48 -04:00
Pushpinder Singh 4f8bc7caf4 [AMDGPU][Libomptarget] Remove atlc global
This global struct used to hold various flags for monitoring the
initialization of hsa.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103795
2021-06-07 11:09:01 +00:00
Pushpinder Singh f5f329a371 [AMDGPU][Libomptarget] Rework logic for locating kernarg pools
Previous logic was to always use the first kernarg pool found to allocate
kernel args. This patch changes this to use only the kernarg pool which
has non-zero size. This logic is also reworked to not use any globals.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103600
2021-06-07 06:41:37 +00:00
Terry Wilmarth 8ec9aa236e [OpenMP] Add experimental nesting mode feature
Nesting mode is a new experimental feature in the OpenMP
runtime. It allows a user to set up nesting for an application in a
way that corresponds to the hardware topology levels on the machine an
application is being run on.  For example, if a machine has 2 sockets,
each with 12 cores, then use of nesting mode could set up an outer
level of nesting that uses 2 threads per parallel region, and an inner
level of nesting that uses 12 threads per parallel region.

Nesting mode is controlled with the KMP_NESTING_MODE environment
variable as follows:

1) KMP_NESTING_MODE = 0: Nesting mode is off (default); max-active-levels-var
is set to 1 (the default -- nesting is off, nested parallel regions
are serialized).

2) KMP_NESTING_MODE = 1: Nesting mode is on, and a number of threads
will be assigned for each level discovered in the machine topology;
max-active-levels-var is set to the number of levels discovered.

3) KMP_NESTING_MODE = n, n>1: [Note: this option is experimental and may change
or be removed in the future.] Nesting mode is on, and a number of
threads will be assigned for each topology level discovered on the
machine, up to k<=n levels (since there may be fewer than n levels
discovered in the topology), and beyond the kth level, nested parallel
regions will be serialized; NOTE: max-active-levels-var is 1 (the default --
nesting is off, and nested parallel regions are serialized until the
user changes max-active-levels-var.

If the user sets OMP_NUM_THREADS or OMP_MAX_ACTIVE_LEVELS, they will
override KMP_NESTING_MODE settings for the associated environment
variables. The detected topology may be limited by an affinity mask
setting on the initial thread, or if the user sets KMP_HW_SUBSET. See
also: KMP_HOT_TEAMS_MAX_LEVEL for controlling use of hot teams for
nested parallel regions. Note that this feature only sets numbers of
threads used at nesting levels.  The user should make use of
OMP_PLACES and OMP_PROC_BIND or KMP_AFFINITY for affinitizing those
threads, if desired.

Differential Revision: https://reviews.llvm.org/D102188
2021-06-04 16:01:11 -05:00
Peyton, Jonathan L 56dd158c32 [OpenMP] fix spelling error in message-converter.pl 2021-06-04 11:20:32 -05:00
Peyton, Jonathan L f7655f3df3 [OpenMP] Fix improper printf format specifier 2021-06-02 11:04:48 -05:00
Hansang Bae 7ba4e96ede [OpenMP] Use new task type/flag for taskwait depend events.
Differential Revision: https://reviews.llvm.org/D103464
2021-06-02 10:16:38 -05:00
Pushpinder Singh b25546a4b4 [AMDGPU][Libomptarget][NFC] Remove bunch of dead structs
Dropped structs are atmi_machine_t, atmi_device_t and atmi_memory_t

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103509
2021-06-02 10:40:51 +00:00
Pushpinder Singh 2368170a8d [AMDGPU][Libomptarget][NFC] Remove atmi_place_t
atmi_place_t has been replaced with int DeviceId.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103508
2021-06-02 10:35:28 +00:00
Peyton, Jonathan L 2020c981fa [OpenMP] Add L2-Tile equivalence for KNL
When on KNL and L2 or Tile layer is detected, manually add
the corresponding layer which is equivalent.

Differential Revision: https://reviews.llvm.org/D102865
2021-06-01 14:17:13 -05:00
Hansang Bae cf5c94ef08 [OpenMP] Define named constants for interop's foreign runtime ID
Also added missing Fortran definitions for interop support.

Differential Revision: https://reviews.llvm.org/D102883
2021-06-01 13:06:59 -05:00
Pushpinder Singh fb113264a8 [AMDGPU][Libomptarget] Remove g_atmi_machine global
Turns out the only purpose of this class was verify if device ID
was in range or not which could be done easily by using g_atl_machine.

Still getting rid of g_atl_machine is pending which would be done in
a later patch.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103443
2021-06-01 12:34:24 +00:00
Pushpinder Singh 4fc3286951 [AMDGPU][Libomptarget][NFC] Split host and device malloc
This patch splits the code path for host and device malloc.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103389
2021-05-31 12:09:18 +00:00
Pushpinder Singh 8b79dfb302 [AMDGPU][Libomptarget][NFC] Remove atmi_mem_place_t
This struct was used to specify the device on which memory was
being allocated/free in atmi_malloc/free. It has now been replaced
with int DeviceId.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103239
2021-05-27 11:53:18 +00:00
Jon Chesterfield 2fdf8bbd19 [libomptarget][nfc][amdgpu] Factor out setting upper bounds
Refactor suggested in D103037 to help avoid similar copy-paste errors.
Change is mechanical. Some parts of this would be more robust with unsigned.

Reviewed By: dhruvachak

Differential Revision: https://reviews.llvm.org/D103090
2021-05-26 19:57:49 +01:00
Jon Chesterfield c5c1ec7945 [libomptarget][nfc][amdgpu] Refactor uses of KernelInfoTable
Suggested in D103059. Use a single lookup instead of two, more const, less mutation.

Reviewed By: dhruvachak

Differential Revision: https://reviews.llvm.org/D103093
2021-05-26 19:25:25 +01:00
Jon Chesterfield 07f59baad6 [libomptarget][nfc][amdgpu] Remove atmi_status_t type
ATMI_STATUS_UNKNOWN was unused, deleted references to it.
Replaced ATMI_STATUS_{SUCCESS,ERROR} with HSA_STATUS_{SUCCESS,ERROR}
Replaced atmi_status_t with hsa_status_t

Otherwise no change. In particular, conversions between atmi_status_t and
hsa_status_t will now be conversions between hsa_status_t and itself.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D103115
2021-05-26 17:02:19 +01:00
Pushpinder Singh a2d6ef5876 [AMDGPU][Libomptarget] Inline atmi_init/atmi_finalize
After D102847, these functions can be inlined.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103075
2021-05-26 10:50:08 +00:00
Pushpinder Singh cc8661ac4a [AMDGPU][Libomptarget] Delete g_atmi_initialized
This patch drops g_atmi_initialized and inlines the Initialize &
Finalize methods from Runtime class.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102847
2021-05-26 10:46:54 +00:00
Pushpinder Singh 7648b6978e [AMDGPU][Libomptarget] Move Kernel/Symbol info tables to RTLDeviceInfoTy
Two globals KernelInfoTable & SymbolInfoTable are moved
into RTLDeviceInfoTy class.
This builds on the top of D102691.
[2/2]

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102692
2021-05-26 10:02:28 +00:00
Jon Chesterfield df005fa364 [libomptarget][nfc] Move hostcall required test to rtl
[libomptarget][nfc] Move hostcall required test to rtl

Remove a global, fix minor race. First of N patches to bring up hostcall.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103058
2021-05-25 22:43:17 +01:00