1432 Commits

Author SHA1 Message Date
George Rokos 5adb3a6d86 [libomptarget] Fix copy-to motion for PTR_AND_OBJ entries where PTR is a struct member.
This patch fixes a problem whereby the pointee object of a PTR_AND_OBJ entry with a `map(to)` motion clause can be overwritten on the device even if its reference counter is >=1.

Currently, we check the reference counter of the parent struct in order to determine whether the motion clause should be respected, but since the pointee object is not part of the struct, it's got its own reference counter which should be used to enqueue the copy or discard it.

The same behavior has already been implemented in targetDataEnd (omptarget.cpp:539-540), but we somehow missed doing the same in targetDataBegin.

Differential Revision: https://reviews.llvm.org/D89597
2020-10-16 16:14:01 -07:00
JonChesterfield 7d2ecef5ed [openmp][libomptarget] Include header from LLVM source tree
The change is to the amdgpu plugin so is unlikely to break anything.

The point of contention is whether libomptarget can depend on LLVM.
A community discussion was cautiously not opposed yesterday.

This introduces a compile time dependency on the LLVM source tree, in this case
expressed as skipping the building of the plugin if LLVM_MAIN_INCLUDE_DIR is not
set. One of the source files will #include llvm/Frontend/OpenMP/OMPGridValues.h,
instead of copy&pasting the numbers across.

For users that download the monorepo, the llvm tree is already on disk. This will
inconvenience users who download only the openmp source as a tar, as they would
now also have to download (at least a file or two) from the llvm source, if they want
to build the parts of the openmp project that (post this patch) depend on llvm.

There was interest expressed in going further - using llvm tools as part of
building libomp, or linking against llvm libraries. That seems less clear cut
an improvement and worthy of further discussion. This patch seeks only to change
policy to support openmp depending on the llvm source tree. Including in the
other direction, or using libraries / tools etc, are purposefully out of scope.

Reviewers are a best guess at interested parties, please feel free to add others

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D87841
2020-10-15 15:46:19 +01:00
JonChesterfield 8b6cd15242 [libomptarget][amdgcn] Implement partial barrier
named_sync is used to coordinate non-spmd kernels. This uses bar.sync on nvptx.
There is no corresponding ISA support on amdgcn, so this is implemented using
shared memory, one word initialized to zero.

Each wave increments the variable by one. Whichever wave is last is responsible
for resetting the variable to zero, at which point it and the others continue.

The race condition on a wave reaching the barrier before another wave has
noticed that it has been released is handled with a generation counter, packed
into the same word.

Uses a shared variable that is not needed on nvptx. Introduces a new hook,
kmpc_impl_target_init, to allow different targets to do extra initialization.
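
The scheme described above can be sketched on the host with a single atomic word. This is a minimal illustration, not the device code from the patch: the bit layout (16 count bits, 16 generation bits), the names `PackedBarrier` and `runBarrierRounds`, and the use of `std::thread` in place of waves are all assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical layout: the low 16 bits count arrivals, the high 16 bits hold
// the generation. The real device code may pack the word differently.
constexpr uint32_t CountMask = 0xFFFFu;
constexpr uint32_t GenerationStep = 0x10000u;

struct PackedBarrier {
  std::atomic<uint32_t> Word{0}; // one word, initialized to zero

  // Blocks until NumWaves callers have arrived, then releases all of them.
  void wait(uint32_t NumWaves) {
    assert(NumWaves > 0 && NumWaves <= CountMask);
    uint32_t Before = Word.fetch_add(1, std::memory_order_acq_rel);
    uint32_t Generation = Before & ~CountMask;
    if ((Before & CountMask) + 1 == NumWaves) {
      // Last arrival: reset the count and advance the generation in one store.
      Word.store(Generation + GenerationStep, std::memory_order_release);
      return;
    }
    // Earlier arrivals spin until the generation changes. A fast caller that
    // re-enters the barrier only touches the count bits, so a slow waiter
    // still observes the new generation and leaves.
    while ((Word.load(std::memory_order_acquire) & ~CountMask) == Generation)
      std::this_thread::yield();
  }
};

// Host-side check: after wait() returns, every thread must see all arrivals
// of the current round.
bool runBarrierRounds(unsigned NumThreads, unsigned Rounds) {
  PackedBarrier Barrier;
  std::vector<std::atomic<unsigned>> Arrived(Rounds);
  for (auto &A : Arrived)
    A.store(0);
  std::atomic<bool> Ok{true};
  std::vector<std::thread> Threads;
  for (unsigned T = 0; T < NumThreads; ++T)
    Threads.emplace_back([&] {
      for (unsigned R = 0; R < Rounds; ++R) {
        Arrived[R].fetch_add(1);
        Barrier.wait(NumThreads);
        if (Arrived[R].load() != NumThreads)
          Ok.store(false);
      }
    });
  for (auto &Th : Threads)
    Th.join();
  return Ok.load();
}
```

Calling `runBarrierRounds(4, 200)` reuses the same word for 200 consecutive barriers, which is the situation the generation counter is meant to make safe.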

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D88602
2020-10-12 21:27:32 +01:00
Joseph Huber d564409946 [OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default
Summary:
This patch changes the CMake files for Clang and Libomptarget to query the
system for its supported CUDA architecture. This makes it much easier for the
user to build optimal code without needing to set the flags manually. This
relies on the now deprecated FindCUDA method in CMake, but full support for
architecture detection is only available in CMake >3.18.

Reviewers: jdoerfert ye-luo

Subscribers: cfe-commits guansong mgorny openmp-commits sstefan1 yaxunl

Tags: #clang #OpenMP

Differential Revision: https://reviews.llvm.org/D87946
2020-10-08 12:09:34 -04:00
Pushpinder Singh 3a12ff0dac [OpenMP][RTL] Remove dead code
RequiresDataSharing was always 0, resulting in dead code in the device runtime library.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D88829
2020-10-06 05:43:47 -04:00
Joachim Protze 69f87400a8 [OpenMP][Archer][Tests] NFC: fix spurious test failure
The test disables suppression and therefore sometimes triggers a known false
positive in the openmp runtime. The test should only verify that the env
var is handled as expected.
2020-10-06 00:26:08 +02:00
Joachim Protze 34b34e90fc [OpenMP][Tests] NFC: fix flaky test failure caused by rare scheduling
The worker thread can start execution of the task before creation of the second task.
Fixes the spurious failure reported in https://reviews.llvm.org/D61657
2020-10-05 16:55:32 +02:00
Joachim Protze 23419bfd1c [OpenMP][libarcher] Allow all possible argument separators in TSAN_OPTIONS
Currently, the parser used to tokenize the TSAN_OPTIONS in libomp uses
only spaces as separators, even though TSAN in compiler-rt supports
other separators like ':' or ','.
CTest uses ':' to separate sanitizer options by default.
The documentation for other sanitizers mentions ':' as separator,
but TSAN only lists spaces, which is probably where this mismatch originated.
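
A tokenizer that accepts several separators could look like the sketch below. The function name `splitOptions` and the exact separator set are assumptions made for illustration; the patch itself may accept a different set.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits an options string on any character in Separators and drops empty
// tokens. The default separator set is an assumption for this sketch.
std::vector<std::string> splitOptions(const std::string &Options,
                                      const std::string &Separators = " ,:\n\t\r") {
  assert(!Separators.empty());
  std::vector<std::string> Tokens;
  std::string::size_type Begin = 0;
  // Skip leading separators, then cut at the next separator.
  while ((Begin = Options.find_first_not_of(Separators, Begin)) !=
         std::string::npos) {
    std::string::size_type End = Options.find_first_of(Separators, Begin);
    Tokens.push_back(Options.substr(Begin, End - Begin));
    if (End == std::string::npos)
      break;
    Begin = End;
  }
  return Tokens;
}
```

With this, `splitOptions("verbose=1:flush_shadow=0 x=2")` yields three tokens regardless of which separator the caller used. Note that a value containing one of the separators would be split as well, which this sketch does not try to handle.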

Patch provided by upsj

Differential Revision: https://reviews.llvm.org/D87144
2020-10-01 01:10:13 +02:00
Joachim Protze 6104b30446 [OpenMP][OMPT] Update OMPT tests for newly added GOMP interface patches
This patch updates the expected results for the GOMP interface patches: D87267, D87269, and D87271.
The taskwait-depend test is changed to really use taskwait-depend and copied to a task_if0-depend test.

To pass the tests, the handling of the return address was fixed.

Differential Revision: https://reviews.llvm.org/D87680
2020-10-01 00:53:41 +02:00
Joachim Protze 55cff5b288 [OpenMP][libomptarget] make omp_get_initial_device 5.1 compliant
OpenMP 5.1 defines omp_get_initial_device to return the same value as omp_get_num_devices.
Since this change is also 5.0 compliant, no versioning is needed.

Differential Revision: https://reviews.llvm.org/D88149
2020-10-01 00:51:11 +02:00
JonChesterfield d256797c90 [nfc][libomptarget] Drop parameter to named_sync
named_sync has one call site (in sync.cu) where it always passed L1_BARRIER.
Folding this into the call site and dropping the macro is a simplification.

amdgpu doesn't have ptx's bar.sync instruction. A correct implementation of
__kmpc_impl_named_sync in terms of shared memory is much easier if it can
assume that the barrier argument is this constant. Said implementation is left
for a second patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D88474
2020-09-29 23:12:21 +01:00
Manoel Roemmer c816ee13ad [OpenMP][VE plugin] Fixing failure to build VE plugin with consolidated error handling in libomptarget
The libomptarget VE plugin [[ http://lab.llvm.org:8014/builders/clang-ve-ninja/builds/8937/steps/build-unified-tree/logs/stdio | fails to build ]] after ae95ceeb8f.

Differential Revision: https://reviews.llvm.org/D88476
2020-09-29 17:38:01 +02:00
Joseph Huber 0103df7903 [OpenMP] Add Missing _static Directory for OpenMP Documentation
Summary:
Adding a missing directory needed for generating Sphinx documentation without
errors. Directory currently contains a placeholder image just to populate the
directory.
2020-09-27 15:35:47 -04:00
Ye Luo ffd159d8e9 [OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL
It allows customizing MAX_SM for non-flagship GPUs and reduces graphics memory usage.

In addition, so far the size is hard-coded up to __CUDA_ARCH__ 700 and is already a hassle for 800.
Introduce MAX_SM for 800 and protect future architectures.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D88185
2020-09-24 12:39:59 -04:00
Peyton, Jonathan L ee1c04a926 [OpenMP] Fix if0 task with dependencies in the runtime
The current GOMP interface for serialized tasks does not take into
account task dependencies. Add the check and wait for dependencies.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=46573

Differential Revision: https://reviews.llvm.org/D87271
2020-09-24 09:47:53 -05:00
Peyton, Jonathan L 9089b4a5c5 [OpenMP] Introduce GOMP taskwait depend in the runtime
This change introduces the GOMP_taskwait_depend() function. It implements
the OpenMP 5.0 feature of #pragma omp taskwait with depend() clause by
wrapping around __kmpc_omp_wait_deps().

Differential Revision: https://reviews.llvm.org/D87269
2020-09-24 09:45:14 -05:00
Peyton, Jonathan L 72ada5ae6c [OpenMP] Introduce GOMP mutexinoutset in the runtime
Encapsulate GOMP task dependencies in separate class and introduce the
new mutexinoutset dependency type. This separate class allows
future GOMP task APIs easier access to the task dependency functionality
and better ability to propagate new dependency types to all existing GOMP
task APIs which use task dependencies.

Differential Revision: https://reviews.llvm.org/D87267
2020-09-24 09:45:13 -05:00
Peyton, Jonathan L ea34d95e0a [OpenMP] Introduce GOMP teams support in runtime
Implement GOMP_teams_reg() function which enables GOMP support of the
standalone teams construct. The GOMP_parallel* functions were modified
to call __kmp_fork_call() unconditionally so that the teams-specific
code could be reused within __kmp_fork_call() instead of reproduced
inside the GOMP_* functions.

Differential Revision: https://reviews.llvm.org/D87167
2020-09-24 09:45:13 -05:00
Ye Luo 03111e5e7a [OpenMP] Protect unrecognized CUDA error code
If an error code cannot be recognized by cuGetErrorString, errStr remains null and causes a crash at DP() printing.
Protect this case.
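
The guard amounts to checking the string pointer before it reaches a `%s`-style printer. The sketch below uses a stand-in lookup, `lookupErrorString`, so that it runs without CUDA; that function and `describeError` are hypothetical and only mimic a query that leaves the pointer null for unknown codes.

```cpp
#include <cassert>
#include <string>

// Stand-in for a driver query such as cuGetErrorString: it leaves *Str null
// for codes it does not know. Hypothetical, no CUDA required.
bool lookupErrorString(int Code, const char **Str) {
  assert(Str != nullptr);
  *Str = (Code == 1) ? "invalid value" : nullptr;
  return *Str != nullptr;
}

// Returns printable text for any code, including unrecognized ones.
std::string describeError(int Code) {
  const char *Str = nullptr;
  lookupErrorString(Code, &Str);
  if (!Str) // never hand a null pointer to a "%s"-style printer
    return "unknown error code " + std::to_string(Code);
  return Str;
}
```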

Reviewed By: jhuber6, tianshilei1992

Differential Revision: https://reviews.llvm.org/D87980
2020-09-21 13:43:08 -04:00
Joseph Huber 1c4c21489f [OpenMP] Initial Support for OpenMP Webpage Documentation
Summary:
Adding support for generated html documentation for OpenMP. Changing
Cmake files to build the documentation and adding the base templates for
future documentation to be added.

Reviewers: jdoerfert

Subscribers: aaron.ballman arphaman guansong mgorny openmp-commits sstefan1 yaxunl

Tags: #OpenMP

Differential Revision: https://reviews.llvm.org/D87797
2020-09-18 16:32:22 -04:00
JonChesterfield a9be2b5cb2 [libomptarget] Disable build of amdgpu plugin as it doesn't build with rocm. 2020-09-18 18:10:27 +01:00
Joseph Huber c3e6054b07 [OpenMP] Additional Information for Libomptarget Mappings
Summary:
This patch adds additional support for printing information from Libomptarget for
already existing maps and printing the final data mapped on the device at
device destruction.

Reviewers: jdoerfort gkistanova

Subscribers: guansong openmp-commits sstefan1 yaxunl

Tags: #OpenMP

Differential Revision: https://reviews.llvm.org/D87722
2020-09-15 18:12:57 -04:00
Raul Tambre c42f96cb23 [CMake][OpenMP] Simplify getting CUDA library directory
LLVM now requires CMake 3.13.4 so we can simplify this.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D87195
2020-09-11 21:19:11 +03:00
Joseph Huber ae209397b1 [OpenMP] Begin Printing Information Dumps In Libomptarget and Plugins
Summary:
This patch starts adding support for adding information dumps to libomptarget
and rtl plugins. The information printing is controlled by the
LIBOMPTARGET_INFO environment variable introduced in D86483. The goal of this
patch is to provide the user with additional information about the device
during kernel execution and to provide information dumps in the
case of failure. This patch adds the ability to dump the pointer mapping table
and to print the number of blocks and threads in the CUDA RTL.

Reviewers: jdoerfort gkistanova ye-luo

Subscribers: guansong openmp-commits sstefan1 yaxunl ye-luo

Tags: #OpenMP

Differential Revision: https://reviews.llvm.org/D87165
2020-09-09 12:03:56 -04:00
Pushpinder Singh 7634c64b61 [OpenMP][AMDGPU] Use DS_Max_Warp_Number instead of WARPSIZE
The size of worker_rootS should have been DS_Max_Warp_Number.
This reduces memory usage by deviceRTL on AMDGPU from around 2.3GB
to around 770MB.

Reviewed By: JonChesterfield, jdoerfert

Differential Revision: https://reviews.llvm.org/D87084
2020-09-07 05:15:21 -04:00
Raul Tambre 21c0e74c9e [CMake][OpenMP] Remove old dead CMake code
LLVM requires CMake 3.13.4 so remove code behind checks for an older version.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D87191
2020-09-07 10:56:56 +03:00
Joseph Huber ae95ceeb8f [OpenMP] Consolidate error handling and debug messages in Libomptarget
Summary:

This patch consolidates the error handling and messaging routines to a single
file omptargetmessage. The goal is to simplify the error handling interface
prior to adding more error handling support.

Reviewers: jdoerfert grokos ABataev AndreyChurbanov ronlieb JonChesterfield ye-luo tianshilei1992

Subscribers: danielkiss guansong jvesely kerbowa nhaehnle openmp-commits sstefan1 yaxunl
2020-09-01 15:28:19 -04:00
Alexey Bataev 6aa7228a62 [LIBOMPTARGET]Do not try to optimize bases for the next parameters.
PrivateArgumentManager shall immediately allocate firstprivates if they
are bases for the next parameters and the next parameters rely on the
fact that the base must be allocated already.

Differential Revision: https://reviews.llvm.org/D86781
2020-08-28 15:46:31 -04:00
Shilei Tian 46e0ced762 [OpenMP] Fixed wrong test command in the test private_mapping.c
The test command in `private_mapping.c` was set to expect failure by mistake. It is fixed in this patch.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D86758
2020-08-28 12:19:46 -04:00
Joseph Huber 7a5a74ea96 [OpenMP] Always emit debug messages that indicate offloading failure
Summary:

This patch changes the libomptarget runtime to always emit debug messages that
occur before offloading failure. The goal is to provide users with information
about why their application failed in the target region rather than a single
failure message. This is only done in regions that precede offloading failure
so this should not impact runtime performance. If the debug environment
variable is set then the message is forwarded to the debug output as usual.

A new environment variable was added for future use but does nothing in this
current patch. LIBOMPTARGET_INFO will be used to report runtime information to
the user if requested, such as grid size, SPMD usage, or data mapping. It will
take an integer indicating the level of information verbosity and a value of 0
will disable it.

Reviewers: jdoerfort

Subscribers: guansong sstefan1 yaxunl ye-luo

Tags: #OpenMP

Differential Revision: https://reviews.llvm.org/D86483
2020-08-26 19:30:41 -04:00
JonChesterfield 5d989fb37d [libomptarget][amdgpu] Improve thread safety, remove dead code 2020-08-26 22:04:03 +01:00
Jon Chesterfield 28fbf422f2 [libomptarget][amdgpu] Update plugin CMake to work with latest rocr library 2020-08-26 20:01:42 +01:00
AndreyChurbanov 1596ea80fd [OpenMP] Fix import library installation with MinGW
Patch by mati865@gmail.com

Differential Revision: https://reviews.llvm.org/D86552
2020-08-26 21:56:01 +03:00
AndreyChurbanov 09af378f49 [OpenMP] Fix build on macOS sdk 10.12 and newer
Patch by nihui (Ni Hui)

Differential Revision: https://reviews.llvm.org/D76755
2020-08-26 16:52:46 +03:00
Shilei Tian 0775c1dfbc [OpenMP] Pack first-private arguments to improve efficiency of data transfer
In this patch, we pack all small first-private arguments, then allocate and transfer them all at once to reduce the number of data transfers, which are very expensive.

Let's take the test case as an example.
```
int main() {
  int data1[3] = {1}, data2[3] = {2}, data3[3] = {3};
  int sum[16] = {0};
#pragma omp target teams distribute parallel for map(tofrom: sum) firstprivate(data1, data2, data3)
  for (int i = 0; i < 16; ++i) {
    for (int j = 0; j < 3; ++j) {
      sum[i] += data1[j];
      sum[i] += data2[j];
      sum[i] += data3[j];
    }
  }
}
```
Here `data1`, `data2`, and `data3` are three first-private arguments of the target region. Previously, `libomptarget` called data allocation and data transfer three times, each of which allocated and transferred 12 bytes. With this patch, it only calls allocation and transfer once. The size is `(12+4)*3=48` where 12 is the size of each array and 4 is the padding to keep the address aligned to 8. It is implemented in this way:
1. First collect all information for the *first*-private arguments. _private_ arguments are excluded because they do not need to be mapped to the target device; each just needs a data allocation. With the patch for the memory manager, the data allocation could be very cheap, especially for small sizes. For each qualified argument, push a placeholder pointer `nullptr` to the `vector` for kernel arguments, and we will update them later.
2. After we have all information, create a buffer that can accommodate all arguments plus their padding. Copy the arguments to the buffer at the right place, i.e. aligned address.
3. Allocate target memory of the same size as the host buffer, transfer the host buffer to the target device, and finally update all placeholder pointers in the arguments `vector`.

The reason we only consider small arguments is that the data transfer is asynchronous. Therefore, for a large argument, we can continue to do work on the host side while, hopefully, the data is being transferred. "Small" means that the argument size is less than a predefined value, currently 1024. I'm not sure whether that is a good value; it is an open question. Another question is whether we need to make it configurable via an environment variable.
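
The host-side packing in step 2 can be sketched as follows. This is an illustration under stated assumptions rather than the code from the patch: `HostArg`, `PackedArgs`, and `packFirstPrivates` are hypothetical names, and the device allocation and transfer of step 3 are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t Alignment = 8; // each packed argument starts 8-byte aligned

struct HostArg {
  const void *Ptr;  // host address of the first-private argument
  std::size_t Size; // size in bytes
};

struct PackedArgs {
  std::vector<char> Buffer;         // single host staging buffer
  std::vector<std::size_t> Offsets; // where each argument lives in Buffer
};

// Lays out all arguments back to back, padding each one up to Alignment.
PackedArgs packFirstPrivates(const std::vector<HostArg> &Args) {
  PackedArgs Packed;
  std::size_t Total = 0;
  for (const HostArg &A : Args) {
    assert(A.Ptr != nullptr && A.Size > 0);
    Packed.Offsets.push_back(Total);
    Total += (A.Size + Alignment - 1) / Alignment * Alignment; // size plus padding
  }
  Packed.Buffer.assign(Total, 0);
  for (std::size_t I = 0; I < Args.size(); ++I)
    std::memcpy(Packed.Buffer.data() + Packed.Offsets[I], Args[I].Ptr,
                Args[I].Size);
  return Packed;
}
```

For the three 12-byte arrays from the test case, the offsets come out as 0, 16, and 32 and the buffer is 48 bytes, matching the `(12+4)*3` figure. A device pointer for each argument would then be the device base address plus its offset.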

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D86307
2020-08-25 16:06:29 -04:00
Dimitry Andric 47b0262d3f Add <stdarg.h> include to kmp_os.h, to get the va_list type, required
after cde8f4c164. Sort system includes, while here.
2020-08-24 22:45:02 +02:00
Dimitry Andric cde8f4c164 Move special va_list handling to kmp_os.h
Instead of copying and pasting the same `#ifdef` expressions in multiple
places, define a type and a pair of macros in `kmp_os.h`, to handle
whether `va_list` is pointer-like or not:

* `kmp_va_list` is the type to use for `__kmp_fork_call()`
* `kmp_va_deref()` dereferences a `va_list`, if necessary
* `kmp_va_addr_of()` takes the address of a `va_list`, if necessary

Also add FreeBSD to the list of OSes that have a non pointer-like
va_list. This can now be easily extended to other OSes too.
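
The shape of such an abstraction is sketched below. The `sketch_` names and the `SKETCH_VA_LIST_BY_POINTER` switch are hypothetical stand-ins for the real type, macros, and per-OS/per-architecture conditions in `kmp_os.h`, which are not reproduced here.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical switch standing in for the platform checks in kmp_os.h.
#ifndef SKETCH_VA_LIST_BY_POINTER
#define SKETCH_VA_LIST_BY_POINTER 1
#endif

#if SKETCH_VA_LIST_BY_POINTER
// va_list is not pointer-like: pass its address and dereference on use.
typedef va_list *sketch_va_list;
#define sketch_va_deref(ap) (*(ap))
#define sketch_va_addr_of(ap) (&(ap))
#else
// va_list is pointer-like: pass it through unchanged.
typedef va_list sketch_va_list;
#define sketch_va_deref(ap) (ap)
#define sketch_va_addr_of(ap) (ap)
#endif

// Consumer written once against the abstraction, with no #ifdef of its own.
int sumFrom(int Count, sketch_va_list Ap) {
  int Sum = 0;
  for (int I = 0; I < Count; ++I)
    Sum += va_arg(sketch_va_deref(Ap), int);
  return Sum;
}

// Variadic entry point that forwards its arguments to the consumer.
int sumInts(int Count, ...) {
  assert(Count >= 0);
  va_list Ap;
  va_start(Ap, Count);
  int Sum = sumFrom(Count, sketch_va_addr_of(Ap));
  va_end(Ap);
  return Sum;
}
```

The point of the design is that only the three definitions at the top vary by platform, so supporting another OS means touching one condition instead of every call site.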

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D86397
2020-08-24 22:31:56 +02:00
AndreyChurbanov d0f4f5a182 [OpenMP] Check if _MSC_VER is defined before using it
Patch by mati865@gmail.com

Differential Revision: https://reviews.llvm.org/D86448
2020-08-24 17:50:38 +03:00
Shilei Tian f93b42a629 [NFC][OpenMP] Remove outdated comments about potential issues
The issue mentioned has been fixed in D84996
2020-08-24 01:21:06 +00:00
Shilei Tian 0289696751 [OpenMP] Introduce target memory manager
This patch introduces a target memory manager, which keeps target memory
around instead of freeing it as soon as it is no longer used, because the
overhead of memory allocation and deallocation is very large. On CUDA
devices, cuMemFree even blocks the context switch on the device, which affects
concurrent kernel execution.

The memory manager can be taken as a memory pool. It divides the pool into
multiple buckets according to the size such that memory allocation/free
distributed to different buckets will not affect each other.

In this version, we use the exact-equality policy to find a free buffer. This
is an open question: will best-fit work better here? IMO, best-fit is not good
for target memory management because computation on GPU usually requires GBs of
data. Best-fit might lead to a serious waste. For example, there is a free
buffer of size 1960MB, and now we need a buffer of size 1200MB. With best-fit,
the free buffer will be returned, leading to a 760MB waste.

Allocation on the device happens when the manager has no free memory left, and
memory is freed on the device in the following two cases:
1. The program ends. However, there is a small problem: the plugin
library is destroyed before the memory manager is, so the call into the
target plugin will not succeed.
2. The device is out of memory when we request new memory. The manager walks
through all free buffers starting from the bucket with the largest base size,
picks one buffer, frees it, and immediately retries the allocation. If that
succeeds, it returns right away rather than freeing all buffers in the free list.

Update:
A threshold (8KB by default) is set such that users could control what size of memory
will be managed by the manager. It can also be configured by an environment variable
`LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD`.
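
The exact-equality reuse policy and the size threshold can be sketched as below. This is a simplified host-only illustration: `PoolSketch` is a hypothetical name, `std::malloc`/`std::free` stand in for the device allocator, free buffers are keyed by exact size instead of being grouped into buckets, the out-of-memory path is omitted, and treating sizes at or below the threshold as managed is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

class PoolSketch {
  std::size_t Limit;                                         // larger requests bypass the pool
  std::unordered_map<std::size_t, std::vector<void *>> Free; // free buffers keyed by exact size
  std::unordered_map<void *, std::size_t> Sizes;             // size of every pooled buffer

public:
  explicit PoolSketch(std::size_t Limit = 8192) : Limit(Limit) {}

  // Only buffers sitting in the free lists are released here.
  ~PoolSketch() {
    for (auto &Entry : Free)
      for (void *P : Entry.second)
        std::free(P);
  }

  void *allocate(std::size_t Size) {
    assert(Size > 0);
    if (Size > Limit)
      return std::malloc(Size); // unmanaged: goes straight to the allocator
    auto It = Free.find(Size);
    if (It != Free.end() && !It->second.empty()) { // exact-equality hit
      void *P = It->second.back();
      It->second.pop_back();
      return P;
    }
    void *P = std::malloc(Size);
    if (P)
      Sizes[P] = Size;
    return P;
  }

  void release(void *P) {
    auto It = Sizes.find(P);
    if (It == Sizes.end()) { // unmanaged buffer: free immediately
      std::free(P);
      return;
    }
    Free[It->second].push_back(P); // keep the buffer for reuse
  }
};
```

Allocating 4096 bytes, releasing them, and allocating 4096 bytes again returns the same pointer without a second call to the underlying allocator, which is the saving the manager is after.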

Reviewed By: jdoerfert, ye-luo, JonChesterfield

Differential Revision: https://reviews.llvm.org/D81054
2020-08-19 23:12:23 -04:00
Shilei Tian 83c3d07994 [OpenMP] Refactored the function `DeviceTy::data_exchange`
This patch contains the following changes:
1. Renamed the function `DeviceTy::data_exchange` to `DeviceTy::dataExchange`;
2. Changed the second argument `DeviceTy DstDev` to `DeviceTy &DstDev`;
3. Renamed the last argument.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D86238
2020-08-19 16:08:14 -04:00
Jon Chesterfield 6e1b11087f [libomptarget][amdgpu] Support building with static rocm libraries 2020-08-19 15:44:30 +01:00
George Rokos 32ebdc70f3 [libomptarget][NFC] Sort list of plugins in chronological order
Differential Revision: https://reviews.llvm.org/D86082
2020-08-17 08:33:36 -07:00
Johannes Doerfert 5272d29e2c [OpenMP][CUDA] Keep one kernel list per device, not globally.
Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D86039
2020-08-16 14:38:35 -05:00
Johannes Doerfert aa27cfc1e7 [OpenMP][CUDA] Cache the maximal number of threads per block (per kernel)
Instead of calling `cuFuncGetAttribute` with
`CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation,
we can do it for the first one and cache the result as part of the
`KernelInfo` struct. The only functional change is that we now expect
`cuFuncGetAttribute` to succeed and otherwise propagate the error.
Ignoring any error seems like a slippery slope...
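
The caching pattern, including the error propagation, can be sketched without CUDA as follows. `KernelInfoSketch`, `getMaxThreads`, and the `Query` callback are hypothetical; the callback stands in for the driver query and reports failure by returning false.

```cpp
#include <cassert>
#include <functional>

struct KernelInfoSketch {
  int MaxThreadsPerBlock = 0; // 0 means "not queried yet"
};

// Returns the cached value if present; otherwise runs Query once and caches
// its result. A failing query is propagated to the caller, not ignored.
bool getMaxThreads(KernelInfoSketch &Info,
                   const std::function<bool(int &)> &Query, int &Out) {
  if (Info.MaxThreadsPerBlock == 0) {
    int Value = 0;
    if (!Query(Value))
      return false; // propagate the error
    assert(Value > 0);
    Info.MaxThreadsPerBlock = Value;
  }
  Out = Info.MaxThreadsPerBlock;
  return true;
}
```

Repeated launches of the same kernel then pay for the query only on the first invocation.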

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D86038
2020-08-16 14:38:33 -05:00
Jon Chesterfield d0b312955f [libomptarget] Implement host plugin for amdgpu
Replacement for D71384. Primary difference is inlining the dependency on atmi
followed by extensive simplification and bugfixes. This is the latest version
from https://github.com/ROCm-Developer-Tools/amd-llvm-project/tree/aomp12 with
minor patches and a rename from hsa to amdgpu, on the basis that this can't be
used by other implementations of hsa without additional work.

This will not build unless the ROCM_DIR variable is passed so won't break other
builds. That variable is used to locate two amdgpu specific libraries that ship
as part of rocm:
libhsakmt at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface
libhsa-runtime64 at https://github.com/RadeonOpenCompute/ROCR-Runtime
These libraries build from source. The build scripts in those repos are for
shared libraries, but can be adapted to statically link both into this plugin.

There are caveats.
- This works well enough to run various tests and benchmarks, and will be used
  to support the current clang bring up
- It is adequately thread safe for the above but there will be races remaining
- It is not stylistically correct for llvm, though has had clang-format run
- It has suboptimal memory management and locking strategies
- The debug printing / error handling is inconsistent

I would like to contribute this pretty much as-is and then improve it in-tree.
This would be advantageous because the aomp12 branch that was in use for fixing
this codebase has just been joined with the amd internal rocm dev process.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85742
2020-08-15 23:58:28 +01:00
Joachim Protze 66a3575c28 [OpenMP] Fix releasing of stack memory
Starting with 787eb0c637 I got spurious segmentation faults for some testcases. I could nail it down to `brel` trying to release the "memory" of the node allocated on the stack of __kmpc_omp_wait_deps. With this patch, you will see the assertion triggering for some of the tests in the test suite.

My proposed solution for the issue is to just patch __kmpc_omp_wait_deps:
```
  __kmp_init_node(&node);
-  node.dn.on_stack = 1;
+  // the stack owns the node
+  __kmp_node_ref(&node);
```

What do you think?

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D84472
2020-08-14 10:32:53 +02:00
Joel E. Denny 518a27e559 [OpenMP] Fix ref count dec for implicit map of partial data
D85342 broke this case.  The new test case presents an example.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D85369
2020-08-06 11:39:29 -04:00
Joel E. Denny 8c8bb128df [OpenMP] Fix `target data` exit for array extension
For example:

```
 #pragma omp target data map(tofrom:arr[0:100])
 {
   #pragma omp target exit data map(delete:arr[0:100])
   #pragma omp target enter data map(alloc:arr[98:2])
 }
```

Without this patch, the transfer at the end of the target data region
is broken and fails depending on the target device.  According to my
read of the spec, the transfer shouldn't even be attempted because
`arr[0:100]` isn't (fully) present there.  To fix that, this patch
makes `DeviceTy::getTgtPtrBegin` return null for this case.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D85342
2020-08-05 16:51:25 -04:00
Joel E. Denny 41b1aefecb [OpenMP] Fix `present` diagnostic for array extension
For example, without this patch, the following fails as expected with
or without the `present` modifier, but the `present` modifier doesn't
produce its usual diagnostic:

```
 #pragma omp target data map(alloc: arr[0:2])
 {
   #pragma omp target map(present, tofrom: arr[0:100]) // not fully present
   ;
 }
```

Reviewed By: grokos, vzakhari

Differential Revision: https://reviews.llvm.org/D85320
2020-08-05 16:51:24 -04:00
George Rokos 40470eb27a [libomptarget][NFC] Replace `%ld` with PRId64 for data of type int64_t.
The standard way of printing `int64_t` data is via the PRId64 macro; `%ld`
is for `long int`, and int64_t is not guaranteed to be typedef'ed as `long int`
on all platforms. E.g. on Windows we get mismatch warnings.
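
A small example of the portable form, using a hypothetical helper `formatSize`:

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats an int64_t without assuming it is a typedef for long int.
std::string formatSize(int64_t Size) {
  char Buf[64];
  // PRId64 expands to the conversion specifier that matches int64_t on the
  // current platform; adjacent string literals are concatenated.
  int Written = std::snprintf(Buf, sizeof Buf, "size=%" PRId64, Size);
  assert(Written > 0 && static_cast<std::size_t>(Written) < sizeof Buf);
  (void)Written;
  return Buf;
}
```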

Differential Revision: https://reviews.llvm.org/D85353
2020-08-05 13:28:35 -07:00
Alexey Bataev 6780d5675b [LIBOMPTARGET]Fix order of mapper data for targetDataEnd function.
targetDataMapper function fills arrays with the mapping data in the
direct order. When this function is called by targetDataBegin or
tgt_target_update functions, it works as expected. But targetDataEnd
function processes mapped data in reverse order. In this case, the base
pointer might be deleted before the associated data is deleted. Need to
reverse data, mapped by mapper, too, since it always adds data that must
be deleted at the end of the buffer.
Fixes the test declare_mapper_target_update.cpp.
Also, reduces the memory fragmentation by preallocating the memory
buffers.

Differential Revision: https://reviews.llvm.org/D85216
2020-08-05 13:42:24 -04:00
Joel E. Denny 5ab43989c3 [OpenMP] Fix `omp target update` for array extension
OpenMP TR8 sec. 2.15.6 "target update Construct", p. 183, L3-4 states:

> If the corresponding list item is not present in the device data
> environment and there is no present modifier in the clause, then no
> assignment occurs to or from the original list item.

L10-11 states:

> If a present modifier appears in the clause and the corresponding
> list item is not present in the device data environment then an
> error occurs and the program terminates.

(OpenMP 5.0 also has the first passage but without mention of the
present modifier of course.)

In both passages, I assume "is not present" includes the case of
partially but not entirely present.  However, without this patch, the
target update directive misbehaves in this case both with and without
the present modifier.  For example:

```
 #pragma omp target enter data map(to:arr[0:3])
 #pragma omp target update to(arr[0:5]) // might fail on data transfer
 #pragma omp target update to(present:arr[0:5]) // might fail on data transfer
```

The problem is that `DeviceTy::getTgtPtrBegin` does not return a null
pointer in that case, so `target_data_update` sees the data as fully
present, and the data transfer then might fail depending on the target
device.  However, without the present modifier, there should never be
a failure.  Moreover, with the present modifier, there should always
be a failure, and the diagnostic should mention the present modifier.

This patch fixes `DeviceTy::getTgtPtrBegin` to return null when
`target_data_update` is the caller.  I'm wondering if it should do the
same for more callers.
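
The distinction the fix relies on, fully present versus only partially present, can be sketched as a range check over a mapping table. Everything here is hypothetical (`Presence`, `MappedRange`, `lookup`, `hasUsablePointer`) and much simpler than the lookup inside `DeviceTy::getTgtPtrBegin`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Presence { NotPresent, Partial, Full };

struct MappedRange {
  std::uintptr_t Begin, End; // host range [Begin, End) that has been mapped
};

// Classifies the request [Begin, Begin + Size) against the mapped ranges.
Presence lookup(const std::vector<MappedRange> &Table, std::uintptr_t Begin,
                std::uintptr_t Size) {
  assert(Size > 0);
  std::uintptr_t End = Begin + Size;
  for (const MappedRange &R : Table) {
    if (Begin >= R.Begin && End <= R.End)
      return Presence::Full; // completely inside one entry
    if (Begin < R.End && End > R.Begin)
      return Presence::Partial; // overlaps an entry but extends past it
  }
  return Presence::NotPresent;
}

// Policy assumed in this sketch: only fully present data yields a usable
// device pointer, so a partial match is treated like a null result.
bool hasUsablePointer(Presence P) { return P == Presence::Full; }
```

With `arr[0:3]` mapped, a request for `arr[0:5]` classifies as `Partial`, so no transfer is attempted without the present modifier and a presence error can be reported with it.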

Reviewed By: grokos, jdoerfert

Differential Revision: https://reviews.llvm.org/D85246
2020-08-05 10:03:31 -04:00
Joel E. Denny 002d61db2b [OpenMP] Fix `present` for exit from `omp target data`
Without this patch, the following example fails but shouldn't
according to OpenMP TR8:

```
 #pragma omp target enter data map(alloc:i)
 #pragma omp target data map(present, alloc: i)
 {
   #pragma omp target exit data map(delete:i)
 } // fails presence check here
```

OpenMP TR8 sec. 2.22.7.1 "map Clause", p. 321, L23-26 states:

> If the map clause appears on a target, target data, target enter
> data or target exit data construct with a present map-type-modifier
> then on entry to the region if the corresponding list item does not
> appear in the device data environment an error occurs and the
> program terminates.

There is no corresponding statement about the exit from a region.
Thus, the `present` modifier should:

1. Check for presence upon entry into any region, including a `target
   exit data` region.  This behavior is already implemented correctly.

2. Not check for presence upon exit from any region, including
   a `target` or `target data` region.  Without this patch, this
   behavior is not implemented correctly, breaking the above example.

In the case of `target data`, this patch fixes the latter behavior by
removing the `present` modifier from the map types Clang generates for
the runtime call at the end of the region.

In the case of `target`, we have not found a valid OpenMP program for
which such a fix would matter.  It appears that, if a program can
guarantee that data is present at the beginning of a `target` region
so that there's no error there, that data is also guaranteed to be
present at the end.  This patch adds a comment to the runtime to
document this case.

Reviewed By: grokos, RaviNarayanaswamy, ABataev

Differential Revision: https://reviews.llvm.org/D84422
2020-08-05 10:03:31 -04:00
Adrian Pop bf2aa74e51 [OpenMP] support build on msys2/mingw with clang or gcc
RTM Adaptive Locks are supported on msys2/mingw for clang and gcc.

Differential Revision: https://reviews.llvm.org/D81776
2020-08-04 23:15:36 +03:00
AndreyChurbanov 4a04bc8995 [OpenMP] Don't use MSVC workaround with MinGW
Patch by mati865@gmail.com

Differential Revision: https://reviews.llvm.org/D85210
2020-08-04 18:48:25 +03:00
David Blaikie 0c938a8dd8 OpenMP: Fix typo variabls -> variables 2020-08-03 17:00:15 -07:00
Shilei Tian f2400f024d [OpenMP] Fixed the issue that target memory deallocation might be called while the memory is being used
This patch fixed the issue that target memory might be deallocated while
it is still being used or before it is used.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84996
2020-07-31 18:54:18 -04:00
Joachim Protze 03116a9f8c [OpenMP] Use weak attribute in interface only for static library
This is to address the issue reported at:
https://bugs.llvm.org/show_bug.cgi?id=46863

Since weak is meaningless for a shared library interface function, this patch
disables the attribute, when the OpenMP library is built as shared library.

ompt_start_tool is not an interface function, but an internally called function
possibly implemented by an OMPT tool.
This function needs to be weak if possible to allow overwriting ompt_start_tool
with a function implementation built into the application.

Differential Revision: https://reviews.llvm.org/D84871
2020-07-31 12:29:05 +02:00
Shilei Tian 0f10165626 [OpenMP] Refactored the function `targetDataEnd`
Refactored the function `targetDataEnd` in preparation for fixing
the issue of ahead-of-time target memory deallocation. This patch only
renamed `targetDataEnd` related variables and functions to conform
with LLVM code standard.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84991
2020-07-30 21:39:26 -04:00
Shilei Tian 8218eee269 [OpenMP] Refactored the function `target`
Refactored the function `target` to make preparation for fixing the
issue of ahead-of-time device memory deallocation.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84816
2020-07-30 21:05:55 -04:00
Alexey Bataev 622e46156d [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region.
Need to map the base pointer for all directives, not only target
data-based ones.
The base pointer is mapped for array sections, array subscript, array
shaping and other array-like constructs with the base pointer. Also,
codegen for use_device_ptr clause was modified to correctly handle
mapping combination of array like constructs + use_device_ptr clause.
The data for use_device_ptr clause is emitted as the last records in the
data mapping array.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84767
2020-07-30 11:18:33 -04:00
Alexey Bataev b69357c2f4 Revert "[OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region."
This reverts commit 142d0d3ed8 to
investigate undefined behavior revealed by buildbots.
2020-07-30 10:57:56 -04:00
Alexey Bataev 142d0d3ed8 [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region.
Need to map the base pointer for all directives, not only target
data-based ones.
The base pointer is mapped for array sections, array subscript, array
shaping and other array-like constructs with the base pointer. Also,
codegen for use_device_ptr clause was modified to correctly handle
mapping combination of array like constructs + use_device_ptr clause.
The data for use_device_ptr clause is emitted as the last records in the
data mapping array.
It applies only for global pointers.

Differential Revision: https://reviews.llvm.org/D84767
2020-07-30 09:40:05 -04:00
Joel E. Denny cee52dd026 [OpenMP] Implement TR8 `present` motion modifier in runtime (2/2)
This patch implements OpenMP runtime support for the OpenMP TR8
`present` motion modifier for `omp target update` directives.  The
previous patch in this series implements Clang front end support.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D84712
2020-07-29 12:18:50 -04:00
Shilei Tian 30440924d4 [OpenMP] Replaced mutex lock/unlock in `target` with `std::lock_guard`
Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84799
2020-07-28 20:31:40 -04:00
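The switch to `std::lock_guard` in 30440924d4 above can be illustrated with a minimal sketch. The names `TableMtx`, `Table`, and `registerEntry` are hypothetical and are not taken from libomptarget; the point is only that the guard releases the mutex on every return path.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical shared state; not taken from libomptarget.
static std::mutex TableMtx;
static std::vector<int> Table;

// Appends Value to Table unless it is negative.
// The guard's destructor unlocks TableMtx on both return paths, so there is
// no unlock call that an early return could skip.
bool registerEntry(int Value) {
  std::lock_guard<std::mutex> Guard(TableMtx);
  if (Value < 0)
    return false; // mutex released here
  Table.push_back(Value);
  return true; // and here
}
```

With manual `lock`/`unlock` calls, the early return would need its own `unlock`, which is the kind of omission RAII removes.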
Joel E. Denny 65564e5eaf Revert "[OpenMP] Implement TR8 `present` motion modifier in runtime (2/2)"
This reverts commit 2cb926a447.

It depends on 3c3faae497, which is being
reverted.
2020-07-28 20:30:05 -04:00
Shilei Tian 3ce69d4d50 [NFC][OpenMP] Renamed all variable and function names in `target` to conform with LLVM code standard
This patch only touched variables and functions in `target`.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84797
2020-07-28 20:11:09 -04:00
Joel E. Denny 2cb926a447 [OpenMP] Implement TR8 `present` motion modifier in runtime (2/2)
This patch implements OpenMP runtime support for the OpenMP TR8
`present` motion modifier for `omp target update` directives.  The
previous patch in this series implements Clang front end support.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D84712
2020-07-28 19:15:18 -04:00
Jinsong Ji d28f86723f Re-land "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support"
This reverts commit bf544fa1c3.

Fixed the typo in PPCInstrInfo.cpp.
2020-07-28 14:00:11 +00:00
Joel E. Denny 9b4826d18b [OpenMP] Fix libomptarget negative tests to expect abort
On runtime failures, D83963 causes the runtime to abort instead of
merely exiting with a non-zero value, but many tests in the
libomptarget test suite still expect the former behavior.  This patch
updates the test suite and was discussed in post-commit comments on
D83963 and D84557.
2020-07-28 09:02:16 -04:00
Joachim Protze e2f5444c9c [OpenMP][Tests] Enable nvptx64 testing for most libomptarget tests
Also add $BUILD/lib to the LIBRARY_PATH to fix
https://bugs.llvm.org/show_bug.cgi?id=46836.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D84557
2020-07-28 11:08:24 +02:00
Jinsong Ji bf544fa1c3 Revert "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support"
This reverts commit adffce7153.

This is breaking test-suite; revert while investigating.
2020-07-27 21:07:00 +00:00
Ye Luo 9323166601 [OpenMP] Add more pass-through functions in DeviceTy
Summary:
1. Add DeviceTy::data_alloc, DeviceTy::data_delete, DeviceTy::data_alloc, DeviceTy::synchronize pass-through functions. Avoid directly accessing Device.RTL
2. Fix the type of the first argument of synchronize_ty in rth.h, device id is int32_t which is consistent with other functions.

Reviewers: tianshilei1992, jdoerfert

Reviewed By: tianshilei1992

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D84487
2020-07-27 16:08:30 -04:00
Jinsong Ji adffce7153 [PowerPC] Remove QPX/A2Q BGQ/BGP CNK support
Per RFC http://lists.llvm.org/pipermail/llvm-dev/2020-April/141295.html
no one is making use of QPX/A2Q/BGQ/BGP CNK anymore.

This patch removes the support of QPX/A2Q in llvm, BGQ/BGP in clang,
CNK support in openmp/polly.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D83915
2020-07-27 19:24:39 +00:00
Johannes Doerfert 9c87466c39 [OpenMP] Use `abort` not `error` for fatal runtime exceptions
See PR46515 for the rationale but generally, we want to *really* abort
not gracefully shut down.

Reviewed By: grokos, ABataev

Differential Revision: https://reviews.llvm.org/D83963
2020-07-24 15:15:38 -05:00
David Truby bb099c87ab [openmp] Don't copy exports into the source folder by default.
Additionally fix the copy if enabled on multi-config targets.

Summary:
This changes the copy command for libomp.so to use the output of the target as
the source of the copy, rather than trying to find it based on
${LIBOMP_LIBRARY_DIR}, which appears to be incorrect in multi-config generator
builds.

Reviewers: jdoerfert

Subscribers: mgorny, yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D84148
2020-07-24 14:34:50 +01:00
Shilei Tian c0185dc7df Revert "[OpenMP] Wait for kernel prior to memory deallocation"
This reverts commit 9b2832c089.
2020-07-22 23:03:36 -04:00
Shilei Tian 9b2832c089 [OpenMP] Wait for kernel prior to memory deallocation
Summary:
In the function `target`, memory deallocation and `target_data_end` are called
immediately after returning from the kernel launch. This can cause a race in
which the memory is still being used by the kernel, and it can also mean that
the data a kernel requires has already been deallocated by the time the kernel
starts to execute, especially when multiple kernels run concurrently. Since we
block the thread issuing the target offload at the end of the target region
anyway, we just move the synchronization slightly earlier to ensure
correctness.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D84381
2020-07-22 22:55:34 -04:00
Louis Dionne afa1afd410 [CMake] Bump CMake minimum version to 3.13.4
This upgrade should be friction-less because we've already been ensuring
that CMake >= 3.13.4 is used.

This is part of the effort discussed on llvm-dev here:

  http://lists.llvm.org/pipermail/llvm-dev/2020-April/140578.html

Differential Revision: https://reviews.llvm.org/D78648
2020-07-22 14:25:07 -04:00
Joel E. Denny 708752b2f6 [OpenMP] Implement TR8 `present` map type modifier in runtime (2/2)
This implements OpenMP runtime support for the OpenMP TR8 `present`
map type modifier.  The previous patch in this series implements Clang
front end support.  See that patch summary for behaviors that are not
yet supported.

Reviewed By: grokos, jdoerfert

Differential Revision: https://reviews.llvm.org/D83062
2020-07-22 14:04:58 -04:00
Joel E. Denny fc247c8f3c Revert "[OpenMP] Implement TR8 `present` map type modifier in runtime (2/2)"
This reverts commit 45b8f7ec35.

It attempts to use debug macros `DPxMOD` and `DPxPTR` in release
builds.  Will fix and reapply later.
2020-07-22 11:22:08 -04:00
Joel E. Denny 45b8f7ec35 [OpenMP] Implement TR8 `present` map type modifier in runtime (2/2)
This implements OpenMP runtime support for the OpenMP TR8 `present`
map type modifier.  The previous patch in this series implements Clang
front end support.  See that patch summary for behaviors that are not
yet supported.

Reviewed By: grokos, jdoerfert

Differential Revision: https://reviews.llvm.org/D83062
2020-07-22 10:15:32 -04:00
Joachim Protze ae31d7838c [OpenMP][NFC] pass on env variables to libomptarget tests 2020-07-22 12:14:45 +02:00
Saiyedul Islam 741e55aeed [OpenMP] Temporarily disable failing runtime tests for clang-12
Following tests were disabled for clang-11 after upgrading to
version 5.0 in D82963:

1. openmp/runtime/test/env/kmp_set_dispatch_buf.c
2. openmp/runtime/test/worksharing/for/kmp_set_dispatch_buf.c

They are also failing for clang-12. Thus this temporary disabling
until they are fixed.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D84241
2020-07-21 15:32:46 +00:00
AndreyChurbanov 617787ea77 [OpenMP] add missed REQUIRES:ompt for 2 OMPT tests 2020-07-21 16:31:17 +03:00
AndreyChurbanov 5a8779169e [OpenMP] libomp build fix without OMPT_SUPPORT 2020-07-21 16:03:17 +03:00
AndreyChurbanov 917f842159 [OpenMP] libomp cleanup: add checks of bad memory access
Add check of frm to prevent array out-of-bound access;
add check of new_nproc to prevent access of unallocated hot_teams array;
add check of location info pointer to prevent NULL dereference;
add check of d_tn pointer to prevent NULL dereference in release build.
These checks make static analyzers happier.

This is second part of the patch from https://reviews.llvm.org/D84062.
2020-07-21 00:12:46 +03:00
AndreyChurbanov 787eb0c637 [OpenMP] libomp cleanup: add check of input global tid parameter
Add check of negative gtid before indexing __kmp_threads.
This makes static analyzers happier.
This is the first part of the patch split in two parts.

Differential Revision: https://reviews.llvm.org/D84062
2020-07-20 23:49:58 +03:00
Joachim Protze f226171429 [OpenMP][Tests][NFC] Mark compatibility with older versions of clang 2020-07-20 13:53:29 +02:00
AndreyChurbanov 86fb2db49b [OpenMP] libomp cleanup: check presence of hwloc objects CORE, PACKAGE
hwloc documentation guarantees the only object that is always present
in the topology is PU. We can check the presence of other objects
in the topology, just in case.

Differential Revision: https://reviews.llvm.org/D84065
2020-07-18 01:15:37 +03:00
AndreyChurbanov 62d88a1c79 [OpenMP] libomp: add itt notifications for teams construct on host
Add barrier/region notification for parallel inside teams construct
when number of teams is 1, as VTune only shows outer level regions for
simplicity.

Differential Revision: https://reviews.llvm.org/D84024
2020-07-17 21:10:25 +03:00
serge-sans-paille 515bc8c155 Harmonize Python shebang
Differential Revision: https://reviews.llvm.org/D83857
2020-07-16 21:53:45 +02:00
AndreyChurbanov ffd8f00931 [openmp] libomp: added itt notifications for task, taskwait, taskgroup
Add releasing->acquire edges for child task->taskwait and
child task->end of taskgroup.

Differential Revision: https://reviews.llvm.org/D83804
2020-07-16 14:28:46 +03:00
George Rokos 140ab574a1 [OpenMP][Offload] Declare mapper runtime implementation
Libomptarget patch adding runtime support for "declare mapper".
Patch co-developed by Lingda Li and George Rokos.

Differential revision: https://reviews.llvm.org/D68100
2020-07-15 18:11:43 -07:00
Johannes Doerfert 5937434677 [OpenMP] Silence unused symbol warning with proper ifdefs 2020-07-11 11:57:42 -05:00
Johannes Doerfert c98699582a [OpenMP][NFC] Remove unused (always fixed) arguments
There are various runtime calls in the device runtime with unused, or
always fixed, arguments. This is bad for all sorts of reasons. Clean up
two before as we match them in OpenMPOpt now.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D83268
2020-07-11 00:51:51 -05:00
Johannes Doerfert cd0ea03e6f [OpenMP][NFC] Remove unused and untested code from the device runtime
Summary:
We carried a lot of unused and untested code in the device runtime.
Among other reasons, we are planning major rewrites for which reduced
size is going to help a lot.

The number of code lines reduced by 14%!

Before:
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
CUDA                            13            489            841           2454
C/C++ Header                    14            322            493           1377
C                               12            117            124            559
CMake                            4             64             64            262
C++                              1              6              6             39
-------------------------------------------------------------------------------
SUM:                            44            998           1528           4691
-------------------------------------------------------------------------------

After:
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
CUDA                            13            366            733           1879
C/C++ Header                    14            317            484           1293
C                               12            117            124            559
CMake                            4             64             64            262
C++                              1              6              6             39
-------------------------------------------------------------------------------
SUM:                            44            870           1411           4032
-------------------------------------------------------------------------------

Reviewers: hfinkel, jhuber6, fghanim, JonChesterfield, grokos, AndreyChurbanov, ye-luo, tianshilei1992, ggeorgakoudis, Hahnfeld, ABataev, hbae, ronlieb, gregrodgers

Subscribers: jvesely, yaxunl, bollu, guansong, jfb, sstefan1, aaron.ballman, openmp-commits, cfe-commits

Tags: #clang, #openmp

Differential Revision: https://reviews.llvm.org/D83349
2020-07-10 19:09:41 -05:00
Joachim Protze 0fa0cf8638 [OpenMP][Tests] Update compatibility with GCC (NFC)
Commit 95a28df5c provided implementation for GOMP*_nonmonotonic*runtime*
functions. Now the tests succeed with gcc 9 and 10
2020-07-08 00:27:19 +02:00
Ye Luo c5348aecd7 [OpenMP] Use primary context in CUDA plugin
Summary:
Retaining per device primary context is preferred to creating a context owned by the plugin.

From CUDA documentation
1. "Note that the use of multiple CUcontexts per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used." from https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html
2. Right under cuCtxCreate. In most cases it is recommended to use cuDevicePrimaryCtxRetain. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf
3. The primary context is unique per device and shared with the CUDA runtime API. These functions allow integration with other libraries using CUDA.  https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX

Two issues are addressed by this patch:
1. Not using the primary context caused interoperability issues with libraries like cublas, cusolver. CUBLAS_STATUS_EXECUTION_FAILED and cudaErrorInvalidResourceHandle
2. On OLCF summit, "Error returned from cuCtxCreate" and "CUDA error is: invalid device ordinal"

Regarding the flags of the primary context. If it is inactive, we set CU_CTX_SCHED_BLOCKING_SYNC. If it is already active, we respect the current flags.

Reviewers: grokos, ABataev, jdoerfert, protze.joachim, AndreyChurbanov, Hahnfeld

Reviewed By: jdoerfert

Subscribers: openmp-commits, yaxunl, guansong, sstefan1, tianshilei1992

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D82718
2020-07-07 10:14:51 -04:00
Saiyedul Islam 38d6640ba5 [libomptarget] Implement atomic inc and fence functions for AMDGCN using clang builtins
This function uses __builtin_amdgcn_atomic_inc32():
  uint32_t atomicInc(uint32_t *address, uint32_t max);

These functions use __builtin_amdgcn_fence():
__kmpc_impl_threadfence()
__kmpc_impl_threadfence_block()
__kmpc_impl_threadfence_system()

They will take place of current mechanism of directly calling IR functions.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D83132
2020-07-07 06:36:25 +00:00
Peyton, Jonathan L 95a28df5c4 [OpenMP] Add GOMP 5.0 loop entry points
This patch adds missing GOMP_5.0 loop entry points which incorporate
new non-monotonic default into entry point name.  Since monotonic
schedules are a subset of nonmonotonic, it is acceptable to use
monotonic as the implementation.  This patch simply has the nonmonotonic
(and possibly non-monotonic) versions of the loop entry points as
wrappers around the monotonic ones.

Differential Revision: https://reviews.llvm.org/D73922
2020-07-06 17:22:26 -05:00
Joachim Protze 6d9626d2da [OpenMP][Tests] Fix/Mark compatibility for GCC
Reviewed by: Hahnfeld, saiislam

Differential Revision: https://reviews.llvm.org/D82267
2020-07-06 23:56:09 +02:00
Saiyedul Islam 4c4bda1630 [OpenMP] Temporarily disable failing runtime tests for OpenMP 5.0
Following tests are failing after upgrading to version 5.0 but are passing
for version 4.5:
1. openmp/runtime/test/env/kmp_set_dispatch_buf.c
2. openmp/runtime/test/worksharing/for/kmp_set_dispatch_buf.c

To be enabled as soon as these tests are fixed.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D82963
2020-07-06 14:04:43 +00:00
Joachim Protze 8289f2891e [OpenMP][Tests] Flag compatibility of OpenMP runtime tests with GCC versions
If the compilation fails, the test is marked as unsupported.
-> This will never change for a specific version of gcc

If the linking fails, the test is marked as expected to fail.
-> This might change as LLVM/OpenMP implements the missing GOMP interface function

Reviewed by: Hahnfeld

Differential Revision: https://reviews.llvm.org/D83077
2020-07-05 22:49:54 +02:00
Joachim Protze 30205865d9 [OpenMP][OMPT] Fix ifdefs for OMPT code
Fixes build with LIBOMP_OMPT_SUPPORT=off

Reported by: Jason Edson

Reviewed by: Hahnfeld

Differential Revision: https://reviews.llvm.org/D83171
2020-07-05 22:39:25 +02:00
Fangrui Song 6ba4380ed6 [libomptarget][test] Fix text relocations by adding -fPIC 2020-07-05 12:51:28 -07:00
Joachim Protze 3fc97f9636 [OpenMP][Tests] NFC use type macro in printf 2020-07-05 09:17:18 +02:00
Joachim Protze 47cb8a0f0b [OpenMP][OMPT]Add event callbacks for taskwait with depend
This adds the missing event callbacks to express dependencies on included tasks
and taskwait with depend clause.

The test fails for GCC, see bug report:
https://bugs.llvm.org/show_bug.cgi?id=46573

Reviewed by: hbae

Differential Revision: https://reviews.llvm.org/D81891
2020-07-03 09:58:31 +02:00
Jonas Hahnfeld 0e0483bf5c [OpenMP][CMake] Fix version detection of testing compiler
When configuring in-tree, the correct names are LLVM_VERSION_MAJOR
and LLVM_VERSION_MINOR. This has been wrong since the code was added
in commits fc473dee98 and 821649229e.
2020-07-02 19:39:30 +02:00
Ye Luo 45bb073da8 [OpenMP] fix clang warning about printf format in CUDA plugin
Summary: Warnings are printed by clang when building with LIBOMPTARGET_ENABLE_DEBUG=ON due to an incorrect format string.

Reviewers: tianshilei1992, jdoerfert

Reviewed By: tianshilei1992

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D82789
2020-06-29 22:35:39 -04:00
AndreyChurbanov 7f3d9cc1c0 [openmp][NFC] Cleanup: guard __kmp_mic_type by KMP_MIC_SUPPORTED macro.
Differential Revision: https://reviews.llvm.org/D82301
2020-06-29 14:14:56 +03:00
Joachim Protze d4230c67bf [OpenMP][Tool] Fix buffer overflow in ompt-multiplex.h
Reviewed by: runlieb

Differential Revision: https://reviews.llvm.org/D82452
2020-06-29 12:44:33 +02:00
Han Zhu 1eaebe192f [openmp] Use config.test_extra_flags in archer and multiplex tests
Summary:
`config.test_extra_flags` is passed in from `lit.site.cfg.in` files, but they're not used in the LIT configs. This variable can be useful for distros which don't have the standard c/c++ headers in the default search paths. Since the tests run clang on c/c++ source code, we rely on `test_extra_flags` to pass in the necessary header files.

This is a similar setup that's also done in libomptarget https://github.com/llvm/llvm-project/blob/master/openmp/libomptarget/test/lit.cfg#L42 and openmp/runtime.

Reviewers: jdoerfert, jdenny, protze.joachim

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D82516
2020-06-25 11:58:52 -07:00
Ye Luo 6e5f64c44f [OpenMP] Adopt std::set in HostDataToTargetMap
Summary:
lookupMapping took significant time due to linear complexity searching.
This is bad for offloading from multiple host threads because lookupMapping is protected by mutex.
Use std::set for logarithmic complexity searching.

Before my change.
libomptarget inclusive time 16.7 sec, exclusive time 8.6 sec.
After the change
libomptarget inclusive time 7.3 sec, exclusive time 0.4 sec.

Most of the overhead of libomptarget (exclusive time) is gone.

Reviewers: jdoerfert, grokos

Reviewed By: grokos

Subscribers: tianshilei1992, yaxunl, guansong, sstefan1

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D82264
2020-06-24 12:22:45 -04:00
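The lookup change in 6e5f64c44f above can be sketched as follows. This is an illustration under stated assumptions, not the libomptarget code: `MapEntry` and `lookup` are hypothetical names, entries are ordered by the start of their host range, and ranges are assumed not to overlap.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical entry: a mapped host range [Begin, End) and its device base.
struct MapEntry {
  std::uintptr_t Begin;
  std::uintptr_t End;
  std::uintptr_t DeviceBase;
  // Entries are ordered by the start of their host range.
  bool operator<(const MapEntry &Other) const { return Begin < Other.Begin; }
};

// Returns the entry whose host range contains Addr, or nullptr if none does.
// Assumes the ranges stored in Map do not overlap.
const MapEntry *lookup(const std::set<MapEntry> &Map, std::uintptr_t Addr) {
  // upper_bound finds the first entry starting strictly after Addr in
  // logarithmic time; the only possible match is the entry just before it.
  auto It = Map.upper_bound(MapEntry{Addr, 0, 0});
  if (It == Map.begin())
    return nullptr;
  --It;
  return Addr < It->End ? &*It : nullptr;
}
```

A linear scan over a list would visit every entry in the worst case while holding the mutex; the ordered set reduces that to a tree descent plus one comparison.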
Joachim Protze 73b7ff4e16 [OpenMP] NFC: Create OpenMP release notes file 2020-06-24 13:42:32 +02:00
Joachim Protze 63a3c5925d [OpenMP][OMPT] Pass mutexinoutset to the tool
Adds OMPT support for the mutexinoutset dependency

Reviewed by: hbae

Differential Revision: https://reviews.llvm.org/D81890
2020-06-19 12:51:18 +02:00
Shilei Tian aaf50adb53 Revert "[OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info"
This reverts commit ee1bf45e1d.
2020-06-17 15:01:16 -04:00
Shilei Tian ee1bf45e1d [OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info
DeviceID is added for some cases that we only have the __tgt_async_info but do
not know its corresponding device id. However, to communicate with target
plugins, we need that information.

Event is added for another way to synchronize.
2020-06-17 14:29:09 -04:00
Alexey Bataev 08029595ca [OPENMP]Fix overflow during counting the number of iterations.
Summary:
The OpenMP loops are normalized and transformed into loops from 0 to
the maximum number of iterations. In some cases, the original scheme may lead
to overflow when calculating the number of iterations. If it is unknown
whether overflow can occur (the bounds are not constant, so we cannot
determine whether there is an overflow), cast the original type to its
unsigned counterpart.

Reviewers: jdoerfert

Subscribers: yaxunl, guansong, sstefan1, openmp-commits, cfe-commits, caomhin

Tags: #clang, #openmp

Differential Revision: https://reviews.llvm.org/D81881
2020-06-17 08:47:01 -04:00
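The idea behind 08029595ca above, computing the iteration count in an unsigned type, can be shown with a small sketch. It is not Clang's code generation; `tripCount` is a hypothetical helper for a loop of the form `for (I = Lb; I < Ub; ++I)` with a step of 1.

```cpp
#include <cassert>
#include <cstdint>

// Iteration count of `for (std::int32_t I = Lb; I < Ub; ++I)`.
// Computing Ub - Lb in the signed type overflows (undefined behavior) when
// the bounds are far apart. Unsigned subtraction wraps modulo 2^32, which
// gives the correct count whenever Ub > Lb.
std::uint32_t tripCount(std::int32_t Lb, std::int32_t Ub) {
  if (Ub <= Lb)
    return 0;
  return static_cast<std::uint32_t>(Ub) - static_cast<std::uint32_t>(Lb);
}
```

For example, `tripCount(INT32_MIN, INT32_MAX)` is 4294967295, a value that does not fit in the signed 32-bit type.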
Joachim Protze 8580af3f7d subdirectories should not use cmake project command 2020-06-17 09:38:56 +02:00
Joachim Protze e9b8ed1fd7 [OpenMP][Tool] Header-only multiplexing of OMPT tools
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76012
2020-06-17 09:16:46 +02:00
Joachim Protze cbea36903e [OpenMP][OMPT] Add callbacks for doacross loops
Adds the callbacks for ordered with source/sink dependencies.

The test for task dependencies changed, because callback.h now actually prints
the passed dependencies and the test also checks for the address.

Reviewed by: hbae

Differential Revision: https://reviews.llvm.org/D81807
2020-06-16 16:53:40 +02:00
Joachim Protze 9e5aefc5f9 [OpenMP][Tests] fix data race in an OpenMP runtime test
Reviewed by: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D81804
2020-06-15 18:48:35 +02:00
Joachim Protze d056d7592a [OpenMP][Tool] Extend reuse of OMPT testing
This patch allows specifying a prefix (default: empty) to be included in the print-out
written by callback.h.
Also adding a cmake target to find the header file from other tests.

Reviewed by: jdoerfert

Differential Revision: https://reviews.llvm.org/D76008
2020-06-14 15:55:32 +02:00
Joachim Protze add8d90cb3 [OpenMP] support alloc of serialized tasks
Reviewed by: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D81497
2020-06-14 15:55:32 +02:00
Joachim Protze e7577d1d76 Remove mention of counter from Archer readme
The feature was removed before upstreaming Archer, so the documentation is wrong
2020-06-05 14:31:03 +02:00
Shilei Tian a014fbbc21 [OpenMP] Improve D2D memcpy to use more efficient driver API
Summary:
In the current implementation, D2D memcpy first copies data back to the host and then
copies it from the host to the device. This is very inefficient if the device supports D2D
memcpy, like CUDA.

In this patch, D2D memcpy first tries to use the natively supported driver API. If
that fails, it falls back to the original approach. It is worth noting that D2D memcpy in this
scenario covers two cases:
- Same devices: this is the D2D memcpy in the CUDA context.
- Different devices: this is the PeerToPeer memcpy in the CUDA context.
My implementation merges these two parts. It chooses the best API according to
the source device and destination device.

Reviewers: jdoerfert, AndreyChurbanov, grokos

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D80649
2020-06-04 16:59:06 -04:00
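The try-native-then-fall-back structure described in a014fbbc21 above might look roughly like this. Everything here is an assumption for illustration: `NativeCopyFn` and `copyDeviceToDevice` are hypothetical, and plain host memory with `std::memcpy` stands in for device buffers and driver calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical hook for a native device-to-device copy; returns true on
// success. A real plugin would call its driver API here.
using NativeCopyFn =
    std::function<bool(void *Dst, const void *Src, std::size_t Size)>;

// Tries the native copy first. If no hook is set or the hook fails, the data
// is staged through a host buffer instead. Returns true if the native path
// was used.
bool copyDeviceToDevice(void *Dst, const void *Src, std::size_t Size,
                        const NativeCopyFn &Native) {
  if (Native && Native(Dst, Src, Size))
    return true;
  if (Size > 0) {
    std::vector<unsigned char> Staging(Size); // stand-in for a host buffer
    std::memcpy(Staging.data(), Src, Size);   // "device to host"
    std::memcpy(Dst, Staging.data(), Size);   // "host to device"
  }
  return false;
}
```

The fallback keeps the old behavior available for devices without a native path, at the cost of two copies.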
AndreyChurbanov abe64360ae [openmp] Fixed nonmonotonic schedule implementation.
Differential Revision: https://reviews.llvm.org/D80942
2020-06-04 15:39:45 +03:00
Joachim Protze 10995c77b4 [OpenMP][OMPT] Fix and add event callbacks for detached tasks
The OpenMP spec has the task-fulfill event for a call to omp_fulfill_event.
If the task did not yet finish execution, ompt_task_early_fulfill is used,
otherwise ompt_task_late_fulfill.
If a task does not complete, when the execution finishes (i.e., the task goes
in detached mode), ompt_task_detach instead of ompt_task_complete must be
used, when the next task is scheduled.

A test for both cases is included, which only works with clang-11+

Reviewed By: hbae

Differential revision: https://reviews.llvm.org/D80843
2020-06-02 09:52:40 +02:00
AndreyChurbanov 5e111c5df8 [openmp] Fixed taskloop recursive splitting so that taskloop tasks have
same parent tasks.

Differential Revision: https://reviews.llvm.org/D80577
2020-06-01 17:51:02 +03:00
Joachim Protze 3895148d7c [OpenMP] Fix a race in task queue reallocation
__kmp_realloc_task_deque implicitly assumes, that the task queue is full
(ntasks == size), therefore tail = size in line 319.
An assertion is added to document this assumption.

The first check for a full queue is before the locking and might not hold
when the lock is taken. So, we need to check again for this condition when
we have the lock.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D80480
2020-05-25 10:23:22 +02:00
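The reasoning in 3895148d7c above, that a fullness check made before locking may no longer hold once the lock is taken, can be sketched with a hypothetical queue. `TaskQueue`, `grow`, and `push` are illustrative names and do not mirror the libomp task deque; the sketch performs the check only under the lock and asserts the precondition that the reallocation relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical growable queue; not the libomp task deque.
struct TaskQueue {
  std::mutex Lock;
  std::vector<int> Slots;
  std::size_t Count = 0;

  explicit TaskQueue(std::size_t Capacity) : Slots(Capacity) {}

  // Doubles the capacity. Precondition: the caller holds Lock and the queue
  // is full. The assertion documents that assumption.
  void grow() {
    assert(Count == Slots.size() && "grow() assumes a full queue");
    Slots.resize(Slots.empty() ? 1 : Slots.size() * 2);
  }

  void push(int Task) {
    std::lock_guard<std::mutex> Guard(Lock);
    // Any check made before taking the lock may be stale, because another
    // thread can remove or add tasks in between. Decide under the lock.
    if (Count == Slots.size())
      grow();
    Slots[Count++] = Task;
  }
};
```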
AndreyChurbanov 57d8b8d6f0 [openmp] Fixed hang if detached task was serialized.
The patch fixes https://bugs.llvm.org/show_bug.cgi?id=45904.

Differential Revision: https://reviews.llvm.org/D79944
2020-05-18 15:32:13 +03:00
Joachim Protze d23131a3c0 [OpenMP] Fix race condition in the completion/freeing of detached tasks
Spurious assertion failures are symptoms of a race condition for the handling
of detached tasks:
Assertion failure at kmp_tasking.cpp(3744): taskdata->td_flags.complete == 1.
Assertion failure at kmp_tasking.cpp(710): taskdata->td_flags.executing == 0.

in the case of detach=true, all accesses to taskdata in __kmp_task_finish need
to happen before (~line 873):

taskdata->td_flags.proxy = TASK_PROXY;

This assignment signals to __kmp_fulfill_event, that the task will need to be
freed there. So, conceptually the ownership of taskdata is moved.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D79702
2020-05-17 12:28:38 +02:00
Manoel Roemmer 6b9e43c67e [Openmp][VE] Libomptarget plugin for NEC SX-Aurora
This patch adds a libomptarget plugin for the NEC SX-Aurora TSUBASA Vector
Engine (VE target).  The code is largely based on the existing generic-elf
plugin and uses the NEC VEO and VEOSINFO libraries for offloading.

Differential Revision: https://reviews.llvm.org/D76843
2020-05-12 10:47:30 +02:00
Joel E. Denny dd5ba4b585 [OpenMP][NFC] Fix `not` substitution in tests
D78566 introduced a `\bnot\b` lit substitution in OpenMP test suites.
However, that would corrupt a command like
`FileCheck -implicit-check-not` or any file name like `%t.not`.  We
could use lookbehind/lookahead assertions to avoid such cases, but
this patch switches to `%not` (suggested during the D78566 review) as
a safer option.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D79529
2020-05-11 14:53:48 -04:00
Shilei Tian cb038927ef [OpenMP] Fix an issue of wrong return type of DeviceRTLTy::getNumOfDevices
Summary: There is a typo in DeviceRTLTy::getNumOfDevices that the type of its return value is bool. It will lead to a problem of wrong device number returned from omp_get_num_devices.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D79255
2020-05-03 15:59:06 -04:00
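The effect of the wrong return type fixed in cb038927ef above is easy to reproduce in isolation. The two functions below are hypothetical stand-ins, not the plugin's code: returning an `int` count through a `bool` return type collapses every nonzero count to 1.

```cpp
#include <cassert>

// Hypothetical device count standing in for the plugin's state.
static int NumDevices = 4;

// Buggy variant: the int is converted to bool on return, so callers only
// ever observe 0 or 1.
bool getNumOfDevicesBuggy() { return NumDevices; }

// Fixed variant: the return type matches the value being returned.
int getNumOfDevicesFixed() { return NumDevices; }
```

With four devices, the buggy variant reports 1 once converted back to an integer, while the fixed variant reports 4.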
Ron Lieberman ee9c53d271 [libomptarget] Initialize reference parameter IsNew within Device::getOrAllocTgtPtr
The two locals IsNew and Pointer_IsNew were uninitialized at declaration, and then passed by
reference to Device.getOrAllocTgtPtr which in turn did not assign on all
paths within the function. This resulted in occasional runtime failures in one application.
Device::getOrAllocTgtPtr will now initialize IsNew to false on entry to function.

Differential Revision: https://reviews.llvm.org/D78744
2020-04-24 15:33:37 -05:00
Joel E. Denny 5f6aa9680c [OpenMP] target_data_begin: fail on device alloc fail
Without this patch, target_data_begin continues after an illegal
mapping or an out-of-memory error on the device.  With this patch, it
terminates the runtime with an error instead.

The new test exercises only illegal mappings.  I didn't think of a
good way to exercise out-of-memory errors from the test suite.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D78170
2020-04-21 17:10:50 -04:00
Joel E. Denny ba942610f6 [OpenMP] Add scaffolding for negative runtime tests
Without this patch, the openmp project's test suites do not appear to
have support for negative tests.  However, D78170 needs to add a test
that an expected runtime failure occurs.

This patch makes `not` visible in all of the openmp project's test
suites.  In all but `libomptarget/test`, it should be possible for a
test author to insert `not` before a use of the lit substitution for
running a test program.  In `libomptarget/test`, that substitution is
target-specific, and its value is `echo` when the target is not
available.  In that case, inserting `not` before a lit substitution
would expect an `echo` fail, so this patch instead defines a separate
lit substitution for expected runtime fails.

Reviewed By: jdoerfert, Hahnfeld

Differential Revision: https://reviews.llvm.org/D78566
2020-04-21 17:10:50 -04:00
Bryan Chan b86ff5f6ef [OpenMP] Sync writes to child thread's data before reduction
On systems with weak memory consistency, this patch fixes an intermittent crash
in the reduction function called by __kmp_hyper_barrier_gather, which suffers
from a race on a child thread's data.

Reviewed-By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D77603
2020-04-14 14:31:06 -04:00
Shilei Tian 4031bb982b [OpenMP] Refined CUDA plugin to put all CUDA operations into class
Summary: The current implementation mixes everything together, so there is almost no encapsulation. In this patch, all CUDA-related operations are put into a new class, DeviceRTLTy, and only the necessary functions are exposed. In addition, all C++ code now conforms to the LLVM coding standards, while the API functions keep their C style.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: jfb, yaxunl, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77951
2020-04-13 13:32:46 -04:00
Shilei Tian feed674dec [OpenMP] Introduce stream pool to make sure the correctness of device synchronization

Summary: In the previous patch, to optimize performance, we synchronize only once
for each target region. The synchronization is done via stream synchronization.
However, in extreme situations, the performance might be bad. Consider the
following case: There is a task that requires transferring a huge amount of data
(it calls the data transfer function many times). It is scheduled to the first
stream. Then we have 255 very light tasks scheduled to the remaining 255
streams (by default we have 256 streams). They can finish before we
synchronize at the end of the first task. Next, we get another very large
task. It will again be scheduled to the first stream. Now the first task
finishes its kernel launch and calls stream synchronization. At this point, the
stream already contains two kernels, and the synchronization will wait until
both kernels finish instead of just the one belonging to the first task.

In this patch, we introduce a stream pool. After each synchronization, the stream
is returned to the pool, so that each synchronization waits only for
the expected operations.
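
A minimal sketch of the pool idea, under stated assumptions: streams are modeled as integer ids rather than CUDA stream handles, the pool grows on demand, and the class and method names are hypothetical. It is not the plugin's implementation.

```cpp
#include <vector>

// Streams are modeled as plain integer ids for illustration.
class StreamPoolSketch {
  std::vector<int> Free; // Ids of streams with no pending work.
  int NextId = 0;        // Next id to create when the pool is empty.

public:
  // Hand out an idle stream, creating a new id if none is free.
  int acquire() {
    if (Free.empty())
      return NextId++;
    int Id = Free.back();
    Free.pop_back();
    return Id;
  }

  // Called after synchronization, when the stream holds no pending work.
  void release(int Id) { Free.push_back(Id); }
};
```

A stream that is still in use is never handed to a second task, so synchronizing on it waits only for the work of the task that acquired it.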

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: gregrodgers, yaxunl, lildmh, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77412
2020-04-11 07:08:56 -04:00
Shilei Tian 03ff643d2e [OpenMP] Put old APIs back and added new _async series for backward compatibility
Summary: Following comments at the bi-weekly meeting, this patch puts the old APIs back and adds a new `_async` series.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77822
2020-04-09 22:40:58 -04:00
Shilei Tian 32ed29271f [OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream
Summary:
This patch introduces two things for offloading:
1. Asynchronous data transferring: those functions are suffixed with `_async`. They take one more argument than their synchronous counterparts: `__tgt_async_info*`, a new struct that has only one field, `void *Identifier`. This struct is for information exchange between different asynchronous operations. It can be used for stream selection, as in this case, or for operation synchronization, which it is also used for here. We may expect more uses in the future.
2. Optimization of stream selection for data mapping. The previous implementation used asynchronous device memory transfers but synchronized after each one. In fact, if kernel A needs four memory copies to the device and two memory copies back to the host, we can schedule all seven operations (four H2D, two D2H, and one kernel launch) into the same stream and synchronize only once, after the memory copies from device to host. This saves a large amount of overhead compared with synchronizing after each operation.
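
The two points above can be sketched together. Everything here is an assumption made for illustration: `FakeStream`, `AsyncInfoSketch`, `enqueueAsync`, and `synchronize` are hypothetical names, and the stream is a simple recorder rather than a device queue. Only the single `void *Identifier` field comes from the description.

```cpp
#include <string>
#include <vector>

// Stand-in for a device stream: records enqueued work and counts synchronizations.
struct FakeStream {
  std::vector<std::string> Ops;
  int Syncs = 0;
};

// Mirrors the single-field shape described above; the name is hypothetical.
struct AsyncInfoSketch {
  void *Identifier = nullptr;
};

// Enqueue on the stream already bound to Info, binding Fallback on first use.
void enqueueAsync(AsyncInfoSketch &Info, FakeStream &Fallback,
                  const std::string &Op) {
  if (!Info.Identifier)
    Info.Identifier = &Fallback;
  static_cast<FakeStream *>(Info.Identifier)->Ops.push_back(Op);
}

// One synchronization covers everything enqueued through Info.
void synchronize(AsyncInfoSketch &Info) {
  if (!Info.Identifier)
    return;
  ++static_cast<FakeStream *>(Info.Identifier)->Syncs;
  Info.Identifier = nullptr; // The stream is no longer tied to this region.
}
```

Passing the same `AsyncInfoSketch` to four H2D copies, one kernel launch, and two D2H copies puts all seven operations on one stream, followed by a single `synchronize` call.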

Reviewers: jdoerfert, ye-luo

Reviewed By: jdoerfert

Subscribers: yaxunl, lildmh, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77005
2020-04-07 14:55:47 -04:00
Kazuaki Ishizaki 4201679110 [OpenMP] NFC: Fix trivial typo
Differential Revision: https://reviews.llvm.org/D77430
2020-04-04 12:06:54 +09:00
Vitaly Buka c9ae3c5e10 [openmp] Disable tests flaky on Debian
https://bugs.llvm.org/show_bug.cgi?id=45397
2020-04-01 21:58:05 -07:00
JonChesterfield 09834f9761 [libomptarget][nfc] Move non-freestanding headers out of common
Summary:
[libomptarget][nfc] Move non-freestanding headers out of common

Lowers the bar for building deviceRTL.
Drops math.h entirely as it wasn't used and libm is a big dependency.

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: jdoerfert

Subscribers: jvesely, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77071
2020-03-31 23:43:18 +01:00
Alexey Bataev 0fca766458 [OPENMP50]Fix PR45117: Orphaned task reduction should be allowed.
Add support for orphaned task reductions.
2020-03-27 17:47:30 -04:00
Henry Kao 236ac68fa5 [OpenMP] Add memory barrier to solve data race
A data race occurs when acquiring the lock for a critical section,
triggering an assertion failure. Added a barrier to ensure
all memory is committed before checking the assertion.

Reviewed By: Hahnfeld

Differential Revision: https://reviews.llvm.org/D76780
2020-03-27 16:32:28 -04:00