Commit Graph

97 Commits

Author SHA1 Message Date
Shilei Tian ec97866454 [OpenMP] Make bug49334.cpp more reproducible
`bug49334.cpp` cannot detect data race in `libomptarget` efficiently. It
is reported that with `N = 256` and `BS = 16`, the data race can be reproduced
more steadily. The next coming pathces will fix it so this patch is expected to
fail now.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104552
2021-06-18 18:35:41 -04:00
Joseph Huber df965513a9 [OpenMP] Add an information flag for device data transfers
This patch adds an information flag that indicated when data is being copied to
and from the device. This will be helpful for finding redundant or unnecessary
data transfers in applications.

Reviewed By: jdoerfert, grokos

Differential Revision: https://reviews.llvm.org/D103927
2021-06-08 20:23:27 -04:00
Pushpinder Singh b0d68c7141 [AMDGPU][Libomptarget] Mark lambda_by_value test as XFAIL
Reason: Missing printf definition

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103078
2021-05-25 12:16:54 +00:00
George Rokos d0bc04d6b9 [libomptarget] Fix a bug whereby firstprivates are not copied over to the device
The check for the TO flag when processing firstprivates is missing. As a result,
sometimes the device copy of a firstprivate never gets initialized. Currectly we
try to force lambda structs to be allocated immediately by marking them as a
non-firstprivate, so that PrivateArgumentManagerTy::addArg allocates memory for
them immediately. However, calling addArg with IsFirstPrivate=false makes the
function skip initializing the device copy. Whether an argument is firstprivate
and whether we need to allocate memory immediately are not synonyms, so this
patch introduces one more control variable for immediate allocation and sets it
apart from initialization.

Differential Revision: https://reviews.llvm.org/D102890
2021-05-21 10:52:08 -07:00
Jon Chesterfield ea68ad6e26 [libomptarget] Disable test bug49334 on amdgpu
[libomptarget] Disable test bug49334 on amdgpu

Hangs on amdgpu, do not know why. Disable to unblock build.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D102017
2021-05-20 15:46:56 +01:00
Michael Kruse 34ed3e6337 [OpenMP] Test unified shared memory tests only on systems that support it.
Add a `REQUIRES: unified_shared_memory` option to tests that use `#pragma omp requires unified_shared_memory`.

For CUDA, the feature tag is derived from LIBOMPTARGET_DEP_CUDA_ARCH which itself is derived using [[ https://cmake.org/cmake/help/latest/module/FindCUDA.html#commands | cuda_select_nvcc_arch_flags ]]. The latter determines which compute capability the GPU in the system supports. To ensure that this is the CUDA arch being used, we could also set the `-Xopenmp-target -march=` flag.
In the absence of an NVIDIA GPU, LIBOMPTARGET_DEP_CUDA_ARCH will be 35. That is, in that case we are assuming unified_shared_memory is not available. CUDA plugin testing could be disabled entirely in this case, but this currently depends on `LIBOMPTARGET_CAN_LINK_LIBCUDA OR LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA`, not on whether the hardware is actually available.

For all other targets, nothing changes and we are assuming unified shared memory is available. This might need refinement if not the case.

This tries to fix the [[ http://meinersbur.de:8011/#/builders/143 | OpenMP Offloading Buildbot ]] that, although brand-new, only has a Pascal-generation (sm_61) GPU installed. Hence, tests that require unified shared memory are currently failing. I wish I had known in advance.

Reviewed By: protze.joachim, tianshilei1992

Differential Revision: https://reviews.llvm.org/D101498
2021-05-13 11:08:04 -05:00
Joseph Huber a15f8589f4 [libomptarget] Add support for target memory allocators to cuda RTL
Summary:
The allocator interface added in D97883 allows the RTL to allocate shared and
host-pinned memory from the cuda plugin. This patch adds support for these to
the runtime.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D102000
2021-05-07 10:27:02 -04:00
Pushpinder Singh ae845d6426 [AMDGPU][OpenMP] Enable Libomptarget runtime tests
This enables the runtime tests on amdgpu targets.
10 tests have been marked as XFAIL on amdgcn currently mostly due to
missing printf.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D99656
2021-05-03 05:56:42 +00:00
Michael Kruse 3244a8b536 [OpenMP][CMake] Pass --cuda-path to regression tests.
The OpenMP runtime can be compiled using a CUDA installed at non-default
location with the -DCUDA_TOOLKIT_ROOT_DIR setting. However, check-openmp
will fail afterwards because Clang needs to know where to find the CUDA
headers.

Fix by passing -cuda-path to Clang using the value of
CUDA_TOOLKIT_ROOT_DIR which has been determined by CMake. Also set
LD_LIBRARY_PATH such that it can find the cuda runtime when executing.
This will ensure that the regression test do not depend on the current
environment, but use the environment it was configured for.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D101266
2021-04-27 16:27:40 -05:00
Joachim Protze 24f836e8fd [OpenMP][libomptarget] Separate lit tests for different offloading targets (2/2)
This patch fuses the RUN lines for most libomptarget tests. The previous patch
D101315 created separate test targets for each supported offloading triple.

This patch updates the RUN lines in libomptarget tests to use a generic run
line independent of the offloading target selected for the lit instance.

In cases, where no RUN line was defined for a specific offloading target,
the corresponding target is declared as XFAIL. If it turns out that a test
actually supports the target, the XFAIL line can be removed.

Differential Revision: https://reviews.llvm.org/D101326
2021-04-27 15:54:32 +02:00
Joachim Protze b845217b1d [OpenMP][libomptarget] Separate lit tests for different offloading targets (1/2)
This patch creates a separate test directory for each offloading target to be
tested. This allows to test multiple architectures in one configuration, while
still see all failing tests separately. The lit test names include the target
triple, so that it will be easier to spot the failing target.

This patch also allows to mark expected failing tests based on the
target-triple, as the currently used triple is added to the lit "features":
```
// XFAIL: nvptx64-nvidia-cuda
```

Differential Revision: https://reviews.llvm.org/D101315
2021-04-27 12:30:01 +02:00
Joseph Huber 2b6f20082e [OpenMP] Add function for setting LIBOMPTARGET_INFO at runtime
Summary:
This patch adds a new runtime function __tgt_set_info_flag that allows the
user to set the information level at runtime without using the environment
variable. Using this will require an extern function, but will eventually be
added into an auxilliary library for OpenMP support functions.

This patch required moving the current InfoLevel to a global variable which must
be instantiated by each plugin.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D100774
2021-04-22 12:48:11 -04:00
Alexey Bataev ca70512099 [OPENMP]Mark test as unsupported to avoid possible unexpected passes,
NFC.
2021-04-22 08:06:25 -07:00
Giorgis Georgakoudis a2dbfb6b72 [OpenMP] Simplify offloading parallel call codegen
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions.

Reviewed By: jdoerfert, Meinersbur

Differential Revision: https://reviews.llvm.org/D95976
2021-04-21 18:46:07 -07:00
Alexey Bataev 079884225a [OPENMP]Fix PR49698: OpenMP declare mapper causes segmentation fault.
The implicitly generated mappings for allocation/deallocation in mappers
runtime should be mapped as implicit, also no need to clear member_of
flag to avoid ref counter increment. Also, the ref counter should not be
incremented for the very first element that comes from the mapper
function.

Differential Revision: https://reviews.llvm.org/D100673
2021-04-21 10:38:31 -07:00
Joseph Huber 83d4b2e2e0 [OpenMP] Add info for device table changes
Summary:
This patch adds a feature to print information whenever the host-device pointer
mapping table is changed by inserting or removing an entry. This introduces a
new bit field for LIBOMPTARGET_INFO at position 0x8.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D100600
2021-04-15 18:39:48 -04:00
Hansang Bae 3da61ddae7 [OpenMP] Define omp_is_initial_device() variants in omp.h
omp_is_initial_device() is marked as a built-in function in the current
compiler, and user code guarded by this call may be optimized away,
resulting in undesired behavior in some cases. This patch provides a
possible fix for such cases by defining the routine as a variant
function and removing it from builtin list.

Differential Revision: https://reviews.llvm.org/D99447
2021-04-06 16:58:01 -05:00
Alexey Bataev 0411b23319 [OPENMP]Map data field with l-value reference types.
Added initial support dfor the mapping of the data members with l-value
reference types.

Differential Revision: https://reviews.llvm.org/D98812
2021-03-29 07:07:09 -07:00
Joseph Huber 807466ef28 [OpenMP] Restore backwards compatibility for libomptarget
Summary:
The changes introduced in D87946 changed the API for libomptarget
functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x
but was not given a backward-compatible interface. This change will
require people using Clang 13.x or 12.x to recompile their offloading
programs.

Reviewed By: jdoerfert cchen

Differential Revision: https://reviews.llvm.org/D98358
2021-03-11 09:52:11 -05:00
Joel E. Denny bfe5452b93 [OpenMP] Fix lone target exit data
Without this patch, an `omp target exit data` before the runtime is
initialized produces a runtime error.  This patch fixes that by
changing `__tgt_target_data_end_mapper` to call `CheckDeviceAndCtors`
like many other runtime routines.

Discussed at
<https://lists.llvm.org/pipermail/openmp-dev/2021-March/003920.html>.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D97907
2021-03-04 12:03:42 -05:00
Joel E. Denny 10c18c69f2 [OpenMP] Fix support for device as host
Without this patch, when the offload device is set to
`omp_get_initial_device()`, the runtime fails with an error diagnostic
when entering target regions or target data regions.

However, OpenMP 5.1, sec. 2.14.5 "target Construct", "Restrictions",
p. 203, L3-5 states:

> The device clause expression must evaluate to a non-negative integer
> value that is less than or equal to the value of
> omp_get_num_devices().

Sec. 3.7.7 "omp_get_initial_device", p. 412, L2-3 states:

> The value of the device number is the value returned by the
> omp_get_num_devices routine.

Similarly, OpenMP 5.0, sec. 2.12.5 "target Construct", "Restrictions",
p. 174 L30-32 states:

> The device clause expression must evaluate to a non-negative integer
> value less than the value of omp_get_num_devices() or to the value
> of omp_get_initial_device().

This patch fixes this behavior by changing the runtime to behave as if
offloading is disabled whenever it finds the offload device (either
from a `device` clause or the default device) is set to the host
device.  In the case of mandatory offloading when
`omp_get_num_devices() == 0`, it incorporates the behavior proposed
for OpenMP 5.2 in OpenMP spec github issue 2669.

Reviewed By: grokos, RaviNarayanaswamy

Differential Revision: https://reviews.llvm.org/D97616
2021-03-04 12:03:42 -05:00
Alexey Bataev 0caf736d7e [OPENMP50]Mapping of the subcomponents with the 'default' mappers.
If the mapped structure has data members, which have 'default' mappers,
need to map these members individually using their 'default' mappers.

Differential Revision: https://reviews.llvm.org/D92195
2021-03-02 07:11:06 -08:00
Shilei Tian e5da63d5a9 [OpenMP] Fixed a crash when offloading to x86_64 with target nowait
PR#49334 reports a crash when offloading to x86_64 with `target nowait`,
which is caused by referencing a nullptr. The root cause of the issue is, when
pushing a hidden helper task in `__kmp_push_task`, it also maps the gtid to its
shadow gtid, which is wrong.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97329
2021-02-24 12:37:30 -05:00
Joel E. Denny d2147b1a87 [OpenMP] Fix always,from and delete for data absent at exit
Without this patch, there's a runtime error for those map types at
exit from an "omp target data" or at "omp target exit data", but the
spec says the list item should be ignored.

This patch tests that fix in data_absent_at_exit.c, and it also
improves other testing for data that is not fully present at exit.

Reviewed By: grokos, RaviNarayanaswamy

Differential Revision: https://reviews.llvm.org/D96999
2021-02-19 11:09:26 -05:00
Alexey Bataev 60d71a286b [OPENMP50]Allow overlapping mapping in target constructs.
OpenMP 5.0 removed a lot of restriction for overlapped mapped items
comparing to OpenMP 4.5. Patch restricts the checks for overlapped data
mappings only for OpenMP 4.5 and less and reorders mapping of the
arguments so, that present and alloc mappings are processed first and
then all others.

Differential Revision: https://reviews.llvm.org/D86119
2021-02-16 14:42:08 -08:00
Shilei Tian b68a6b09e6 [OpenMP][libomptarget] Fixed an issue that device sync is skipped if the kernel doesn't have any argument
Currently if there is not kernel argument, device synchronization will
be skipped. This can lead to two issues:
1. If there is any device error, it will not be captured;
2. The target region might end before the kernel is done, which is not spec
   conformant.

The test added in this patch only runs on NVPTX platform, although it will not
be executed by Phab at all. It also requires `not` which is not available on most
systems.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D96067
2021-02-04 20:14:24 -05:00
Shilei Tian 0f0ce3c12e [OpenMP][NVPTX] Take functions in `deviceRTLs` as `convergent`
OpenMP device compiler (similar to other SPMD compilers) assumes that
functions are convergent by default to avoid invalid transformations, such as
the bug (https://bugs.llvm.org/show_bug.cgi?id=49021).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95971
2021-02-03 20:58:12 -05:00
Joseph Huber e4eaf9d820 [OpenMP] Add support for mapping names in mapper API
Summary:
The custom mapper API did not previously support the mapping names added previously. This means they were not present if a user requested debugging information while using the mapper functions. This adds basic support for passing the mapped names to the runtime library.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D94806
2021-01-21 09:26:44 -05:00
Joseph Huber fe5d51a489 [OpenMP] Add using bit flags to select Libomptarget Information
Summary:
This patch adds more fine-grained support over which information is output from the libomptarget runtime when run with the environment variable LIBOMPTARGET_INFO set. An extensible set of flags can be used to pick and choose which information the user is interested in.

Reviewers: jdoerfert JonChesterfield grokos

Differential Revision: https://reviews.llvm.org/D93727
2021-01-04 12:03:15 -05:00
cchen 7036fe8a0c [libomptarget] Add support for target update non-contiguous
This patch is the runtime support for https://reviews.llvm.org/D84192.

In order not to modify the tgt_target_data_update information but still be
able to pass the extra information for non-contiguous map item (offset,
count, and stride for each dimension), this patch overload arg when
the maptype is set as OMP_TGT_MAPTYPE_DESCRIPTOR. The origin arg is for
passing the pointer information, however, the overloaded arg is an
array of descriptor_dim:

```
struct descriptor_dim {
  int64_t offset;
  int64_t count;
  int64_t stride
};
```

and the array size is the dimension size. In addition, since we
have count and stride information in descriptor_dim, we can replace/overload the
arg_size parameter by using dimension size.

Reviewed By: grokos, tianshilei1992

Differential Revision: https://reviews.llvm.org/D82245
2020-11-19 11:33:27 -06:00
Alexey Bataev dcde6f17fd Revert "[libomptarget] Add support for target update non-contiguous"
This reverts commit 6847bcec1a. It breaks
the build of libomptarget.
2020-11-10 07:49:00 -08:00
cchen 6847bcec1a [libomptarget] Add support for target update non-contiguous
This patch is the runtime support for https://reviews.llvm.org/D84192.

In order not to modify the tgt_target_data_update information but still be
able to pass the extra information for non-contiguous map item (offset,
count, and stride for each dimension), this patch overload arg when
the maptype is set as OMP_TGT_MAPTYPE_DESCRIPTOR. The origin arg is for
passing the pointer information, however, the overloaded arg is an
array of descriptor_dim:

```
struct descriptor_dim {
  int64_t offset;
  int64_t count;
  int64_t stride
};
```

and the array size is the dimension size. In addition, since we
have count and stride information in descriptor_dim, we can replace/overload the
arg_size parameter by using dimension size.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D82245
2020-11-06 20:55:33 -06:00
Shilei Tian f5eebc25cc [OpenMP] Fixed an issue in the test case parallel_offloading_map
There is a non-conforming use of variable-sized array in the test case `parallel_offloading_map.c`. This patch fixed it.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D90642
2020-11-03 15:59:16 -05:00
Joachim Protze 71041a8b6b [OpenMP][libomptarget][Tests] fix failing test
D88149 updated `omp_get_initial_device` behavior to conform with OpenMP 5.1.
omp_get_initial_device() == omp_get_num_devices()
2020-11-03 13:15:33 +01:00
Shilei Tian e20d64c3d9 [Clang][OpenMP] Fixed an issue of segment fault when using target nowait
The implementation of target nowait just wraps the target region into a task. The essential four parameters (base ptr, ptr, size, mapper) are taken as firstprivate such that they will be copied to the private location. When there is no user-defined mapper, the mapper variable will be nullptr. However, it will be still copied to the corresponding place. Therefore, a memcpy will be generated and the source pointer will be nullptr, causing a segmentation fault. The root cause is when calling `emitOffloadingArraysArgument`, the last argument `Options` has a field about whether it requires a task. It only takes depend clause into account. In this patch, the nowait clause is also included.

There're two things that will be done in another patches:
1. target data nowait has not been supported yet. D90099 added the support.
2. When there is no mapper, the mapper array can be nullptr no matter whether it requires outer task or not. It can avoid an unnecessary data copy. This is an optimization that is covered in D90101.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89844
2020-10-26 22:33:22 -04:00
George Rokos 5adb3a6d86 [libomptarget] Fix copy-to motion for PTR_AND_OBJ entries where PTR is a struct member.
This patch fixes a problem whereby the pointee object of a PTR_AND_OBJ entry with a `map(to)` motion clause can be overwritten on the device even if its reference counter is >=1.

Currently, we check the reference counter of the parent struct in order to determine whether the motion clause should be respected, but since the pointee object is not part of the struct, it's got its own reference counter which should be used to enqueue the copy or discard it.

The same behavior has already been implemented in targetDataEnd (omptarget.cpp:539-540), but we somehow missed doing the same in targetDataBegin.

Differential Revision: https://reviews.llvm.org/D89597
2020-10-16 16:14:01 -07:00
Joseph Huber ae209397b1 [OpenMP] Begin Printing Information Dumps In Libomptarget and Plugins
Summary:
This patch starts adding support for adding information dumps to libomptarget
and rtl plugins. The information printing is controlled by the
LIBOMPTARGET_INFO environment variable introduced in D86483. The goal of this
patch is to provide the user with additional information about the device
during kernel execution and providing the user with information dumps in the
case of failure. This patch added the ability to dump the pointer mapping table
as well as printing the number of blocks and threads in the cuda RTL.

Reviewers: jdoerfort gkistanova	ye-luo

Subscribers: guansong openmp-commits sstefan1 yaxunl ye-luo

Tags: #OpenMP

Differential Revision: https://reviews.llvm.org/D87165
2020-09-09 12:03:56 -04:00
Alexey Bataev 6aa7228a62 [LIBOMPTARGET]Do not try to optimize bases for the next parameters.
PrivateArgumentManager shall immediately allocate firstprivates if they
are bases for the next parameters and the next paramaters rely on the
fact that the base musst be allocated already.

Differential Revision: https://reviews.llvm.org/D86781
2020-08-28 15:46:31 -04:00
Shilei Tian 46e0ced762 [OpenMP] Fixed wrong test command in the test private_mapping.c
The test command in `private_mapping.c` was set to expect failure by mistake. It is fixed in this patch.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D86758
2020-08-28 12:19:46 -04:00
Joseph Huber 7a5a74ea96 [OpenMP] Always emit debug messages that indicate offloading failure
Summary:

This patch changes the libomptarget runtime to always emit debug messages that
occur before offloading failure. The goal is to provide users with information
about why their application failed in the target region rather than a single
failure message. This is only done in regions that precede offloading failure
so this should not impact runtime performance. if the debug environment
variable is set then the message is forwarded to the debug output as usual.

A new environment variable was added for future use but does nothing in this
current patch. LIBOMPTARGET_INFO will be used to report runtime information to
the user if requrested, such as grid size, SPMD usage, or data mapping. It will
take an integer indicating the level of information verbosity and a value of 0
will disable it.

Reviewers: jdoerfort

Subscribers: guansong sstefan1 yaxunl ye-luo

Tags: #OpenMP

Differential Revision: https://reviews.llvm.org/D86483
2020-08-26 19:30:41 -04:00
Shilei Tian 0775c1dfbc [OpenMP] Pack first-private arguments to improve efficiency of data transfer
In this patch, we pack all small first-private arguments, allocate and transfer them all at once to reduce the number of data transfer which is very expensive.

Let's take the test case as example.
```
int main() {
  int data1[3] = {1}, data2[3] = {2}, data3[3] = {3};
  int sum[16] = {0};
#pragma omp target teams distribute parallel for map(tofrom: sum) firstprivate(data1, data2, data3)
  for (int i = 0; i < 16; ++i) {
    for (int j = 0; j < 3; ++j) {
      sum[i] += data1[j];
      sum[i] += data2[j];
      sum[i] += data3[j];
    }
  }
}
```
Here `data1`, `data2`, and `data3` are three first-private arguments of the target region. In the previous `libomptarget`, it called data allocation and data transfer three times, each of which allocated and transferred 12 bytes. With this patch, it only calls allocation and transfer once. The size is `(12+4)*3=48` where 12 is the size of each array and 4 is the padding to keep the address aligned with 8. It is implemented in this way:
1. First collect all information for those *first*-private arguments. _private_ arguments are not the case because private arguments don't need to be mapped to target device. It just needs a data allocation. With the patch for memory manager, the data allocation could be very cheap, especially for the small size. For each qualified argument, push a place holder pointer `nullptr` to the `vector` for kernel arguments, and we will update them later.
2. After we have all information, create a buffer that can accommodate all arguments plus their paddings. Copy the arguments to the buffer at the right place, i.e. aligned address.
3. Allocate a target memory with the same size as the host buffer, transfer the host buffer to target device, and finally update all place holder pointers in the arguments `vector`.

The reason we only consider small arguments is, the data transfer is asynchronous. Therefore, for the large argument, we could continue to do things on the host side meanwhile, hopefully, the data is also being transferred. The "small" is defined by that the argument size is less than a predefined value. Currently it is 1024. I'm not sure whether it is a good one, and that is an open question. Another question is, do we need to make it configurable via an environment variable?

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D86307
2020-08-25 16:06:29 -04:00
Shilei Tian 0289696751 [OpenMP] Introduce target memory manager
Target memory manager is introduced in this patch which aims to manage target
memory such that they will not be freed immediately when they are not used
because the overhead of memory allocation and free is very large. For CUDA
device, cuMemFree even blocks the context switch on device which affects
concurrent kernel execution.

The memory manager can be taken as a memory pool. It divides the pool into
multiple buckets according to the size such that memory allocation/free
distributed to different buckets will not affect each other.

In this version, we use the exact-equality policy to find a free buffer. This
is an open question: will best-fit work better here? IMO, best-fit is not good
for target memory management because computation on GPU usually requires GBs of
data. Best-fit might lead to a serious waste. For example, there is a free
buffer of size 1960MB, and now we need a buffer of size 1200MB. If best-fit,
the free buffer will be returned, leading to a 760MB waste.

The allocation will happen when there is no free memory left, and the memory
free on device will take place in the following two cases:
1. The program ends. Obviously. However, there is a little problem that plugin
library is destroyed before the memory manager is destroyed, leading to a fact
that the call to target plugin will not succeed.
2. Device is out of memory when we request a new memory. The manager will walk
through all free buffers from the bucket with largest base size, pick up one
buffer, free it, and try to allocate immediately. If it succeeds, it will
return right away rather than freeing all buffers in free list.

Update:
A threshold (8KB by default) is set such that users could control what size of memory
will be managed by the manager. It can also be configured by an environment variable
`LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD`.

Reviewed By: jdoerfert, ye-luo, JonChesterfield

Differential Revision: https://reviews.llvm.org/D81054
2020-08-19 23:12:23 -04:00
Joel E. Denny 518a27e559 [OpenMP] Fix ref count dec for implicit map of partial data
D85342 broke this case.  The new test case presents an example.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D85369
2020-08-06 11:39:29 -04:00
Joel E. Denny 8c8bb128df [OpenMP] Fix `target data` exit for array extension
For example:

```
 #pragma omp target data map(tofrom:arr[0:100])
 {
   #pragma omp target exit data map(delete:arr[0:100])
   #pragma omp target enter data map(alloc:arr[98:2])
 }
```

Without this patch, the transfer at the end of the target data region
is broken and fails depending on the target device.  According to my
read of the spec, the transfer shouldn't even be attempted because
`arr[0:100]` isn't (fully) present there.  To fix that, this patch
makes `DeviceTy::getTgtPtrBegin` return null for this case.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D85342
2020-08-05 16:51:25 -04:00
Joel E. Denny 41b1aefecb [OpenMP] Fix `present` diagnostic for array extension
For example, without this patch, the following fails as expected with
or without the `present` modifier, but the `present` modifier doesn't
produce its usual diagnostic:

```
 #pragma omp target data map(alloc: arr[0:2])
 {
   #pragma omp target map(present, tofrom: arr[0:100]) // not fully present
   ;
 }
```

Reviewed By: grokos, vzakhari

Differential Revision: https://reviews.llvm.org/D85320
2020-08-05 16:51:24 -04:00
Joel E. Denny 5ab43989c3 [OpenMP] Fix `omp target update` for array extension
OpenMP TR8 sec. 2.15.6 "target update Construct", p. 183, L3-4 states:

> If the corresponding list item is not present in the device data
> environment and there is no present modifier in the clause, then no
> assignment occurs to or from the original list item.

L10-11 states:

> If a present modifier appears in the clause and the corresponding
> list item is not present in the device data environment then an
> error occurs and the program termintates.

(OpenMP 5.0 also has the first passage but without mention of the
present modifier of course.)

In both passages, I assume "is not present" includes the case of
partially but not entirely present.  However, without this patch, the
target update directive misbehaves in this case both with and without
the present modifier.  For example:

```
 #pragma omp target enter data map(to:arr[0:3])
 #pragma omp target update to(arr[0:5]) // might fail on data transfer
 #pragma omp target update to(present:arr[0:5]) // might fail on data transfer
```

The problem is that `DeviceTy::getTgtPtrBegin` does not return a null
pointer in that case, so `target_data_update` sees the data as fully
present, and the data transfer then might fail depending on the target
device.  However, without the present modifier, there should never be
a failure.  Moreover, with the present modifier, there should always
be a failure, and the diagnostic should mention the present modifier.

This patch fixes `DeviceTy::getTgtPtrBegin` to return null when
`target_data_update` is the caller.  I'm wondering if it should do the
same for more callers.

Reviewed By: grokos, jdoerfert

Differential Revision: https://reviews.llvm.org/D85246
2020-08-05 10:03:31 -04:00
Joel E. Denny 002d61db2b [OpenMP] Fix `present` for exit from `omp target data`
Without this patch, the following example fails but shouldn't
according to OpenMP TR8:

```
 #pragma omp target enter data map(alloc:i)
 #pragma omp target data map(present, alloc: i)
 {
   #pragma omp target exit data map(delete:i)
 } // fails presence check here
```

OpenMP TR8 sec. 2.22.7.1 "map Clause", p. 321, L23-26 states:

> If the map clause appears on a target, target data, target enter
> data or target exit data construct with a present map-type-modifier
> then on entry to the region if the corresponding list item does not
> appear in the device data environment an error occurs and the
> program terminates.

There is no corresponding statement about the exit from a region.
Thus, the `present` modifier should:

1. Check for presence upon entry into any region, including a `target
   exit data` region.  This behavior is already implemented correctly.

2. Should not check for presence upon exit from any region, including
   a `target` or `target data` region.  Without this patch, this
   behavior is not implemented correctly, breaking the above example.

In the case of `target data`, this patch fixes the latter behavior by
removing the `present` modifier from the map types Clang generates for
the runtime call at the end of the region.

In the case of `target`, we have not found a valid OpenMP program for
which such a fix would matter.  It appears that, if a program can
guarantee that data is present at the beginning of a `target` region
so that there's no error there, that data is also guaranteed to be
present at the end.  This patch adds a comment to the runtime to
document this case.

Reviewed By: grokos, RaviNarayanaswamy, ABataev

Differential Revision: https://reviews.llvm.org/D84422
2020-08-05 10:03:31 -04:00
Alexey Bataev 622e46156d [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region.
Need to map the base pointer for all directives, not only target
data-based ones.
The base pointer is mapped for array sections, array subscript, array
shaping and other array-like constructs with the base pointer. Also,
codegen for use_device_ptr clause was modified to correctly handle
mapping combination of array like constructs + use_device_ptr clause.
The data for use_device_ptr clause is emitted as the last records in the
data mapping array.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84767
2020-07-30 11:18:33 -04:00
Alexey Bataev b69357c2f4 Revert "[OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region."
This reverts commit 142d0d3ed8 to
investigate undefined behavior revealed by buildbots.
2020-07-30 10:57:56 -04:00
Alexey Bataev 142d0d3ed8 [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region.
Need to map the base pointer for all directives, not only target
data-based ones.
The base pointer is mapped for array sections, array subscript, array
shaping and other array-like constructs with the base pointer. Also,
codegen for use_device_ptr clause was modified to correctly handle
mapping combination of array like constructs + use_device_ptr clause.
The data for use_device_ptr clause is emitted as the last records in the
data mapping array.
It applies only for global pointers.

Differential Revision: https://reviews.llvm.org/D84767
2020-07-30 09:40:05 -04:00