Commit Graph

395 Commits

Author SHA1 Message Date
Jon Chesterfield 214387c2c6 [libomptarget][nvptx] Reduce calls to cuda header
[libomptarget][nvptx] Reduce calls to cuda header

Remove use of clock_t in favour of a builtin. Drop a preprocessor branch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94731
2021-01-15 02:16:33 +00:00
Jon Chesterfield 6e7094c14b [libomptarget][nvptx][nfc] Move target_impl functions out of header
[libomptarget][nvptx][nfc] Move target_impl functions out of header

This removes most of the differences between the two target_impl.h.

Also change name mangling from C to C++ for __kmpc_impl_*_lock.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D94728
2021-01-15 00:19:48 +00:00
Shilei Tian 547b032ccc [OpenMP] Remove omptarget-nvptx from deps as it is no longer a valid target
`omptarget-nvptx` is still a dependence for `check-libomptarget-nvtpx`
although it has been removed by D94573.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94725
2021-01-14 19:16:11 -05:00
Shilei Tian 64e9e9aeee [OpenMP] Dropped unnecessary define when compiling deviceRTLs for NVPTX
The comment said CUDA 9 header files use the `nv_weak` attribute which
`clang` is not yet prepared to handle. It's three years ago and now things have
changed. Based on my test, removing the definition doesn't have any problem on
my machine with CUDA 11.1 installed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94700
2021-01-14 13:55:12 -05:00
Shilei Tian 763c1f9933 [OpenMP] Drop the static library libomptarget-nvptx
For NVPTX target, OpenMP provides a static library `libomptarget-nvptx`
built by NVCC, and another bitcode `libomptarget-nvptx-sm_{$sm}.bc` generated by
Clang. When compiling an OpenMP program, the `.bc` file will be fed to `clang`
in the second run on the program that compiles the target part. Then the generated
PTX file will be fed to `ptxas` to generate the object file, and finally the driver
invokes `nvlink` to generate the binary, where the static library will be appened
to `nvlink`.

One question is, why do we need two libraries? The only difference is, the static
library contains `omp_data.cu` and the bitcode library doesn't. It's unclear why
they were implemented in this way, but per D94565, there is no issue if we also
include the file into the bitcode library. Therefore, we can safely drop the
static library.

This patch is about the change in OpenMP. The driver will be updated as well if
this patch is accepted.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D94573
2021-01-14 13:34:25 -05:00
Jon Chesterfield 5d165f0b89 [libomptarget][amdgpu] Fix kernel launch tracing to match previous behavior
Restore control of kernel launch tracing to be >= 1 as it was before

export LIBOMPTARGET_KERNEL_TRACE=1

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94695
2021-01-14 18:13:22 +00:00
Jon Chesterfield 84e0b14a0a [libomptarget][nvptx] Include omp_data.cu in bitcode deviceRTL
[libomptarget][nvptx] Include omp_data.cu in bitcode deviceRTL

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D94565
2021-01-13 03:51:11 +00:00
Shilei Tian 68ff52ffea [OpenMP] Fixed the link error that cannot find static data member
Constant static data member can be defined in the class without another
define after the class in C++17. Although it is C++17, Clang can still handle it
even w/o the flag for C++17. Unluckily, GCC cannot handle that.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D94541
2021-01-12 16:48:28 -05:00
Jon Chesterfield 33e2494bea [libomptarget][amdgpu][nfc] Fix build on centos
[libomptarget][amdgpu][nfc] Fix build on centos

rtl.cpp replaced 224 with a #define from elf.h, but that
doesn't work on a centos 7 build machine with an old elf.h

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D94528
2021-01-12 19:40:03 +00:00
Shilei Tian bdd1ad5e5c [OpenMP] Fixed include directories for OpenMP when building OpenMP with LLVM_ENABLE_RUNTIMES
Some LLVM headers are generated by CMake. Before the installation,
LLVM's headers are distributed everywhere, some of which are in
`${LLVM_SRC_ROOT}/llvm/include/llvm`, and some are in
`${LLVM_BINARY_ROOT}/include/llvm`. After intallation, they're all in
`${LLVM_INSTALLATION_ROOT}/include/llvm`.

OpenMP now depends on LLVM headers. Some headers depend on headers generated
by CMake. When building OpenMP along with LLVM, a.k.a via `LLVM_ENABLE_RUNTIMES`,
we need to tell OpenMP where it can find those headers, especially those still
have not been copied/installed.

Reviewed By: jdoerfert, jhuber6

Differential Revision: https://reviews.llvm.org/D94534
2021-01-12 14:32:38 -05:00
Shilei Tian 0871d6d516 [OpenMP] Move memory manager to plugin and make it a common interface
The lifetime of `libomptarget` and its opened plugins are not aligned
and it's hard for `libomptarget` to determine when the plugins are destroyed.
As a result, some issues (see D94256 for details) occur on some platforms.
Actually, if we take target memory as target resources, same as other resources,
such as CUDA streams, in each plugin, then the memory manager should also be in
the plugin. Also considering some platforms may want to opt out the feature, it
makes sense to move the memory manager to plugin, make it a common interface, and
let plguin developers determine whether they need it. This is what this patch does.
CUDA plugin is taken as example to show how to integrate it. In this way, we can
also get a bonus that different thresholds can be set for different platforms.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D94379
2021-01-11 21:33:42 -05:00
Shilei Tian a81c68ae6b [OpenMP] Take elf_common.c as a interface library
For now `elf_common.c` is taken as a common part included into
different plugin implementations directly via
`#include "../../common/elf_common.c"`, which is not a best practice. Since it
is simple enough such that we don't need to create a real library for it, we just
take it as a interface library so that other targets can link it directly. Another
advantage of this method is, we don't need to add the folder into header search
path which can potentially pollute the search path.

VE and AMD platforms have not been tested because I don't have target machines.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94443
2021-01-11 17:34:26 -05:00
Shilei Tian 175c336a1c [OpenMP] Remove copy constructor of `RTLInfoTy`
Multiple `RTLInfoTy` objects are stored in a list `AllRTLs`. Since
`RTLInfoTy` contains a `std::mutex`, it is by default not a copyable object.
In order to support `AllRTLs.push_back(...)` which is currently used, a customized
copy constructor is provided. Every time we need to add a new data member into
`RTLInfoTy`, we should keep in mind not forgetting to add corresponding assignment
in the copy constructor. In fact, the only use of the copy constructor is to push
the object into the list, we can of course write it in a way that first emplace
a new object back, and then use the reference to the last element. In this way we
don't need the copy constructor anymore. If the element is invalid, we just need
to pop it, and that's what this patch does.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94361
2021-01-09 13:01:01 -05:00
Joseph Huber 2ce16810f2 [OpenMP] Always print error messages in libomptarget CUDA plugin
Summary:
Currently error messages from the CUDA plugins are only printed to the user if they have debugging enabled. Change this behaviour to always print the messages that result in offloading failure. This improves the error messages by indidcating what happened when the error occurs in the plugin library, such as a segmentation fault on the device.

Reviewed by: jdoerfert

Differential Revision: https://reviews.llvm.org/D94263
2021-01-07 17:47:32 -05:00
Shilei Tian 5acdae1f9a [OpenMP] Fixed an issue that wrong LLVM headers might be included when building libomptarget
Wrong LLVM headers might be included if we don't set `include_directories`
to a right place. This will cause a compilation error if LLVM is installed in
system directories.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93737
2021-01-06 17:07:36 -05:00
Shilei Tian e2a623094f [OpenMP] Fixed the test environment when building along with LLVM
Currently all built libraries in OpenMP are anywhere if building along
with LLVM. It is not an issue if we don't execute any test. However, almost all
tests for `libomptarget` fails because in the lit configuration, we only set
`<build_dir>/libomptarget` to `LD_LIBRARY_PATH` and `LIBRARY_PATH`. Since those
libraries are everywhere, `clang` can no longer find `libomptarget.so` or those
deviceRTLs anymore.

In this patch, we set a unified path for all built libraries, no matter whether
it is built along with LLVM or not. In this way, our lit configuration can work
propoerly.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93736
2021-01-06 17:06:16 -05:00
George Rokos dec02904d2 [libomptarget] Allow calls to omp_target_memcpy with 0 size.
Differential Revision: https://reviews.llvm.org/D94095
2021-01-05 16:03:53 -08:00
Joseph Huber fe5d51a489 [OpenMP] Add using bit flags to select Libomptarget Information
Summary:
This patch adds more fine-grained support over which information is output from the libomptarget runtime when run with the environment variable LIBOMPTARGET_INFO set. An extensible set of flags can be used to pick and choose which information the user is interested in.

Reviewers: jdoerfert JonChesterfield grokos

Differential Revision: https://reviews.llvm.org/D93727
2021-01-04 12:03:15 -05:00
Jon Chesterfield 76bfbb74d3 [libomptarget][amdgpu] Call into deviceRTL instead of ockl
[libomptarget][amdgpu] Call into deviceRTL instead of ockl

Amdgpu codegen presently emits a call into ockl. The same functionality
is already present in the deviceRTL. Adds an amdgpu specific entry point
to avoid the dependency. This lets simple openmp code (specifically, that
which doesn't use libm) run without rocm device libraries installed.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D93356
2021-01-04 16:48:47 +00:00
Atmn 907886cc5b [OpenMP][Libomptarget][NFC] Use CMake Variables
This patchs adds CMake variables to add subdirectories and include
directories for libomptarget and explicitly gives the location of source
files.

Differential Revision: https://reviews.llvm.org/D93290
2020-12-16 19:05:15 -05:00
Jon Chesterfield b607837c75 [libomptarget][nfc] Replace static const with enum
[libomptarget][nfc] Replace static const with enum

Semantically identical. Replaces 0xff... with ~0 to spare counting the f.
Has the advantage that the compiler doesn't need to prove the 4/8 byte
value dead before discarding it, and sidesteps the compilation question
associated with what static means for a single source language.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93328
2020-12-16 16:40:37 +00:00
Giorgis Georgakoudis e007b32864 [OpenMP] Add time profiling for libomptarget
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93055
2020-12-11 18:53:37 -08:00
Jon Chesterfield ce93de3bb2 [libomptarget][nfc] Remove data_sharing type aliasing
[libomptarget][nfc] Remove data_sharing type aliasing

Libomptarget previous used __kmpc_data_sharing_slot to access values of type
__kmpc_data_sharing_{worker,master}_slot_static. This aliasing violation was
benign in practice. The master type has since been removed, so a single type
can be used instead.

This is particularly helpful for the transition to an openmp deviceRTL, as the
c++/openmp compiler for amdgcn currently rejects the flexible array member for
being an incomplete type. Serves the same purpose as abandoned D86324.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93075
2020-12-11 02:13:34 +00:00
Jon Chesterfield 7c59614394 [libomptarget][amdgpu] clang-format src/rtl.cpp 2020-12-09 19:45:51 +00:00
Jon Chesterfield c9bc414840 [libomptarget][amdgpu] Let default number of teams equal number of CUs 2020-12-09 19:35:34 +00:00
Jon Chesterfield e191d31159 [libomptarget][amdgpu] Robust handling of device_environment symbol 2020-12-09 19:21:51 +00:00
Jon Chesterfield cab9f69235 [libomptarget][amdgpu] Improve diagnostics on arch mismatch 2020-12-09 18:55:53 +00:00
Jon Chesterfield 71f4693020 [libomptarget][amdgpu] Add plumbing to call into hostrpc lib, if linked 2020-12-07 15:24:01 +00:00
Jon Chesterfield e1b8e8a1f4 [libomptarget][amdgpu] Skip device_State allocation when using bss global 2020-12-06 12:13:56 +00:00
Jon Chesterfield f628eef98a [libomptarget][amdgpu] Fix latent race in load binary 2020-12-04 16:29:09 +00:00
Jon Chesterfield ae9d96a656 [libomptarget][amdgpu] Address compiler warnings, drive by fixes
[libomptarget][amdgpu] Address compiler warnings, drive by fixes

Initialize some variables, remove unused ones.
Changes the debug printing condition to align with the aomp test suite.

Differential Revision: https://reviews.llvm.org/D92559
2020-12-03 11:09:12 +00:00
Pushpinder Singh afc09c6fe4 [libomptarget][AMDGPU] Remove MaxParallelLevel
Removes MaxParallelLevel references from rtl.cpp and drops
resulting dead code.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D92463
2020-12-03 00:27:03 -05:00
Jon Chesterfield 89a0f48c58 [libomptarget][cuda] Detect missing symbols in plugin at build time
[libomptarget][cuda] Detect missing symbols in plugin at build time

Passes -z,defs to the linker. Error on unresolved symbol references.

Otherwise, those unresolved symbols present as target code running on the host
as the plugin fails to load. This is significantly harder to debug than a link
time error. Flag matches that passed by amdgcn and ve plugins.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D92143
2020-11-27 15:39:41 +00:00
cchen 7036fe8a0c [libomptarget] Add support for target update non-contiguous
This patch is the runtime support for https://reviews.llvm.org/D84192.

In order not to modify the tgt_target_data_update information but still be
able to pass the extra information for non-contiguous map item (offset,
count, and stride for each dimension), this patch overload arg when
the maptype is set as OMP_TGT_MAPTYPE_DESCRIPTOR. The origin arg is for
passing the pointer information, however, the overloaded arg is an
array of descriptor_dim:

```
struct descriptor_dim {
  int64_t offset;
  int64_t count;
  int64_t stride
};
```

and the array size is the dimension size. In addition, since we
have count and stride information in descriptor_dim, we can replace/overload the
arg_size parameter by using dimension size.

Reviewed By: grokos, tianshilei1992

Differential Revision: https://reviews.llvm.org/D82245
2020-11-19 11:33:27 -06:00
Joseph Huber da8bec47ab [OpenMP] Add Location Fields to Libomptarget Runtime for Debugging
Summary:
Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values.

Reviewers: jdoerfert grokos

Differential Revision: https://reviews.llvm.org/D87946
2020-11-19 12:01:53 -05:00
Joseph Huber 5378c6a4bf [OpenMP] Add Support for Mapping Names in Libomptarget RTL
Summary:
This patch adds basic support for priting the source location and names for the mapped variables. This patch does not support names for custom mappers. This is based on D89802.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D90172
2020-11-18 16:01:59 -05:00
Joseph Huber 97e55cfef5 [OpenMP] Add Passing in Original Declaration Names To Mapper API
Summary:
This patch adds support for passing in the original delcaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information in passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;"

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D89802
2020-11-18 15:28:39 -05:00
Alexey Bataev dcde6f17fd Revert "[libomptarget] Add support for target update non-contiguous"
This reverts commit 6847bcec1a. It breaks
the build of libomptarget.
2020-11-10 07:49:00 -08:00
cchen 6847bcec1a [libomptarget] Add support for target update non-contiguous
This patch is the runtime support for https://reviews.llvm.org/D84192.

In order not to modify the tgt_target_data_update information but still be
able to pass the extra information for non-contiguous map item (offset,
count, and stride for each dimension), this patch overload arg when
the maptype is set as OMP_TGT_MAPTYPE_DESCRIPTOR. The origin arg is for
passing the pointer information, however, the overloaded arg is an
array of descriptor_dim:

```
struct descriptor_dim {
  int64_t offset;
  int64_t count;
  int64_t stride
};
```

and the array size is the dimension size. In addition, since we
have count and stride information in descriptor_dim, we can replace/overload the
arg_size parameter by using dimension size.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D82245
2020-11-06 20:55:33 -06:00
Jon Chesterfield 93cbf622fc [libomptarget][nfc] Build amdgcn deviceRTL with nogpulib 2020-11-04 11:29:22 +00:00
Shilei Tian f5eebc25cc [OpenMP] Fixed an issue in the test case parallel_offloading_map
There is a non-conforming use of variable-sized array in the test case `parallel_offloading_map.c`. This patch fixed it.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D90642
2020-11-03 15:59:16 -05:00
Joachim Protze 71041a8b6b [OpenMP][libomptarget][Tests] fix failing test
D88149 updated `omp_get_initial_device` behavior to conform with OpenMP 5.1.
omp_get_initial_device() == omp_get_num_devices()
2020-11-03 13:15:33 +01:00
Atmn Patel a95b25b29e [Libomptarget][NFC] Move global Libomptarget state to a struct
Presently, there a number of global variables in libomptarget (devices,
RTLs, tables, mutexes, etc.) that are not placed within a struct. This
patch places them into a struct ``PluginManager``. All of the functions
that act on this data remain free.

Differential Revision: https://reviews.llvm.org/D90519
2020-11-03 00:10:18 -05:00
Jon Chesterfield dee7704829 [AMDGPU] Add __builtin_amdgcn_grid_size
[AMDGPU] Add __builtin_amdgcn_grid_size

Similar to D76772, loads the data from the dispatch pointer. Marked invariant.

Patch also updates the openmp devicertl to use this builtin.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D90251
2020-10-29 16:25:13 +00:00
Benjamin Kramer 207cf71fa9 Revert "[OpenMP] Add Passing in Original Declaration Names To Mapper API"
This reverts commit d981c7b758 and
a87d7b3d44. Test fails under msan.
2020-10-28 13:58:14 +01:00
Joseph Huber d981c7b758 [OpenMP] Add Support for Mapping Names in Libomptarget RTL
Summary:
This patch adds basic support for priting the source location and names for the
mapped variables. This patch does not support names for custom mappers. This is
based on D89802. The names information currently will be printed out only in
debug mode or using env LIBOMPTARGET_INFO during execution. But the information
is added when availible to the Device and Private data structures. To get the
information out the code must be built with debug symbols on using -g or
-Rpass=openmp-opt

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D90172
2020-10-27 16:53:05 -04:00
Joseph Huber a87d7b3d44 [OpenMP] Add Passing in Original Declaration Names To Mapper API
Summary:
This patch adds support for passing in the original delcaration name in the
source file to the libomptarget runtime. This will allow the runtime to provide
more intelligent debugging messages. This patch takes the original expression
parsed from the OpenMP map / update clause and provides a textual
representation if it was explicitly mapped, otherwise it takes the name of the
variable declaration as a fallback. The information in passed to the runtime in
a global array of strings that matches the existing ident_t source location
strings using ";name;filename;column;row;;". See
clang/test/OpenMP/target_map_names.cpp for an example of the generated output
for a given map clause.

Reviewers: jdoervert

Differential Revision: https://reviews.llvm.org/D89802
2020-10-27 16:09:19 -04:00
Shilei Tian e20d64c3d9 [Clang][OpenMP] Fixed an issue of segment fault when using target nowait
The implementation of target nowait just wraps the target region into a task. The essential four parameters (base ptr, ptr, size, mapper) are taken as firstprivate such that they will be copied to the private location. When there is no user-defined mapper, the mapper variable will be nullptr. However, it will be still copied to the corresponding place. Therefore, a memcpy will be generated and the source pointer will be nullptr, causing a segmentation fault. The root cause is when calling `emitOffloadingArraysArgument`, the last argument `Options` has a field about whether it requires a task. It only takes depend clause into account. In this patch, the nowait clause is also included.

There're two things that will be done in another patches:
1. target data nowait has not been supported yet. D90099 added the support.
2. When there is no mapper, the mapper array can be nullptr no matter whether it requires outer task or not. It can avoid an unnecessary data copy. This is an optimization that is covered in D90101.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89844
2020-10-26 22:33:22 -04:00
Shilei Tian 3091ed099f [OpenMP] Fixed a potential integer overflow
`size_t` has different width on 32- and 64-bit architecture, but the
computation to floor to power of two assumed it is 64-bit, which can cause an
integer overflow. In this patch, architecture detection is added so that the
operation for 64-bit `size_t`. Thank Luke for reporting the issue.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89878
2020-10-22 21:22:19 -04:00
Jon Chesterfield 26790ed248 [libomptarget] Require LLVM source tree to build libomptarget
[libomptarget] Require LLVM source tree to build libomptarget

This is to permit reliably #including files from the LLVM tree in libomptarget,
as an improvement on the copy and paste that is currently in use. See D87841
for the first example of removing duplication given this new requirement.

The weekly openmp dev call reached consensus on this approach. See also D87841
for some alternatives that were considered. In the future, we may want to
introduce a new top level repo for shared constants, or start using the ADT
library within openmp.

This will break sufficiently exotic build systems, trivial fixes as below.

Building libomptarget as part of the monorepo will continue to work.
If openmp is built separately, it now requires a cmake macro indicating
where to find the LLVM source tree.

If openmp is built separately, without the llvm source tree already on disk,
the build machine will need a copy of a subset of the llvm source tree and
the cmake macro indicating where it is.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D89426
2020-10-21 18:53:00 +01:00