Commit Graph

1465 Commits

Author SHA1 Message Date
AndreyChurbanov a60bc55c69 [OpenMP] libomp: cleanup parsing of OMP_ALLOCATOR env variable.
Differential Revision: https://reviews.llvm.org/D94932
2021-01-19 16:21:22 +03:00
Kelvin Li 9d81073acb [OpenMP][Docs] Fix typos in FAQ (NFC) 2021-01-18 18:55:58 -05:00
AndreyChurbanov aa3a59e0c6 [OpenMP][NFC] Fix test
The test fails if memkind library is accessible.
2021-01-19 00:05:34 +03:00
Shilei Tian 9bf843bdc8 Revert "[OpenMP] Added the support for hidden helper task in RTL"
This reverts commit ed939f853d.
2021-01-18 06:57:52 -05:00
Chandler Carruth f855751c12 Fix openmp CMake build on non-Linux AArch64 systems.
This just checks for `/proc/cpuinfo` existing before reading it.

Tested on an ARM macOS machine.
2021-01-17 16:18:31 -08:00
Shilei Tian ed939f853d [OpenMP] Added the support for hidden helper task in RTL
The basic design is to create an outermost parallel team. It is not a regular team because it is created only when the first hidden helper task is encountered, and it is responsible only for executing hidden helper tasks. We first use `pthread_create` to create a new thread, which serves as both the initial thread and the main thread of the hidden helper team. This initial thread then initializes a new root, just as the RTL does during initialization, and then calls `__kmpc_fork_call` directly, as if the initial thread had encountered a parallel region. In the wrapped function for this team, the main thread (the initial thread created via `pthread_create` on Linux) waits on a condition variable that is signaled only when the RTL is being destroyed; the other worker threads do nothing. The main thread needs to wait there because, in the current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team, which is not what we want.

Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also provided to configure the number of threads and to enable/disable this feature. By default, the number of hidden helper threads is 8.

Here are some open issues to be discussed:
1. The main thread goes to sleep when the initialization is finished. As Andrey mentioned, we might need to wake it from time to time to do some work. What kind of update/check should be put here?

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D77609
2021-01-16 14:13:35 -05:00
Jon Chesterfield 214387c2c6 [libomptarget][nvptx] Reduce calls to cuda header
Remove use of clock_t in favour of a builtin. Drop a preprocessor branch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94731
2021-01-15 02:16:33 +00:00
Jon Chesterfield 6e7094c14b [libomptarget][nvptx][nfc] Move target_impl functions out of header
This removes most of the differences between the two target_impl.h.

Also change name mangling from C to C++ for __kmpc_impl_*_lock.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D94728
2021-01-15 00:19:48 +00:00
Shilei Tian 547b032ccc [OpenMP] Remove omptarget-nvptx from deps as it is no longer a valid target
`omptarget-nvptx` is still a dependency of `check-libomptarget-nvptx`
although it has been removed by D94573.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94725
2021-01-14 19:16:11 -05:00
Shilei Tian 64e9e9aeee [OpenMP] Dropped unnecessary define when compiling deviceRTLs for NVPTX
The comment said CUDA 9 header files use the `nv_weak` attribute, which
`clang` was not yet prepared to handle. That was three years ago, and things have
changed since. In my testing, removing the definition causes no problems on
my machine with CUDA 11.1 installed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94700
2021-01-14 13:55:12 -05:00
Shilei Tian 763c1f9933 [OpenMP] Drop the static library libomptarget-nvptx
For the NVPTX target, OpenMP provides a static library `libomptarget-nvptx`
built by NVCC, and a bitcode library `libomptarget-nvptx-sm_{$sm}.bc` generated by
Clang. When compiling an OpenMP program, the `.bc` file is fed to `clang`
in the second pass over the program, which compiles the target part. The generated
PTX file is then fed to `ptxas` to generate the object file, and finally the driver
invokes `nvlink` to generate the binary, with the static library appended
to the `nvlink` invocation.

One question is, why do we need two libraries? The only difference is, the static
library contains `omp_data.cu` and the bitcode library doesn't. It's unclear why
they were implemented in this way, but per D94565, there is no issue if we also
include the file into the bitcode library. Therefore, we can safely drop the
static library.

This patch is about the change in OpenMP. The driver will be updated as well if
this patch is accepted.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D94573
2021-01-14 13:34:25 -05:00
Jon Chesterfield 5d165f0b89 [libomptarget][amdgpu] Fix kernel launch tracing to match previous behavior
Restore control of kernel launch tracing to be >= 1 as it was before

export LIBOMPTARGET_KERNEL_TRACE=1

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94695
2021-01-14 18:13:22 +00:00
Terry Wilmarth 4fe17ada55 [OpenMP] Fix hierarchical barrier
Hierarchical barrier is an experimental barrier algorithm that uses aspects
of machine hierarchy to define the barrier tree structure. This patch fixes
offset calculation in hierarchical barrier. The offset is used to store info
on a flag about sleeping threads waiting on a location stored in the flag.
This commit also fixes a potential deadlock in hierarchical barrier when
using infinite blocktime by adjusting the offset value of leaf kids so that
it matches the value of leaf state. It also adds testing of default barriers
with infinite blocktime, and also tests hierarchical barrier algorithm with
both default and infinite blocktime.

Patch by Terry Wilmarth and Nawrin Sultana.

Differential Revision: https://reviews.llvm.org/D94241
2021-01-13 10:22:57 -06:00
Joseph Huber a957634942 [OpenMP] Add documentation for error messages and release notes
Add extra information to the runtime page describing the error messages and add information to the release notes for clang 12.0

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94562
2021-01-13 11:00:41 -05:00
Jon Chesterfield 84e0b14a0a [libomptarget][nvptx] Include omp_data.cu in bitcode deviceRTL
Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D94565
2021-01-13 03:51:11 +00:00
Hansang Bae bba3a82b56 [OpenMP] Use persistent memory for omp_large_cap_mem
This change enables volatile use of persistent memory for omp_large_cap_mem*
on supported systems. It depends on libmemkind's support for persistent memory,
and requirements/details can be found at the following URL.

https://pmem.io/2020/01/20/memkind-dax-kmem.html

Differential Revision: https://reviews.llvm.org/D94353
2021-01-12 20:35:27 -06:00
Hansang Bae 6f0f022038 [OpenMP] Update allocator trait key/value definitions
Use new definitions introduced in 5.1 specification.

Differential Revision: https://reviews.llvm.org/D94277
2021-01-12 20:09:45 -06:00
Shilei Tian 01f1273fe2 [OpenMP] Fixed a typo in openmp/CMakeLists.txt 2021-01-12 17:00:49 -05:00
Shilei Tian 68ff52ffea [OpenMP] Fixed the link error that cannot find static data member
In C++17, a constant static data member can be defined in the class without a
separate definition after the class. Although this is a C++17 feature, Clang accepts it
even without the C++17 flag. Unfortunately, GCC does not.
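
The portable pattern can be sketched as follows. This is a minimal illustration, not the code from the patch: `Limits`, `MaxSize`, and `clampSize` are hypothetical names. Before C++17, the in-class initializer is only a declaration, so any odr-use (such as binding the member to a const reference) needs the out-of-class definition or the link may fail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical class; the member touched by the actual patch is not shown here.
struct Limits {
  // Before C++17 this in-class initializer is a declaration, not a definition.
  static constexpr std::size_t MaxSize = 64;
};

// Out-of-class definition. Needed before C++17 whenever MaxSize is odr-used.
// From C++17 on it is redundant (constexpr static members are implicitly
// inline) but still accepted.
constexpr std::size_t Limits::MaxSize;

// std::min takes its arguments by const reference, which odr-uses MaxSize.
std::size_t clampSize(std::size_t N) { return std::min(N, Limits::MaxSize); }
```

With the out-of-class definition present, the same source should link under both pre-C++17 and C++17 modes.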

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D94541
2021-01-12 16:48:28 -05:00
Jon Chesterfield 33e2494bea [libomptarget][amdgpu][nfc] Fix build on centos
rtl.cpp replaced 224 with a #define from elf.h, but that
doesn't work on a CentOS 7 build machine with an old elf.h

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D94528
2021-01-12 19:40:03 +00:00
Shilei Tian bdd1ad5e5c [OpenMP] Fixed include directories for OpenMP when building OpenMP with LLVM_ENABLE_RUNTIMES
Some LLVM headers are generated by CMake. Before installation,
LLVM's headers are spread across several locations: some are in
`${LLVM_SRC_ROOT}/llvm/include/llvm`, and some are in
`${LLVM_BINARY_ROOT}/include/llvm`. After installation, they're all in
`${LLVM_INSTALLATION_ROOT}/include/llvm`.

OpenMP now depends on LLVM headers. Some headers depend on headers generated
by CMake. When building OpenMP along with LLVM, i.e. via `LLVM_ENABLE_RUNTIMES`,
we need to tell OpenMP where it can find those headers, especially those that
have not yet been copied/installed.

Reviewed By: jdoerfert, jhuber6

Differential Revision: https://reviews.llvm.org/D94534
2021-01-12 14:32:38 -05:00
Shilei Tian 0871d6d516 [OpenMP] Move memory manager to plugin and make it a common interface
The lifetimes of `libomptarget` and the plugins it opens are not aligned,
and it is hard for `libomptarget` to determine when the plugins are destroyed.
As a result, some issues (see D94256 for details) occur on some platforms.
If we treat target memory as a target resource, like other resources
such as CUDA streams in each plugin, then the memory manager should also live in
the plugin. Since some platforms may also want to opt out of the feature, it
makes sense to move the memory manager to the plugin, make it a common interface, and
let plugin developers decide whether they need it. This is what this patch does.
The CUDA plugin is used as an example to show how to integrate it. As a bonus,
different thresholds can be set for different platforms.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D94379
2021-01-11 21:33:42 -05:00
Shilei Tian a81c68ae6b [OpenMP] Take elf_common.c as an interface library
For now, `elf_common.c` is treated as a common part and included directly into
different plugin implementations via
`#include "../../common/elf_common.c"`, which is not best practice. Since it
is simple enough that we don't need to create a real library for it, we just
make it an interface library so that other targets can link it directly. Another
advantage of this method is that we don't need to add the folder to the header search
path, which can potentially pollute the search path.

VE and AMD platforms have not been tested because I don't have target machines.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94443
2021-01-11 17:34:26 -05:00
Shilei Tian 7be3285248 [OpenMP] Do not set OPENMP_STANDALONE_BUILD=ON when building OpenMP along with LLVM
For now, `*_STANDALONE_BUILD` is set to ON even if the projects are built along
with LLVM, because of issues mentioned in the comments. This can cause problems.
For example, if we build OpenMP along with LLVM, we'd like to copy the OpenMP
headers to `<prefix>/lib/clang/<version>/include` so that `clang` can find
them without `-I <prefix>/include`; in a standalone build, the headers are
copied to `<prefix>/include` instead.

In this patch, we fix the dependency issue in OpenMP so that it can be built
correctly even with `OPENMP_STANDALONE_BUILD=OFF`. The issue is in the call to
`add_lit_testsuite`, where `clang` and `clang-resource-headers` are passed as
`DEPENDS`. Since we're building OpenMP along with LLVM, `clang` is set by CMake
to be the C/C++ compiler, so these two dependencies, which caused the issue,
are no longer needed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93738
2021-01-10 16:46:19 -05:00
Shilei Tian 175c336a1c [OpenMP] Remove copy constructor of `RTLInfoTy`
Multiple `RTLInfoTy` objects are stored in a list `AllRTLs`. Since
`RTLInfoTy` contains a `std::mutex`, it is not copyable by default.
To support `AllRTLs.push_back(...)`, which is currently used, a custom
copy constructor is provided. Every time we add a new data member to
`RTLInfoTy`, we have to remember to add the corresponding assignment
to the copy constructor. In fact, the only use of the copy constructor is to push
the object into the list. We can instead first emplace
a new object at the back and then use a reference to the last element, so we
no longer need the copy constructor. If the element turns out to be invalid, we just
pop it. That's what this patch does.
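
The emplace-then-pop pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the libomptarget code: `PluginInfo`, `loadPlugin`, and `registerPlugin` are hypothetical stand-ins, and the "a plugin is valid if its name is non-empty" rule exists only to make the sketch testable.

```cpp
#include <cassert>
#include <list>
#include <mutex>
#include <string>

// Hypothetical stand-in for RTLInfoTy: the std::mutex member makes the type
// non-copyable (and non-movable) by default.
struct PluginInfo {
  std::string Name;
  bool IsValid = false;
  std::mutex Mtx;
};

// Hypothetical loader: pretend that only non-empty names load successfully.
bool loadPlugin(PluginInfo &P, const std::string &Name) {
  P.Name = Name;
  P.IsValid = !Name.empty();
  return P.IsValid;
}

// Construct the element in place, fill it through a reference, and pop it
// again if it turns out to be invalid. No copy constructor is required.
bool registerPlugin(std::list<PluginInfo> &All, const std::string &Name) {
  All.emplace_back();
  PluginInfo &P = All.back();
  if (!loadPlugin(P, Name)) {
    All.pop_back();
    return false;
  }
  return true;
}
```

Because `std::list` never relocates its elements, the reference obtained from `back()` stays valid while the element is being filled in.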

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94361
2021-01-09 13:01:01 -05:00
Shilei Tian 676c7cb0c0 [OpenMP] Added the support for cache line size 256 for A64FX
The Fugaku supercomputer is built with the Fujitsu A64FX microprocessor, whose cache line size is 256 bytes. Currently libomp only supports a cache line size of 128 for PPC64 and 64 otherwise. This patch adds support for a cache line size of 256 for A64FX. It's worth noting that although A64FX is a variant of AArch64, this property is not shared by other AArch64 processors. As a result, following the UCX source code (392443ab92/src/ucs/arch/aarch64/cpu.c (L17)), we can only determine it by checking whether the CPU is a FUJITSU A64FX.
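
As a rough illustration of why the cache line size matters, the sketch below selects a size per CPU family and pads a counter to that size so that neighbouring counters do not share a line. The names `Cpu`, `cacheLineSize`, and `PaddedCounter` are hypothetical; libomp's actual macros and detection logic are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU families, mirroring the three cases named in the message.
enum class Cpu { Generic, PPC64, A64FX };

// 256 bytes for A64FX, 128 for PPC64, 64 otherwise.
constexpr std::size_t cacheLineSize(Cpu C) {
  return C == Cpu::A64FX ? 256 : (C == Cpu::PPC64 ? 128 : 64);
}

// A counter aligned and padded to a full cache line to avoid false sharing.
template <std::size_t Line> struct alignas(Line) PaddedCounter {
  long Value = 0;
};

static_assert(sizeof(PaddedCounter<cacheLineSize(Cpu::A64FX)>) == 256,
              "padding should round the object up to one A64FX cache line");
```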

Reviewed By: jdoerfert, Hahnfeld

Differential Revision: https://reviews.llvm.org/D93169
2021-01-09 11:58:47 -05:00
Joseph Huber 2ce16810f2 [OpenMP] Always print error messages in libomptarget CUDA plugin
Summary:
Currently error messages from the CUDA plugins are only printed to the user if they have debugging enabled. Change this behaviour to always print the messages that result in offloading failure. This improves the error messages by indicating what happened when the error occurs in the plugin library, such as a segmentation fault on the device.

Reviewed by: jdoerfert

Differential Revision: https://reviews.llvm.org/D94263
2021-01-07 17:47:32 -05:00
Johannes Doerfert 9ae171bcd3 [OpenMP][Docs] Add remarks intro section
Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D93735
2021-01-07 14:31:17 -06:00
Joseph Huber abb174bbc1 [OpenMP] Add example in Libomptarget Information docs
Add an example to the OpenMP Documentation on the LIBOMPTARGET_INFO environment variable

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94246
2021-01-07 15:00:51 -05:00
Hansang Bae fb1c528526 [OpenMP] Use c_int/c_size_t in Fortran target memory routine interface
The Fortran interface is now in line with 5.1 specification.

Differential Revision: https://reviews.llvm.org/D94042
2021-01-06 16:28:30 -06:00
Shilei Tian 5acdae1f9a [OpenMP] Fixed an issue that wrong LLVM headers might be included when building libomptarget
Wrong LLVM headers might be included if we don't set `include_directories`
to the right place. This causes a compilation error if LLVM is installed in
system directories.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93737
2021-01-06 17:07:36 -05:00
Shilei Tian e2a623094f [OpenMP] Fixed the test environment when building along with LLVM
Currently the libraries built by OpenMP are scattered across the build tree when
building along with LLVM. This is not an issue if we don't run any tests. However,
almost all tests for `libomptarget` fail because in the lit configuration we only add
`<build_dir>/libomptarget` to `LD_LIBRARY_PATH` and `LIBRARY_PATH`. Since the
libraries are scattered, `clang` can no longer find `libomptarget.so` or the
deviceRTLs.

In this patch, we set a unified path for all built libraries, no matter whether
they are built along with LLVM or not. In this way, our lit configuration can work
properly.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93736
2021-01-06 17:06:16 -05:00
George Rokos dec02904d2 [libomptarget] Allow calls to omp_target_memcpy with 0 size.
Differential Revision: https://reviews.llvm.org/D94095
2021-01-05 16:03:53 -08:00
Joseph Huber fe5d51a489 [OpenMP] Add using bit flags to select Libomptarget Information
Summary:
This patch adds more fine-grained control over which information is output from the libomptarget runtime when run with the environment variable LIBOMPTARGET_INFO set. An extensible set of flags can be used to pick and choose which information the user is interested in.
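
A bit-flag selection of this kind might look roughly like the sketch below. The flag names and values (`InfoKernelArgs`, `InfoMappingTable`, `InfoDataTransfer`) and both helper functions are assumptions made for illustration; the real flags are the ones defined by the patch and the libomptarget documentation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical flag names and values, one bit per category of information.
enum InfoFlag : uint32_t {
  InfoKernelArgs = 0x1,
  InfoMappingTable = 0x2,
  InfoDataTransfer = 0x4,
};

// Parse the mask from the environment variable's string value. A null or
// empty string means no information is requested. Base 0 lets strtoul accept
// both decimal ("5") and hexadecimal ("0x5") input.
uint32_t parseInfoMask(const char *Value) {
  if (!Value || !*Value)
    return 0;
  return static_cast<uint32_t>(std::strtoul(Value, nullptr, 0));
}

// True if the given category was selected in the mask.
bool isEnabled(uint32_t Mask, InfoFlag Flag) { return (Mask & Flag) != 0; }
```

A caller would pass `std::getenv("LIBOMPTARGET_INFO")` to `parseInfoMask`; new categories can be added as further bits without changing the existing ones.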

Reviewers: jdoerfert JonChesterfield grokos

Differential Revision: https://reviews.llvm.org/D93727
2021-01-04 12:03:15 -05:00
Jon Chesterfield 76bfbb74d3 [libomptarget][amdgpu] Call into deviceRTL instead of ockl
Amdgpu codegen presently emits a call into ockl. The same functionality
is already present in the deviceRTL. Adds an amdgpu specific entry point
to avoid the dependency. This lets simple openmp code (specifically, that
which doesn't use libm) run without rocm device libraries installed.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D93356
2021-01-04 16:48:47 +00:00
Hansang Bae 82a29a62ab [OpenMP] Add definition/interface for target memory routines
The change includes new routines introduced in 5.1 and Fortran
interface.

Differential Revision: https://reviews.llvm.org/D93505
2021-01-04 08:12:57 -06:00
Terry Wilmarth 6b316febb4 [OpenMP] libomp: Handle implicit conversion warnings
This patch partially prepares the runtime source code to be built with
-Wconversion, which should trigger warnings if any implicit conversions
can possibly change a value. For builds done with icc or gcc, all such
warnings are handled in this patch. clang gives a much longer list of
warnings, particularly for sign conversions, which the other compilers
don't report. The -Wconversion flag is commented into cmake files, but
I'm not going to turn it on. If someone thinks it is important, and wants
to fix all the clang warnings, they are welcome to.

Types of changes made here involve either improving the consistency of types
used so that no conversion is needed, or else performing careful explicit
conversions, when we're sure a problem won't arise.

Patch is a combination of changes by Terry Wilmarth and Johnny Peyton.

Differential Revision: https://reviews.llvm.org/D92942
2020-12-31 00:39:57 +03:00
Joseph Huber 631501b1f9 [OpenMP] Fixing typo on memory size in Documentation 2020-12-23 11:46:26 -05:00
Joseph Huber 6e60346495 [OpenMP] Fixing Typo in Documentation 2020-12-23 09:17:51 -05:00
Joseph Huber 1c19804ebf [OpenMP] Add OpenMP Documentation for Libomptarget environment variables
Add support to the OpenMP web pages for environment variables supported
by Libomptarget and their usage.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93723
2020-12-22 17:41:27 -05:00
Johannes Doerfert 7b0f9dd79a [OpenMP][Docs] Fix Typo 2020-12-22 13:06:23 -06:00
Shilei Tian 1eb082c2ea [OpenMP][Docs] Fixed a typo in the doc that can mislead users to a CMake error
When setting `LLVM_ENABLE_RUNTIMES`, lowercase names should be used;
otherwise, CMake can fail with an error that the specified path is not found.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D93719
2020-12-22 14:05:58 -05:00
Johannes Doerfert 9cb748724e [OpenMP][Docs] Add FAQ entry about math and complex on GPUs
Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D93718
2020-12-22 13:05:04 -06:00
Shilei Tian 612ddc3117 [OpenMP][Docs] Updated the faq about building an OpenMP offloading capable compiler
After some issues about building runtimes along with LLVM were fixed,
building an OpenMP offloading capable compiler is pretty simple. This patch updates
the FAQ part in the doc.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93671
2020-12-22 13:14:53 -05:00
Johannes Doerfert 994bb6eb7d [OpenMP][NFC] Provide a new remark and documentation
If a GPU function is externally reachable we give up trying to find the
(unique) kernel it is called from. This can hinder optimizations. Emit a
remark and explain mitigation strategies.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D93439
2020-12-17 14:38:26 -06:00
Hansang Bae e1fd202489 [OpenMP] Add definitions for 5.1 interop to omp.h 2020-12-17 13:03:59 -06:00
Atmn 907886cc5b [OpenMP][Libomptarget][NFC] Use CMake Variables
This patch adds CMake variables to add subdirectories and include
directories for libomptarget and explicitly gives the location of source
files.

Differential Revision: https://reviews.llvm.org/D93290
2020-12-16 19:05:15 -05:00
Jon Chesterfield b607837c75 [libomptarget][nfc] Replace static const with enum
Semantically identical. Replaces 0xff... with ~0 to spare counting the f.
Has the advantage that the compiler doesn't need to prove the 4/8 byte
value dead before discarding it, and sidesteps the compilation question
associated with what static means for a single source language.
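
The change can be pictured with the following sketch. The constant names are hypothetical, and the sketch assumes a 32-bit `unsigned int`, so that `~0u` has all 32 bits set.

```cpp
#include <cassert>
#include <cstdint>

// Before: a constant with internal linkage. Each translation unit that
// includes it gets its own object, which the compiler must prove unused
// before it can discard the storage.
static const uint32_t AllLanesConst = 0xffffffff;

// After: an enumerator is a compile-time value with no storage at all.
// Writing ~0u avoids having to count the f digits.
enum : uint32_t { AllLanes = ~0u };

static_assert(AllLanes == 0xffffffffu, "~0u sets all 32 bits");
```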

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93328
2020-12-16 16:40:37 +00:00
Peyton, Jonathan L 5aafdd7b88 [OpenMP] Introduce new file wrapper class for runtime
Introduce a new kmp_safe_raii_file_t class with RAII semantics for file
open/close. It is essentially a wrapper around the C-style FILE* object.
This also unifies the way errors are reported when a file can't be opened.
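
A wrapper of this kind could be sketched as below. This is not the actual `kmp_safe_raii_file_t`; the class name `ScopedFile`, its methods, and the message format are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical RAII wrapper around a C-style FILE*.
class ScopedFile {
  std::FILE *F = nullptr;

public:
  // Open the file; report failure in one central place.
  ScopedFile(const char *Path, const char *Mode) : F(std::fopen(Path, Mode)) {
    if (!F)
      std::fprintf(stderr, "cannot open file: %s\n", Path);
  }

  // Close the file automatically when the wrapper goes out of scope.
  ~ScopedFile() {
    if (F)
      std::fclose(F);
  }

  // Copying would lead to a double fclose, so it is disabled.
  ScopedFile(const ScopedFile &) = delete;
  ScopedFile &operator=(const ScopedFile &) = delete;

  bool isOpen() const { return F != nullptr; }
  std::FILE *get() const { return F; }
};
```

Callers check `isOpen()` once and then pass `get()` to the usual C I/O functions; every early return closes the file without an explicit `fclose`.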

Differential Revision: https://reviews.llvm.org/D92604
2020-12-15 14:46:30 -06:00
Hansang Bae 171ca93c54 [OpenMP] Initialize runtime in the forked child process
This patch enables serial initialization in the forked child process
to fix unstable runtime behavior when used with Python-based AI tools.

Differential Revision: https://reviews.llvm.org/D93230
2020-12-15 07:29:28 -06:00