Commit Graph

1614 Commits

Author SHA1 Message Date
Shilei Tian 2df65f87c1 [OpenMP] Fixed a crash in hidden helper thread
It is reported that after enabling hidden helper thread, the program
can hit the assertion `new_gtid < __kmp_threads_capacity` sometimes. The root
cause is explained as follows. Let's say the default `__kmp_threads_capacity` is
`N`. If hidden helper thread is enabled, `__kmp_threads_capacity` will be offset
to `N+8` by default. If the number of threads we need exceeds `N+8`, e.g. via
`num_threads` clause, we need to expand `__kmp_threads`. In
`__kmp_expand_threads`, the expansion starts from `__kmp_threads_capacity`, and
repeatedly doubling it until the new capacity meets the requirement. Let's
assume the new requirement is `Y`.  If `Y` happens to meet the constraint
`(N+8)*2^X=Y` where `X` is the number of iterations, the new capacity is not
enough because we have 8 slots for hidden helper threads.

Here is an example.
```
#include <vector>

int main(int argc, char *argv[]) {
  constexpr const size_t N = 1344;
  std::vector<int> data(N);

#pragma omp parallel for
  for (unsigned i = 0; i < N; ++i) {
    data[i] = i;
  }

#pragma omp parallel for num_threads(N)
  for (unsigned i = 0; i < N; ++i) {
    data[i] += i;
  }

  return 0;
}
```
My CPU is 20C40T, then `__kmp_threads_capacity` is 160. After offset,
`__kmp_threads_capacity` becomes 168. `1344 = (160+8)*2^3`, then the assertions
hit.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D98838
2021-03-18 18:25:36 -04:00
Jon Chesterfield 626a31de15 [libomptarget] Add register usage info to kernel metadata
Add register usage information to the runtime metadata so that it can be used during kernel launch (that change will be in a different commit). Add this information to the kernel trace.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D98829
2021-03-18 17:00:42 +00:00
Jon Chesterfield dbf8f2b089 Revert "[libomptarget] Build amdgcn devicertl by default"
This reverts commit e23f3502d9.
It broke the build of openmp for clang built without amdgcn
support. D98746, under review, would allow this to reland.
2021-03-17 11:34:44 +00:00
Hansang Bae a6f9cb6adc [OpenMP] Add runtime interface for OpenMP 5.1 error directive
The proposed new interface is for supporting `at(execution)` clause in the
error directive.

Differential Revision: https://reviews.llvm.org/D98448
2021-03-16 08:55:25 -05:00
Johannes Doerfert 0a954a528b [OpenMP][FIX] Repair accidental replacement of _shfl_sync with _shfl
This was broken accidentally in D95752.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D98677
2021-03-15 22:46:00 -05:00
Jon Chesterfield e23f3502d9 [libomptarget] Build amdgcn devicertl by default
[libomptarget] Build amdgcn devicertl by default

The cmake for this looks for an llvm install and does the right thing when
building as part of enable_runtimes. It will probably do the right thing
in other settings - at least, it won't try to build this with gcc.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D98658
2021-03-15 23:17:50 +00:00
Peyton, Jonathan L 7085f04573 [OpenMP] Remove unused cpu_stackoffset member 2021-03-15 16:52:04 -05:00
Jon Chesterfield bb38d7ff05 [libomptarget][nfc][amdgcn] Use precise triple for devicertl build 2021-03-15 20:24:13 +00:00
Jon Chesterfield d0bc85f04a [libomptarget][nfc] Drop unused DEVICE macro
[libomptarget][nfc] Drop unused DEVICE macro

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D98655
2021-03-15 20:12:50 +00:00
Jon Chesterfield 7da76aaaf4 [libomptarget] Build amdgpu plugin by default
[libomptarget] Build amdgpu plugin by default

This will build the amdgpu plugin if cmake is able to find the hsa
runtime library, which will be the case if rocm is installed or if
the hsa library has been installed somewhere cmake looks.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D98654
2021-03-15 20:12:01 +00:00
Jon Chesterfield bcb3f0f867 [libomptarget] Fix devicertl build
[libomptarget] Fix devicertl build

The target specific functions in target_interface are extern C, but the
implementations for nvptx were mostly C++ mangling. That worked out as
a quirk of DEVICE macro expanding to nothing, except for shuffle.h which
only forward declared the functions with C++ linkage.

Also implements GetWarpSize, as used by shuffle, and includes target_interface
in nvptx target_impl.cu to help catch future divergence between interface and
implementation.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D98651
2021-03-15 19:50:22 +00:00
Jon Chesterfield f675b3df48 [libomptarget] Drop assert.h, use freestanding for amdgcn devicertl
[libomptarget] Drop assert.h, use freestanding for amdgcn devicertl

Promotes the runtime assert to a link time error for the unimplemented
fallback functions. Enables amdgcn to build with only clang provided
headers, which makes it less likely to break other builds when enabled.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D98649
2021-03-15 18:50:09 +00:00
Jon Chesterfield 156842937f [libomptarget][amdgcn] Drop use of inttypes.h, moving closer to freestanding
[libomptarget][amdgcn] Drop use of inttypes.h, moving closer to freestanding

The glibc headers are a periodic source of problems compiling the devicertl.
This patch resolves the following error run into while building llvm on a slightly
different linux system.
```
In file included from .../lib/clang/13.0.0/include/inttypes.h:21:
In file included from /usr/include/inttypes.h:25:
/usr/include/features.h:461:12: fatal error: 'sys/cdefs.h' file not found
#  include <sys/cdefs.h>
           ^~~~~~~~~~~~~
```
As a second patch, removing assert.h from shuffle will let amdgcn build as
-ffreestanding, at which point only the headers that clang itself provides are
used and interactions with the host glibc are eliminated. Doing the same for
nvptx is complicated by printf handling but also seems worthwhile.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D98565
2021-03-15 16:54:58 +00:00
George Rokos 2468fdd9af [libomptarget] Add allocator support for target memory
This patch adds the infrastructure for allocator support for target memory.
Three allocators are introduced for device, host and shared memory.
The corresponding API functions have the llvm_ prefix temporarily, until they become part of the OpenMP standard.

Differential Revision: https://reviews.llvm.org/D97883
2021-03-13 03:47:07 -08:00
Johannes Doerfert 5449fbb5d4 [OpenMP][NFC] Use `AsyncInfo` as the variable name for a `__tgt_async_info`
Reviewed By: grokos, tianshilei1992

Differential Revision: https://reviews.llvm.org/D96444
2021-03-11 23:31:34 -06:00
Johannes Doerfert 66ba494b49 [OpenMP][DeviceRTL] Extract shuffle idiom and port it to declare variant
The shuffle idiom is differently implemented in our supported targets.
To reduce the "target_impl" file we now move the shuffle idiom in it's
own self-contained header that provides the implementation for AMDGPU
and NVPTX. A fallback can be added later on.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D95752
2021-03-11 23:31:30 -06:00
Joseph Huber 807466ef28 [OpenMP] Restore backwards compatibility for libomptarget
Summary:
The changes introduced in D87946 changed the API for libomptarget
functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x
but was not given a backward-compatible interface. This change will
require people using Clang 13.x or 12.x to recompile their offloading
programs.

Reviewed By: jdoerfert cchen

Differential Revision: https://reviews.llvm.org/D98358
2021-03-11 09:52:11 -05:00
Leonard Chan baf637dcde Rename top-level LICENSE.txt files to LICENSE.TXT
This makes all the license filenames uniform across subprojects.

Differential Revision: https://reviews.llvm.org/D98380
2021-03-10 21:26:24 -08:00
AndreyChurbanov aaf16b80dd [OpenMP] libomp: eliminate pause from atomic CAS loops
For clang this change is NFC cleanup, because clang
never calls atomic functions from runtime library.

Basically, pause is good in spin-loops waiting for something.
Atomic CAS loops do not wait for anything,
each CAS failure means some other thread progressed.

Performance experiments show that the pause only causes unnecessary slowdown
on CPUs with slow pause instruction, no difference on CPUs with fast pause
instruction, removal of the pause gives lesser binary size which is good.

Differential Revision: https://reviews.llvm.org/D97079
2021-03-09 18:30:08 +03:00
AndreyChurbanov e4492b6f31 [OpenMP] NFC: temporarily disable assertion until the bug with dependences is fixed 2021-03-08 22:18:30 +03:00
Shilei Tian c41ae246ac [OpenMP][Clang][NVPTX] Only build one bitcode library for each SM
In D97003, CUDA 9.2 is the minimum requirement for OpenMP offloading on
NVPTX target. We don't need to have macros in source code to select right functions
based on CUDA version. we don't need to compile multiple bitcode libraries of
different CUDA versions for each SM. We don't need to worry about future
compatibility with newer CUDA version.

`-target-feature +ptx61` is used in this patch, which corresponds to the highest
PTX version that CUDA 9.2 can support.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97198
2021-03-08 12:03:04 -05:00
Peyton, Jonathan L e2738b3758 [OpenMP] Fix potential integer overflow in dynamic schedule code
Restrict the chunk_size * chunk_num to only occur for valid
chunk_nums and reimplement calculating the limit to avoid overflow.

Differential Revision: https://reviews.llvm.org/D96747
2021-03-08 09:43:05 -06:00
tlwilmar 97d000cfc6 Added API for "masked" construct via two entrypoints: __kmpc_masked,
and __kmpc_end_masked. The "master" construct is deprecated. Changed
proc-bind keyword from "master" to "primary". Use of both master
construct and master as proc-bind keyword is still allowed, but
deprecated.

Remove references to "master" in comments and strings, and replace
with "primary" or "primary thread". Function names and variables were
not touched, nor were references to deprecated master construct. These
can be updated over time. No new code should refer to master.
2021-03-05 09:29:57 -06:00
Joel E. Denny d0eb25a643 [OpenMP] Encapsulate more in checkDeviceAndCtors
This patch just encapsulates some repeated code.  To do so, it
relocates some functions from interface.cpp to omptarget.cpp.  It also
adjusts them to the LLVM coding style.

This patch is almost NFC except some `DP` messages are a bit
different.  For example, messages like "Entering target region" are
now emitted even if offload is disabled, but a subsequent "Offload is
disabled" is then emitted.

Reviewed By: jdoerfert, grokos

Differential Revision: https://reviews.llvm.org/D97908
2021-03-04 12:03:42 -05:00
Joel E. Denny bfe5452b93 [OpenMP] Fix lone target exit data
Without this patch, an `omp target exit data` before the runtime is
initialized produces a runtime error.  This patch fixes that by
changing `__tgt_target_data_end_mapper` to call `CheckDeviceAndCtors`
like many other runtime routines.

Discussed at
<https://lists.llvm.org/pipermail/openmp-dev/2021-March/003920.html>.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D97907
2021-03-04 12:03:42 -05:00
Joel E. Denny 10c18c69f2 [OpenMP] Fix support for device as host
Without this patch, when the offload device is set to
`omp_get_initial_device()`, the runtime fails with an error diagnostic
when entering target regions or target data regions.

However, OpenMP 5.1, sec. 2.14.5 "target Construct", "Restrictions",
p. 203, L3-5 states:

> The device clause expression must evaluate to a non-negative integer
> value that is less than or equal to the value of
> omp_get_num_devices().

Sec. 3.7.7 "omp_get_initial_device", p. 412, L2-3 states:

> The value of the device number is the value returned by the
> omp_get_num_devices routine.

Similarly, OpenMP 5.0, sec. 2.12.5 "target Construct", "Restrictions",
p. 174 L30-32 states:

> The device clause expression must evaluate to a non-negative integer
> value less than the value of omp_get_num_devices() or to the value
> of omp_get_initial_device().

This patch fixes this behavior by changing the runtime to behave as if
offloading is disabled whenever it finds the offload device (either
from a `device` clause or the default device) is set to the host
device.  In the case of mandatory offloading when
`omp_get_num_devices() == 0`, it incorporates the behavior proposed
for OpenMP 5.2 in OpenMP spec github issue 2669.

Reviewed By: grokos, RaviNarayanaswamy

Differential Revision: https://reviews.llvm.org/D97616
2021-03-04 12:03:42 -05:00
Hansang Bae b6c2f538b2 [OpenMP] Add allocator support for target memory
This is a preview of allocator support for target memory that depends on the
offload runtime API which allocates memory as described below.

llvm_omp_target_alloc_host(size_t size, int device_num);
-- Returns non-migratable memory owned by host.
-- Memory is accessible by host and device(s).

llvm_omp_target_alloc_shared(size_t size, int device_num);
-- Returns migratable memory owned by host and device.
-- Memory is accessible by host and device.

llvm_omp_target_alloc_device(size_t size, int device_num);
-- Returns memory owned by device.
-- Memory is only accessible by device.

New memory space and predefined allocator names are
-- llvm_omp_target_host_mem_space
-- llvm_omp_target_shared_mem_space
-- llvm_omp_target_device_mem_space
-- llvm_omp_target_host_mem_alloc
-- llvm_omp_target_shared_mem_alloc
-- llvm_omp_target_device_mem_alloc

Differential Revision: https://reviews.llvm.org/D96669
2021-03-02 16:45:12 -06:00
Alexey Bataev 0caf736d7e [OPENMP50]Mapping of the subcomponents with the 'default' mappers.
If the mapped structure has data members, which have 'default' mappers,
need to map these members individually using their 'default' mappers.

Differential Revision: https://reviews.llvm.org/D92195
2021-03-02 07:11:06 -08:00
Peyton, Jonathan L e83380fccc [OpenMP] Fix clang-cl build error regarding TSX intrinsics
Fix for https://bugs.llvm.org/show_bug.cgi?id=49339

The CMake check for the RTM intrinsics needs the -mrtm flag to be set
during the test. This way clang-cl correctly detects it has the
_xbegin() intrinsic. Otherwise, the CMake check fails.

Differential Revision: https://reviews.llvm.org/D97413
2021-03-02 07:47:42 -06:00
AndreyChurbanov 1df6e58e55 [OpenMP] libomp minor cleanup
Cleanup changes:
- check value read from file;
- remove dead code;
- make unsigned variable to read hexadecimal number to;
- add debug assertion to check ref count.

Differential Revision: https://reviews.llvm.org/D96893
2021-02-26 00:44:51 +03:00
AndreyChurbanov 4932101177 [OpenMP] libomp: fix ittnotify stack stitching for teams construct
Stitching id could be overridden causing reference of destroyed object
when number of teams is 1. The patch separates stitching id store
location for teams and parallel nested in teams.

Differential Revision: https://reviews.llvm.org/D96562
2021-02-26 00:23:24 +03:00
Peyton, Jonathan L d12ae7db99 [OpenMP] Fix accidental addition of use omp_lib_kinds
Fortran header accidentally had use omp_lib_kinds added inside a
subroutine and function. This patch removes the lines.
2021-02-25 12:49:56 -06:00
Harmen Stoppels a54f160b3a Prefer /usr/bin/env xxx over /usr/bin/xxx where xxx = perl, python, awk
Allow users to use a non-system version of perl, python and awk, which is useful
in certain package managers.

Reviewed By: JDevlieghere, MaskRay

Differential Revision: https://reviews.llvm.org/D95119
2021-02-25 11:32:27 +01:00
Vyacheslav Zakharin 6baeeb9efa [libomptarget] Fixed MSVC build fail caused by __attribute__((used)).
Differential Revision: https://reviews.llvm.org/D97348
2021-02-24 09:59:39 -08:00
Joachim Protze 2fbce374c8 [OpenMP][Tests][NFC] rename macro to avoid naming clash
Rename a macro use missed in e0f3acc5d3
2021-02-24 18:46:56 +01:00
Shilei Tian e5da63d5a9 [OpenMP] Fixed a crash when offloading to x86_64 with target nowait
PR#49334 reports a crash when offloading to x86_64 with `target nowait`,
which is caused by referencing a nullptr. The root cause of the issue is, when
pushing a hidden helper task in `__kmp_push_task`, it also maps the gtid to its
shadow gtid, which is wrong.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97329
2021-02-24 12:37:30 -05:00
Joachim Protze f3a72509a7 [OpenMP][Tests][NFC] lit might also be known as llvm-lit.py 2021-02-24 18:32:24 +01:00
Manoel Roemmer 542d9c2154 [libomptarget] Load images in order of registration
This makes sure that images are loaded in the order in which they are registered with libomptarget.

If a target can load multiple images and these images depend on each other (for example if one image contains the programs target regions and one image contains library code), then the order in which images are loaded can be important for symbol resolution (for example, in the VE plugin).
In this case: because the same code exist in the host binaries, the order in which the host linker loads them (which is also the order in which images are registered with libomptarget) is the order in which the images have to be loaded onto the device.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95530
2021-02-24 18:15:41 +01:00
Joachim Protze e0f3acc5d3 [OpenMP][Tests][NFC] rename macro to avoid naming clash
Rename a macro and macro use missed in 35ab6d6390
2021-02-24 18:13:28 +01:00
Joachim Protze 35ab6d6390 [OpenMP][Tests][NFC] rename macro to avoid naming clash
When including <ostream>, the register_callback macro of the OMPT callback.h
clashes with a function defined in ostream. This patch renames the macro
and includes ompt into the macro name.
2021-02-24 18:03:54 +01:00
Shilei Tian f6c2984a09 [OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported feature in release verion of LLVM
`ptx71` is not supported in release version of LLVM yet. As a result,
the support of CUDA 11.2 and CUDA 11.1 caused a compilation error as mentioned
in D97004. Since the support in D97004 is just a WA for releease, and we'll not
use it in the near future, using `ptx70` for CUDA 11 is feasible.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97195
2021-02-23 13:20:21 -05:00
Peyton, Jonathan L 56223b1e91 [OpenMP] Help static loop code avoid over/underflow
This code alleviates some pathological loop parameters (lower,
upper, stride) within calculations involved in the static loop code.  It
bounds the chunk size to the trip count if it is greater than the trip
count and also minimizes problematic code for when trip count < nth.

Differential Revision: https://reviews.llvm.org/D96426
2021-02-22 13:22:01 -06:00
Peyton, Jonathan L 1b968467c0 [OpenMP] Remove shutdown attempt on Windows process detach
Only attempt shutdown if lpReserved is NULL. The Windows documentation
states:
When handling DLL_PROCESS_DETACH, a DLL should free resources such as
heap memory only if the DLL is being unloaded dynamically (the
lpReserved parameter is NULL). If the process is terminating (the
lpReserved parameter is non-NULL), all threads in the process except the
current thread either have exited already or have been explicitly
terminated by a call to the ExitProcess function, which might leave some
process resources such as heaps in an inconsistent state. In this case,
it is not safe for the DLL to clean up the resources. Instead, the DLL
should allow the operating system to reclaim the memory.

Differential Revision: https://reviews.llvm.org/D96750
2021-02-22 13:15:06 -06:00
Peyton, Jonathan L 8c73be9d86 [OpenMP] Limit number of dispatch buffers
This patch limits the number of dispatch buffers (used for
loop worksharing construct) to between 1 and 4096.

Differential Revision: https://reviews.llvm.org/D96749
2021-02-22 13:14:28 -06:00
Peyton, Jonathan L 55dff8b2e4 [OpenMP] Update HWLOC code for die level detection
Differential Revision: https://reviews.llvm.org/D96748
2021-02-22 13:05:55 -06:00
AndreyChurbanov 1611e5473c [OpenMP] libomp: cleanup some resource leaks
Close mutexattr and condattr local objects to eliminate resource leaks.

Differential Revision: https://reviews.llvm.org/D96892
2021-02-20 23:27:37 +03:00
Shilei Tian 309b00a42e [OpenMP][NFC] clang-format the whole openmp project
Same script as D95318. Test files are excluded.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D97088
2021-02-20 12:46:32 -05:00
Joel E. Denny ef8b3b5ffd [OpenMP] Fix nvptx CUDA_VERSION conversion
As mentioned in PR#49250, without this patch, ptxas for CUDA 9.1 fails
in the following two tests:

- openmp/libomptarget/test/mapping/lambda_mapping.cpp
- openmp/libomptarget/test/offloading/bug49021.cpp

The error looks like:

```
ptxas /tmp/lambda_mapping-081ea9.s, line 828; error   : Not a name of any known instruction: 'activemask'
```

The problem is that our cmake script converts CUDA version strings
incorrectly: 9.1 becomes 9100, but it should be 9010, as shown in
`getCudaVersion` in `clang/lib/Driver/ToolChains/Cuda.cpp`.  Thus,
`openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu`
inadvertently enables `activemask` because it apparently becomes
available in 9.2.  This patch fixes the conversion.

This patch does not fix the other two tests in PR#49250.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97012
2021-02-19 11:09:26 -05:00
Joel E. Denny d2147b1a87 [OpenMP] Fix always,from and delete for data absent at exit
Without this patch, there's a runtime error for those map types at
exit from an "omp target data" or at "omp target exit data", but the
spec says the list item should be ignored.

This patch tests that fix in data_absent_at_exit.c, and it also
improves other testing for data that is not fully present at exit.

Reviewed By: grokos, RaviNarayanaswamy

Differential Revision: https://reviews.llvm.org/D96999
2021-02-19 11:09:26 -05:00
Ron Lieberman 30c0d5b4c3 [OPENMP][AMDGCN] Improvements to print_kernel_trace (bit mask)
allow bit masking to select various trace features.
  bit 0 => Launch tracing           (stderr)
  bit 1 => timing of runtime        (stdout)
  bit 2 => detailed launch tracing  (stderr)
  bit 3 => timing goes to stdout instead of stderr

  example: LIBOMPTARGET_KERNEL_TRACE=7     does it all
           LIBOMPTARGET_KERNEL_TRACE=5     Launch + details
           LIBOMPTARGET_KERNEL_TRACE=2     timings + launch to stderr
           LIBOMPTARGET_KERNEL_TRACE=10    timings + launch to stdout

Differential Revision: https://reviews.llvm.org/D96998
2021-02-19 06:47:22 -05:00