[libomptarget][amdgpu] Fix truncation error for partial wavefront
The partial barrier implementation involves one wavefront resetting and N-1
waiting. This change future-proofs against launching with a number of threads
that is not a multiple of the wavefront size.
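A minimal sketch of the rounding involved (names and the constant are illustrative, not the runtime's own): with truncating division, e.g. 96 threads on a 64-wide wavefront would be counted as a single wavefront, so the count is rounded up instead.
```cpp
#include <cstdint>

constexpr uint32_t WavefrontSize = 64; // illustrative; wave32 targets also exist

// Count the wavefronts in a block, rounding up so a partial wavefront still counts.
uint32_t wavefrontsInBlock(uint32_t NumThreads) {
  return (NumThreads + WavefrontSize - 1) / WavefrontSize;
}
```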
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102407
[libomptarget][amdgpu] Convert an assert to print and offload_fail
The launched kernel is supposed to be present in the binary, but a not-yet-diagnosed
bug means it is missing for some of the qmcpack test cases. Changing from an assert
to a print and offload_fail should help diagnose that and similar bugs.
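An illustrative sketch of the pattern, using placeholder names rather than the plugin's actual identifiers:
```cpp
#include <cstdio>

constexpr int OffloadFail = ~0;    // stand-in for the plugin's failure code
constexpr int OffloadSuccess = 0;  // stand-in for the success code

// Report a missing kernel entry and fail the offload instead of asserting,
// so execution can continue and the failure can be diagnosed.
int checkKernelEntry(const void *KernelEntry) {
  if (!KernelEntry) {
    std::fprintf(stderr, "Kernel entry not found in device image\n");
    return OffloadFail;
  }
  return OffloadSuccess;
}
```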
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102378
Add a `REQUIRES: unified_shared_memory` option to tests that use `#pragma omp requires unified_shared_memory`.
For CUDA, the feature tag is derived from LIBOMPTARGET_DEP_CUDA_ARCH which itself is derived using [[ https://cmake.org/cmake/help/latest/module/FindCUDA.html#commands | cuda_select_nvcc_arch_flags ]]. The latter determines which compute capability the GPU in the system supports. To ensure that this is the CUDA arch being used, we could also set the `-Xopenmp-target -march=` flag.
In the absence of an NVIDIA GPU, LIBOMPTARGET_DEP_CUDA_ARCH will be 35. That is, in that case we are assuming unified_shared_memory is not available. CUDA plugin testing could be disabled entirely in this case, but this currently depends on `LIBOMPTARGET_CAN_LINK_LIBCUDA OR LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA`, not on whether the hardware is actually available.
For all other targets, nothing changes and we are assuming unified shared memory is available. This might need refinement if that turns out not to be the case.
This tries to fix the [[ http://meinersbur.de:8011/#/builders/143 | OpenMP Offloading Buildbot ]] that, although brand-new, only has a Pascal-generation (sm_61) GPU installed. Hence, tests that require unified shared memory are currently failing. I wish I had known in advance.
Reviewed By: protze.joachim, tianshilei1992
Differential Revision: https://reviews.llvm.org/D101498
[libomptarget][amdgpu][nfc] Expand errorcheck macros
These macros expand to continue, which is confusing, or exit,
which is incompatible with continuing execution on offload failure.
Expanding the macros in place makes the code look untidy but the
control flow obvious and amenable to improving. In particular, exit
becomes easier to eliminate.
Reviewed By: pdhaliwal
Differential Revision: https://reviews.llvm.org/D102230
[libomptarget][nfc] Add hook to easily disable building amdgcn bclib
This is useful when building LLVM with a toolchain that can't emit code
for amdgcn, e.g. because it overrides the include search path with headers
from another architecture, or the clang compiler is missing builtins.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D102229
[libomptarget] Add support for target allocators to dynamic cuda RTL
Follow on to D102000 which introduced new calls into libcuda. This patch adds
the corresponding entry points to dynamic_cuda, fixing the build for systems
that do not have the cuda toolkit installed.
Function types and enum from https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html
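A rough sketch of how such an entry point can be resolved at runtime via dlopen/dlsym instead of linking libcuda at build time. cuMemAllocHost is a real driver-API symbol; the wrapper and helper names here are hypothetical and the types are simplified.
```cpp
#include <cstddef>
#include <dlfcn.h>

using CUresult = int; // simplified stand-in for the driver API's result enum

// Function pointer filled in lazily from the CUDA driver library.
static CUresult (*dyn_cuMemAllocHost)(void **, size_t) = nullptr;

bool resolveCuMemAllocHost() {
  void *Lib = dlopen("libcuda.so", RTLD_NOW | RTLD_GLOBAL);
  if (!Lib)
    return false;
  dyn_cuMemAllocHost = reinterpret_cast<CUresult (*)(void **, size_t)>(
      dlsym(Lib, "cuMemAllocHost"));
  return dyn_cuMemAllocHost != nullptr;
}
```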
Reviewed By: pdhaliwal
Differential Revision: https://reviews.llvm.org/D102169
This patch prevents runtime tests running on systems without amdgpu.
Reviewed By: protze.joachim, tianshilei1992
Differential Revision: https://reviews.llvm.org/D102054
I want to start using LLVM component libraries in libomptarget
to stop duplicating implementations already available in LLVM
(e.g. LLVMObject, LLVMSupport, etc.). Without relying on LLVM in all
libomptarget builds, one has to provide a fallback implementation for
each LLVM feature used.
This is an attempt to stop supporting out-of-llvm-tree builds of libomptarget.
I understand that I may need to revert this,
if this affects downstream projects in a bad way.
Differential Revision: https://reviews.llvm.org/D101509
Summary:
The allocator interface added in D97883 allows the RTL to allocate shared and
host-pinned memory from the cuda plugin. This patch adds support for these to
the runtime.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D102000
[libomptarget][nfc] Refactor amdgpu partial barrier to simplify adding a second one
D101976 would require a second barrier instance. This NFC change to the amdgpu
plugin makes it simpler to add one (an extra global, one more line in init).
It also renames the current barrier to L0.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102016
[libomptarget][amdgpu][nfc] Remove dead code from amdgpu plugin
Drops an enum that was identical to an HSA one and localises some functions
that were only called from one TU. Covers everything internalize + adce can
identify as dead, except for msgpack::dump, which is useful when debugging.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102014
This enables the runtime tests on amdgpu targets.
Ten tests are currently marked as XFAIL on amdgcn, mostly due to
missing printf.
Reviewed By: protze.joachim
Differential Revision: https://reviews.llvm.org/D99656
If available, use the clang that is already built in the same project as the
CUDA compiler, unless another executable is explicitly specified. This also
ensures that the generated deviceRTL IR is consistent with the version
of Clang.
This patch is required to reliably test OpenMP offloading in a buildbot
without either a two-stage build (e.g. with LLVM_ENABLE_RUNTIMES) or a
separately installed clang on the worker that will eventually become
outdated.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D101265
The OpenMP runtime can be compiled using a CUDA installed at a non-default
location with the -DCUDA_TOOLKIT_ROOT_DIR setting. However, check-openmp
will fail afterwards because Clang needs to know where to find the CUDA
headers.
Fix by passing -cuda-path to Clang using the value of
CUDA_TOOLKIT_ROOT_DIR which has been determined by CMake. Also set
LD_LIBRARY_PATH such that it can find the cuda runtime when executing.
This will ensure that the regression tests do not depend on the current
environment, but use the environment they were configured for.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D101266
This patch fuses the RUN lines for most libomptarget tests. The previous patch
D101315 created separate test targets for each supported offloading triple.
This patch updates the RUN lines in libomptarget tests to use a generic run
line independent of the offloading target selected for the lit instance.
In cases where no RUN line was defined for a specific offloading target,
the corresponding target is declared as XFAIL. If it turns out that a test
actually supports the target, the XFAIL line can be removed.
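A hedged sketch of what such a fused, target-independent test looks like; the substitution name is the one commonly used in libomptarget's lit setup and is an assumption here, not text from this patch.
```c
// RUN: %libomptarget-compile-run-and-check-generic
#include <stdio.h>

int main(void) {
  #pragma omp target
  { /* empty target region, just exercises the offload path */ }
  printf("PASS\n"); // CHECK: PASS
  return 0;
}
```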
Differential Revision: https://reviews.llvm.org/D101326
This patch creates a separate test directory for each offloading target to be
tested. This allows testing multiple architectures in one configuration, while
still seeing all failing tests separately. The lit test names include the
target triple, so that it will be easier to spot the failing target.
This patch also allows marking tests as expected failures based on the
target triple, as the currently used triple is added to the lit "features":
```
// XFAIL: nvptx64-nvidia-cuda
```
Differential Revision: https://reviews.llvm.org/D101315
[libomptarget] Enable AMDGPU devicertl
The amdgpu devicertl is written in freestanding openmp and compiles to a
bitcode library (per listed gfx arch) with no unresolved symbols. It requires
a recent clang, preferably the one from the same monorepo checkout.
This is D98658, with printf explicitly stubbed out, after patching clang to no
longer require an llvm with the amdgpu target enabled.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D101213
Summary:
This patch improves the implementation of D100774 by replacing the global
variable introduced with a function that returns a reference to an internal
one. This removes the need to define the variable in every plugin that uses it.
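A minimal sketch of the pattern described above, with illustrative names rather than the actual libomptarget identifiers: the accessor owns the storage, so plugins only call the function instead of each defining the global.
```cpp
#include <atomic>
#include <cstdint>

// A function-local static replaces the per-plugin global definition.
std::atomic<uint32_t> &getInfoLevel() {
  static std::atomic<uint32_t> InfoLevel{0};
  return InfoLevel;
}
```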
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D101102
Summary:
This patch adds a new runtime function __tgt_set_info_flag that allows the
user to set the information level at runtime without using the environment
variable. Using this will require an extern function, but it will eventually be
added to an auxiliary library for OpenMP support functions.
This patch required moving the current InfoLevel to a global variable which must
be instantiated by each plugin.
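A hedged usage sketch: `__tgt_set_info_flag` is the entry point named above, while the declaration and the chosen value are illustrative; the 0x8 bit refers to the mapping-table messages added separately in D100600.
```cpp
#include <cstdint>

// Declared extern until the auxiliary OpenMP support library exists.
extern "C" void __tgt_set_info_flag(uint32_t NewInfoLevel);

int main() {
  // For example, 0x8 enables the mapping-table change messages (see D100600).
  __tgt_set_info_flag(0x8);
  // ... target regions executed after this point use the new info level ...
  return 0;
}
```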
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D100774
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and makes corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will later be commonized between target and host-side parallel regions), and data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. The revision also contains changes to OpenMPOpt for remark creation on target offloading regions.
Reviewed By: jdoerfert, Meinersbur
Differential Revision: https://reviews.llvm.org/D95976
The implicitly generated mappings for allocation/deallocation in the mappers
runtime should be mapped as implicit, and there is no need to clear the
member_of flag to avoid a reference counter increment. Also, the reference
counter should not be incremented for the very first element that comes from
the mapper function.
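For context, a minimal user-defined mapper of the kind whose runtime handling is adjusted here (a hedged illustration, not code from the patch):
```c
typedef struct {
  int *Data;
  int Len;
} Vec;

// The mapper function generated for this directive performs the implicit
// allocation/deallocation mappings discussed above.
#pragma omp declare mapper(Vec v) map(v, v.Data[0 : v.Len])

void scale(Vec V) {
  #pragma omp target map(tofrom : V)
  for (int I = 0; I < V.Len; ++I)
    V.Data[I] *= 2;
}
```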
Differential Revision: https://reviews.llvm.org/D100673
Summary:
This patch adds a feature to print information whenever the host-device pointer
mapping table is changed by inserting or removing an entry. This introduces a
new bit field for LIBOMPTARGET_INFO at position 0x8.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D100600
omp_is_initial_device() is marked as a built-in function in the current
compiler, and user code guarded by this call may be optimized away,
resulting in undesired behavior in some cases. This patch provides a
possible fix for such cases by defining the routine as a variant
function and removing it from the builtin list.
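A sketch of the kind of guarded user code the description refers to (illustrative; with the routine treated as a compile-time builtin, one of these branches could be folded away incorrectly):
```c
#include <omp.h>
#include <stdio.h>

void report(void) {
  #pragma omp target
  {
    if (omp_is_initial_device())
      printf("fell back to the host\n");
    else
      printf("running on the device\n");
  }
}
```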
Differential Revision: https://reviews.llvm.org/D99447
Summary:
Remove some of the error messages printed when the CUDA plugin fails. The current error messages can be confusing because they are the first error messages printed after the async stream finds an error. This means that the printed values aren't related to what caused the issue, but are simply the last asynchronous operation that succeeded on the device. Remove these as they can be misleading.
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D99510
Summary:
If the call to `synchronize` fails, it will currently block the stream indefinitely if execution is continued from this point. Additionally, if the program exits it will trigger an assertion on the non-null value of the async queue and prevent the runtime from printing debugging information.
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D99443
Add register usage information to the runtime metadata so that it can be used during kernel launch (that change will be in a different commit). Add this information to the kernel trace.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D98829
[libomptarget] Build amdgcn devicertl by default
The cmake for this looks for an llvm install and does the right thing when
building as part of enable_runtimes. It will probably do the right thing
in other settings - at least, it won't try to build this with gcc.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D98658
[libomptarget] Build amdgpu plugin by default
This will build the amdgpu plugin if cmake is able to find the hsa
runtime library, which will be the case if rocm is installed or if
the hsa library has been installed somewhere cmake looks.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D98654
[libomptarget] Fix devicertl build
The target-specific functions in target_interface are extern "C", but the
implementations for nvptx mostly had C++ mangling. That worked out as
a quirk of the DEVICE macro expanding to nothing, except for shuffle.h, which
only forward declared the functions with C++ linkage.
Also implements GetWarpSize, as used by shuffle, and includes target_interface
in nvptx target_impl.cu to help catch future divergence between interface and
implementation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D98651
[libomptarget] Drop assert.h, use freestanding for amdgcn devicertl
Promotes the runtime assert to a link-time error for the unimplemented
fallback functions. Enables amdgcn to build with only clang provided
headers, which makes it less likely to break other builds when enabled.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D98649
[libomptarget][amdgcn] Drop use of inttypes.h, moving closer to freestanding
The glibc headers are a periodic source of problems compiling the devicertl.
This patch resolves the following error, encountered while building llvm on a
slightly different Linux system.
```
In file included from .../lib/clang/13.0.0/include/inttypes.h:21:
In file included from /usr/include/inttypes.h:25:
/usr/include/features.h:461:12: fatal error: 'sys/cdefs.h' file not found
# include <sys/cdefs.h>
^~~~~~~~~~~~~
```
As a second patch, removing assert.h from shuffle will let amdgcn build as
-ffreestanding, at which point only the headers that clang itself provides are
used and interactions with the host glibc are eliminated. Doing the same for
nvptx is complicated by printf handling but also seems worthwhile.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D98565
This patch adds the infrastructure for allocator support for target memory.
Three allocators are introduced for device, host and shared memory.
The corresponding API functions have the llvm_ prefix temporarily, until they become part of the OpenMP standard.
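A hedged usage sketch; the exact llvm_-prefixed names below are assumptions based on the description above (device, host, and shared allocators), with omp_target_free as the standard counterpart:
```c
#include <omp.h>
#include <stddef.h>

// Assumed temporary API names; see the note about the llvm_ prefix above.
extern void *llvm_omp_target_alloc_device(size_t Size, int DeviceNum);
extern void *llvm_omp_target_alloc_host(size_t Size, int DeviceNum);
extern void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum);

void allocate_examples(void) {
  int Dev = omp_get_default_device();
  void *DevBuf = llvm_omp_target_alloc_device(1024, Dev); // device-only memory
  void *PinBuf = llvm_omp_target_alloc_host(1024, Dev);   // host-pinned memory
  void *ShrBuf = llvm_omp_target_alloc_shared(1024, Dev); // shared/migratable memory
  omp_target_free(DevBuf, Dev);
  omp_target_free(PinBuf, Dev);
  omp_target_free(ShrBuf, Dev);
}
```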
Differential Revision: https://reviews.llvm.org/D97883
The shuffle idiom is implemented differently in our supported targets.
To reduce the "target_impl" file we now move the shuffle idiom into its
own self-contained header that provides the implementation for AMDGPU
and NVPTX. A fallback can be added later on.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D95752
Summary:
The changes introduced in D87946 changed the API for libomptarget
functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x
but was not given a backward-compatible interface. This change will
require people using Clang 13.x or 12.x to recompile their offloading
programs.
Reviewed By: jdoerfert, cchen
Differential Revision: https://reviews.llvm.org/D98358
In D97003, CUDA 9.2 became the minimum requirement for OpenMP offloading on the
NVPTX target. We no longer need macros in the source code to select the right
functions based on the CUDA version, we don't need to compile multiple bitcode
libraries for different CUDA versions for each SM, and we don't need to worry
about future compatibility with newer CUDA versions.
`-target-feature +ptx61` is used in this patch, which corresponds to the highest
PTX version that CUDA 9.2 can support.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D97198
This patch just encapsulates some repeated code. To do so, it
relocates some functions from interface.cpp to omptarget.cpp. It also
adjusts them to the LLVM coding style.
This patch is almost NFC except some `DP` messages are a bit
different. For example, messages like "Entering target region" are
now emitted even if offload is disabled, but a subsequent "Offload is
disabled" is then emitted.
Reviewed By: jdoerfert, grokos
Differential Revision: https://reviews.llvm.org/D97908
Without this patch, an `omp target exit data` before the runtime is
initialized produces a runtime error. This patch fixes that by
changing `__tgt_target_data_end_mapper` to call `CheckDeviceAndCtors`
like many other runtime routines.
Discussed at
<https://lists.llvm.org/pipermail/openmp-dev/2021-March/003920.html>.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D97907
Without this patch, when the offload device is set to
`omp_get_initial_device()`, the runtime fails with an error diagnostic
when entering target regions or target data regions.
However, OpenMP 5.1, sec. 2.14.5 "target Construct", "Restrictions",
p. 203, L3-5 states:
> The device clause expression must evaluate to a non-negative integer
> value that is less than or equal to the value of
> omp_get_num_devices().
Sec. 3.7.7 "omp_get_initial_device", p. 412, L2-3 states:
> The value of the device number is the value returned by the
> omp_get_num_devices routine.
Similarly, OpenMP 5.0, sec. 2.12.5 "target Construct", "Restrictions",
p. 174 L30-32 states:
> The device clause expression must evaluate to a non-negative integer
> value less than the value of omp_get_num_devices() or to the value
> of omp_get_initial_device().
This patch fixes this behavior by changing the runtime to behave as if
offloading is disabled whenever it finds the offload device (either
from a `device` clause or the default device) is set to the host
device. In the case of mandatory offloading when
`omp_get_num_devices() == 0`, it incorporates the behavior proposed
for OpenMP 5.2 in OpenMP spec github issue 2669.
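A sketch of the behavior change, assuming a simple host-fallback use case: selecting the initial device is now treated like disabled offloading rather than producing a runtime error.
```c
#include <omp.h>

void add_one(int *A, int N) {
  int Host = omp_get_initial_device();
  // Previously this produced a runtime error; now it runs as if offloading
  // were disabled, i.e. the region executes on the host.
  #pragma omp target device(Host) map(tofrom : A[0:N])
  for (int I = 0; I < N; ++I)
    A[I] += 1;
}
```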
Reviewed By: grokos, RaviNarayanaswamy
Differential Revision: https://reviews.llvm.org/D97616