llvm-project

Commit Graph

Author	SHA1	Message	Date
Joseph Huber	f1c821fa85	[OpenMP] Add support for dynamic shared memory in new RTL This patch adds support for using dynamic shared memory in the new device runtime. The new function `__kmpc_get_dynamic_shared` will return a pointer to the buffer of dynamic shared memory. Currently the amount of memory allocated is set by an environment variable. In the future this amount will be added to the amount used for the smart stack which will be configured in a similar way. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110006	2021-09-17 21:25:36 -04:00
Joseph Huber	ec02c34b6d	[OpenMP] Add additional fields to device environment This patch adds fields for the device number and number of devices into the device environment struct and debugging values. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110004	2021-09-17 21:25:32 -04:00
Joseph Huber	b266bcb135	[OpenMP] Implement __assert_fail in the new device runtime This patch implements the `__assert_fail` function in the new device runtime. This allows users and developers to use the standars assert function inside of the device. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D109886	2021-09-17 21:25:28 -04:00
Shilei Tian	81a1a91c62	[NFC] clang-format -i /openmp/libomptarget/deviceRTLs/interface.h	2021-09-17 12:55:02 -04:00
Hansang Bae	ae2a5facce	[OpenMP][libomptarget] Minor fix in x86_64 plugin Call to remove() was passing invalid address for the file name. Differential Revision: https://reviews.llvm.org/D109846	2021-09-15 15:57:06 -05:00
Ye Luo	2187cbf56f	[OpenMP][libomptarget] Add __tgt_target_return_t enum for __tgt_target_XXX return int The defintion of OFFLOAD_SUCCESS and OFFLOAD_FAIL used in plugin APIs and libomptarget public APIs are not consistent. Create __tgt_target_return_t for libomptarget public APIs. Differential Revision: https://reviews.llvm.org/D109304	2021-09-10 16:11:08 -05:00
Jon Chesterfield	6760234e8d	[libomptarget][amdgpu] Precisely manage hsa lifetime The hsa library must be initialized before any calls into it and destructed after the last call into it. There have been a number of bugs in this area related to member variables which would like to use raii to manage resources acquired from hsa. This patch moves the init/shutdown of hsa into a class, such that when used as the first member variable (could be a base), the lifetime of other member variables are reliably scoped within it. This will allow other classes to use raii reliably when used as member variables within the global. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D109512	2021-09-09 17:28:11 +01:00
Jon Chesterfield	2a581710c1	[openmp] No longer use LIBRARY_PATH to find devicertl Given D109057, change test runner to use the libomptarget-x-bc-path argument instead of the LIBRARY_PATH environment variable to find the device library. Also drop the use of LIBRARY_PATH environment variable as it is far too easy to pull in the device library from an unrelated toolchain by accident with the current setup. No loss in flexibility to developers as the clang commandline used here is still available. Reviewed By: jdoerfert, tianshilei1992 Differential Revision: https://reviews.llvm.org/D109061	2021-09-09 17:16:41 +01:00
Jon Chesterfield	d642156f8f	[libomptarget][nfc] Hoist hsa_init into rtl.cpp	2021-09-09 16:09:34 +01:00
Ye Luo	2cfe1a09d1	[OpenMP][libomptarget][NFC] Change checkDeviceAndCtors return type to bool. What is exactly needed is only a boolean. Pulling OFFLOAD_SUCCESS/FAIL only adds confusion. Differential Revision: https://reviews.llvm.org/D109303	2021-09-07 13:59:27 -05:00
Ye Luo	c3aecf87d5	[OpenMP][libomptarget] Change device vector elements to unique_ptr type Using std::vector<DeviceTy> requires implementing copy constructor and copied assign operator for DeviceTy. Indeed DeviceTy should never be copied. After changing to std::vector<std::unique_ptr<DeviceTy>>, All the unsafe copy constructor and copy assign operator implementations can be removed. Compilers mark them deleted due to mutex or underlying objects and this is the desired behavior. Differential Revision: https://reviews.llvm.org/D109276	2021-09-06 22:28:49 -05:00
Ye Luo	8e5c1b039e	[OpenMP][libomptarget] Change synchronize_ty return type to int32_t Plugins always return int32_t. Stay consistent with other functions which return error status. Differential Revision: https://reviews.llvm.org/D109341	2021-09-06 21:38:54 -05:00
Ron Lieberman	fdac5adee6	[openmp] NFC add bitcode comment	2021-09-02 18:21:39 -05:00
Jon Chesterfield	201e466eba	[libomptarget][amdgpu] Add gfx90a to build list	2021-09-02 18:11:02 +01:00
Jon Chesterfield	3153bdd547	[libomptarget][amdgpu] Drop env variables Use the same debug print as the rest of libomptarget plugins with the same environment control. Also drop the max queue size debugging hook as I don't believe it is still in use, can bring it back near the rest of the env handling in rtl.cpp if someone objects. That makes most of rt.h and all of utils.cpp unused. Clean that up and simplify control flow in a couple of places. Behaviour change is that debug prints that used to use the old environment variable now use the new one and print in slightly different format, and the removal of the max queue size variable. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D108784	2021-09-02 11:02:39 +01:00
Ye Luo	289a1089cd	[libomptarget] Move HostDataToTargetTy states into StatesTy Use unique_ptr to achieve the effect of mutable. Remove mutable keyword of DynRefCount and HoldRefCount Remove std::shared_ptr from UpdateMtx Reviewed By: tianshilei1992, grokos Differential Revision: https://reviews.llvm.org/D109007	2021-09-01 23:36:05 -05:00
Fangrui Song	4d5220faf9	[OpenMP] Fix -Wunused-but-set-parameter in -DLLVM_ENABLE_ASSERTIONS=off builds. NFC	2021-09-01 17:55:13 -07:00
Joel E. Denny	1f9e437065	[OpenMP][AMDGPU] Remove unneeded XFAILs	2021-09-01 18:00:25 -04:00
Joel E. Denny	786a140650	[OpenMP] Use IsHostPtr where needed in rest of omptarget.cpp As started in D107925, this patch replaces the remaining occurrences of `UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin` in `omptarget.cpp` with `IsHostPtr`. The former condition is broken in the rare case that the device and host happen to use the same address for their mapped allocations. I don't know how to write a test that's likely to reveal this case. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107928	2021-09-01 17:31:42 -04:00
Joel E. Denny	d11bab0b73	[OpenMP] Use IsHostPtr where needed for targetDataBegin As discussed in D105990, without this patch, `targetDataBegin` determines whether to transfer data (as opposed to assuming it's in shared memory) using the condition `!UseUSM \|\| HasCloseModifier`. However, this condition is broken if use of discrete memory was forced by `omp_target_associate_ptr`. This patch extends `unified_shared_memory/associate_ptr.c` to reveal this case, and it fixes it using `!IsHostPtr` in `DeviceTy::getTargetPointer` to replace this condition. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107927	2021-09-01 17:31:42 -04:00
Joel E. Denny	fa6c275505	[OpenMP][NFC] Eliminate CopyMember from targetDataEnd This patch is based on comments in D105990. It is NFC according to the following observations: 1. `CopyMember` is computed as `!IsHostPtr && IsLast`. 2. `DelEntry` is true only if `IsLast` is true. We apply those observations in order: ``` if ((DelEntry \|\| Always \|\| CopyMember) && !IsHostPtr) if ((DelEntry \|\| Always \|\| IsLast) && !IsHostPtr) if ((Always \|\| IsLast) && !IsHostPtr) ``` Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107926	2021-09-01 17:31:42 -04:00
Joel E. Denny	8e4836b2a2	[OpenMP] Use IsHostPtr where needed for targetDataEnd As discussed in D105990, without this patch, `targetDataEnd` determines whether to transfer data or delete a device mapping (as opposed to assuming it's in shared memory) using two different conditions, each of which is broken for some cases: 1. `!(UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin)`: The broken case is rare: the device and host might happen to use the same address for their mapped allocations. I don't know how to write a test that's likely to reveal this case, but this patch does fix it, as discussed below. 2. `!UNIFIED_SHARED_MEMORY \|\| HasCloseModifier`: There are at least two broken cases: 1. The `close` modifier might have been specified on an `omp target enter data` but not the corresponding `omp target exit data`, which thus might falsely assume a mapping is in shared memory. The test `unified_shared_memory/close_enter_exit.c` already has a missing deletion as a result, and this patch adds a check for that. This patch also adds the new test `close_member.c` to reveal a missing transfer and deletion. 2. Use of discrete memory might have been forced by `omp_target_associate_ptr`, as in the test `unified_shared_memory/api.c`. In the current `targetDataEnd` implementation, this condition turns out not be used for this case: because the reference count is infinite, a transfer is possible only with an `always` modifier, and this condition is never used in that case. To ensure it's never used for that case in the future, this patch adds the test `unified_shared_memory/associate_ptr.c`. Fortunately, `DeviceTy::getTgtPtrBegin` already has a solution: it reports whether the allocation was found in shared memory via the variable `IsHostPtr`. After this patch, `HasCloseModifier` is no longer used in `targetDataEnd`, and I wonder if the `close` modifier is ever useful on an `omp target data end`. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107925	2021-09-01 17:31:42 -04:00
Jon Chesterfield	cef1199686	Revert "[openmp] No longer use LIBRARY_PATH to find devicertl" This reverts commit `7a228f872f`. Failing test case under CI	2021-09-01 20:44:12 +01:00
Jon Chesterfield	7a228f872f	[openmp] No longer use LIBRARY_PATH to find devicertl Given D109057, change test runner to use the libomptarget-x-bc-path argument instead of the LIBRARY_PATH environment variable to find the device library. Also drop the use of LIBRARY_PATH environment variable as it is far too easy to pull in the device library from an unrelated toolchain by accident with the current setup. No loss in flexibility to developers as the clang commandline used here is still available. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109061	2021-09-01 20:24:34 +01:00
Jon Chesterfield	718e5a9883	[libomptarget] Set runpath on libomptarget, use that to drop LD_LIBRARY_PATH from test runner Using rpath instead of LD_LIBRARY_PATH to find libomp.so and libomptarget.so lets one rerun the already built test executables without setting environment variables and removes the risk of the test runner picking up different libraries to the developer debugging the failure. rpath usually means runpath, which is not transitive, so set runpath on libomptarget itself so that it can find the plugins located next to it, spelled $ORIGIN. This provides sufficient functionality to drop D102043 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D109071	2021-09-01 18:47:56 +01:00
Jon Chesterfield	f8bcbb82a7	[libomptarget] Normalise a cmake debug string, checking it triggers CI	2021-09-01 14:24:28 +01:00
Joel E. Denny	1688b4cf8e	[OpenMP][AMDGPU] XFAIL test where kernels call printf	2021-08-31 22:11:28 -04:00
Joel E. Denny	ec1ebcd302	[OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in runtime (2/2) This patch implements OpenMP runtime support for an original OpenMP extension we have developed to support OpenACC: the `ompx_hold` map type modifier. The previous patch in this series, D106509, implements Clang support and documents the new functionality in detail. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D106510	2021-08-31 16:13:49 -04:00
Joachim Protze	5ea1c37118	[libomptarget][amdcgn] Only add opt/llvm-link dependency if TARGET is available In some build configurations, the target we depend on is not available for declaring the build dependency. We only need to declare the build dependency, if the build target is available in the same build. Fixes the issue raised in https://reviews.llvm.org/D107156#2969862 This patch should go into release/13 together with D108404 Differential Revision: https://reviews.llvm.org/D108868	2021-08-30 17:32:11 +02:00
Shilei Tian	e8fdacfd81	[OpenMP][NVPTX] Fixed missing variables for CUDA free compilation in NVPTX plugin `CU_EVENT_DEFAULT` is defined in CUDA header. It should be added to `openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h` for CUDA free build. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108878	2021-08-28 18:08:10 -04:00
Shilei Tian	29df4ab3f3	[OpenMP][Offloading] Add support for event related interfaces This patch adds the support form event related interfaces, which will be used later to fix data race. See D104418 for more details. Reviewed By: jdoerfert, ye-luo Differential Revision: https://reviews.llvm.org/D108528	2021-08-28 16:24:14 -04:00
George Rokos	a2bd44089e	[libomptarget][NFC] Fixed tests which checked for obsolete string "getOrAllocTgtPtr"	2021-08-28 07:35:42 -07:00
Jon Chesterfield	78f92c3810	[openmp][amdgpu] Initial gfx10 offloading implementation Lets wavefront size be 32 for amdgpu openmp, as well as 64. Fixes up as little as possible to pass that through the libraries. This change is end to end, as opposed to updating clang/devicertl/plugin separately. It can be broken up for review/commit if preferred. Posting as-is so that others with a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are probably bugs remaining as well as the todo: for letting grid values vary more. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108708	2021-08-27 12:34:03 +01:00
George Rokos	3819aae6dd	[libomptarget][NFC] Replaced obsolete name "getOrAllocTgtPtr" with new "getTargetPointer" in debug messages.	2021-08-26 18:01:18 -07:00
Jon Chesterfield	3d85342982	[libomptarget][amdgpu][nfc] Rename variables, delete dead code	2021-08-26 19:58:38 +01:00
Jon Chesterfield	68ab93f4d7	[libomptarget][amdgpu][nfc] Rename source files	2021-08-26 18:29:44 +01:00
Jon Chesterfield	a5f4074d85	[libomptarget][amdgpu] Macro for accessing GPU variables from plugin Lets the amdgpu plugin write to omptarget_device_environment to enable debugging. Intend to use in the near future to record the wavesize that a given deviceRTL was compiled with for running on hardware that supports 32 or 64. Patch sets all the attributes that are useful. Notably .data means the variable is set by writing to host memory before copying to the GPU instead of launching a kernel to update the image. Can simplify the plugin slightly to drop the code for patching after load if this is used consistently. NFC on nvptx, cuda plugin seems to work fine without any annotations. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108698	2021-08-26 17:28:18 +01:00
Jon Chesterfield	ba0af885e7	[libomptarget][amdgpu][nfc] Make grid value access match devicertl	2021-08-25 15:11:19 +01:00
Jon Chesterfield	9b2c6c07b5	[libomptarget][amdgpu] Refactor debug printing Move most debug printing in rtl.cpp behind DP() macro Adjust the print output for gpu arch mismatch when the architectures match Convert an assert into graceful failure Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108562	2021-08-25 14:57:51 +01:00
Jon Chesterfield	ba8547775b	[libomptarget][amdgpu] Fix debug build from D104696	2021-08-25 01:27:51 +01:00
Michael Kruse	1275ee3041	[OpenMP][amdgcn] Don't use in-tree clang if not available. The use of `$<TARGET_FILE:clang>` was adapted too broadly from D101265. Fixes llvm.org/PR51579 Also see discussion in D108534. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D108640	2021-08-24 12:50:49 -05:00
Pushpinder Singh	9b8b7c1180	[AMDGPU][Libomptarget] Delete g_atl_machine global With uses of g_atl_machine gone, a significant portion of dead code has been removed. This patch depends on D104691 and D104695. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104696	2021-08-24 07:59:40 +00:00
Jon Chesterfield	d26000e4cc	[openmp][devicertl] Freestanding nvptx via stub printf Compiled nvptx devicertl as freestanding, breaking the dependency on host glibc and gcc-multilibs. Thus build it by default. Comes at the cost of #defining out printf. Tried mapping it onto __builtin_printf but that gets transformed back to printf instead of hitting the cuda/openmp lowering transform. Printf could be preserved by one of: - dropping all the standard headers and ffreestanding - providing a header only printf implementation - changing the compiler handling of printf Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D108349	2021-08-23 23:07:47 +01:00
Jon Chesterfield	842f875c8b	[openmp] Use llvm GridValues from devicertl Add include path to the cmakefiles and set the target_impl enums from the llvm constants instead of copying the values. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108391	2021-08-23 20:25:24 +01:00
Joachim Protze	4bb36df144	[libomptarget][amdcgn] Add build dependency for llvm-link and opt D107156 and D107320 are not sufficient when OpenMP is built as llvm runtime (LLVM_ENABLE_RUNTIMES=openmp) because dependencies only work within the same cmake instance. We could limit the dependency to cases where libomptarget/plugins are really built. But compared to the whole llvm project, building openmp runtime is negligible and postponing the build of OpenMP runtime after the dependencies are ready seems reasonable. The direct dependency introduced in D107156 and D107320 is necessary for the case where OpenMP is built as llvm project (LLVM_ENABLE_PROJECTS=openmp). Differential Revision: https://reviews.llvm.org/D108404	2021-08-20 01:57:58 +02:00
Jennifer Yu	c274b19866	Add implicit map for a list item appears in a reduction clause. A new rule is added in 5.0: If a list item appears in a reduction, lastprivate or linear clause on a combined target construct then it is treated as if it also appears in a map clause with a map-type of tofrom. Currently map clauses for all capture variables are added implicitly. But missing for list item of expression for array elements or array sections. The change is to add implicit map clause for array of elements used in reduction clause. Skip adding map clause if the expression is not mappable. Noted: For linear and lastprivate, since only variable name is accepted, the map has been added though capture variables. To do so: During the mappable checking, if error, ignore diagnose and skip adding implicit map clause. The changes: 1> Add code to generate implicit map in ActOnOpenMPExecutableDirective, for omp 5.0 and up. 2> Add extra default parameter NoDiagnose in ActOnOpenMPMapClause: Use that to skip error as well as skip adding implicit map during the mappable checking. Note: there are only tow places need to be check for NoDiagnose. Rest of them either the check is for < omp 5.0 or the error already generated for reduction clause. Differential Revision: https://reviews.llvm.org/D108132	2021-08-19 12:53:47 -07:00
Jon Chesterfield	ad0f6e1d98	[openmp] Disable the tests that block CI for amdgpu and host offloading.	2021-08-19 20:43:30 +01:00
Jon Chesterfield	6c75ce1b8b	[libomptarget][nfc] Move lanemask_t type into target_impl.h	2021-08-19 18:50:03 +01:00
Jon Chesterfield	77579b99e9	[openmp][nfc] Replace OMPGridValues array with struct [nfc] Replaces enum indices into an array with a struct. Named the fields to match the enum, leaves memory layout and initialization unchanged. Motivation is to later safely remove dead fields and replace redundant ones with (compile time) computation. It should also be possible to factor some common fields into a base and introduce a gfx10 amdgpu instance with less duplication than the arrays of integers require. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108339	2021-08-19 13:25:42 +01:00
Jon Chesterfield	f420939b82	[libomptarget] Apply D106710 to amdgcn devicertl	2021-08-19 01:34:33 +01:00
Jon Chesterfield	c480792b6a	[libomptarget][nfc][devicertl] Delete unused enums	2021-08-19 00:14:34 +01:00
Jon Chesterfield	21d91a8ef3	[libomptarget][devicertl] Replace lanemask with uint64 at interface Use uint64_t for lanemask on all GPU architectures at the interface with clang. Updates tests. The deviceRTL is always linked as IR so the zext and trunc introduced for wave32 architectures will fold after inlining. Simplification partly motivated by amdgpu gfx10 which will be wave32 and is awkward to express in the current arch-dependant typedef interface. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108317	2021-08-18 20:47:33 +01:00
Joseph Huber	edb8acdc6e	[Libomptarget] Correctly default to Generic if exec_mode is not present Currently, the runtime returns an error when the `exec_mode` global is not present. The expected behvaiour is that the region will default to Generic. This prevents global constructors from being called because they do not contain execution mode globals. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108255	2021-08-18 11:24:28 -04:00
George Rokos	df06ec3057	[libomptarget][NFC] Fix compilation issue with GCC Removed redundant assignment from condition which causes gcc to emit the following error: error: operation on ‘MoveData’ may be undefined [-Werror=sequence-point]	2021-08-10 09:43:43 -07:00
Joel E. Denny	2ced1f338a	[OpenMP][NFC] Simplify targetDataEnd conditions for CopyMember targetDataEnd and targetDataBegin compute CopyMember/copy differently, and I don't see why they should. This patch eliminates one of those differences by making a simplifying NFC change to targetDataEnd. The change is NFC as follows. The change only affects the case when `!UNIFIED_SHARED_MEMORY \|\| HasCloseModifier`. In that case, the following points are always true: * The value of CopyMember is relevant later only if DelEntry = false. * DelEntry = false only if one of the following is true: * IsLast = false. In this case, it's always true that CopyMember = false = IsLast. * `MEMBER_OF && !PTR_AND_OBJ` is true. In this case, CopyMember = IsLast. * Thus, if CopyMember is relevant, CopyMember = IsLast. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D105990	2021-08-10 12:29:55 -04:00
Dimitry Andric	400cd6d2f0	[libomptarget][amdgpu] use --allow-shlib-undefined to link on FreeBSD On FreeBSD, the `environ` symbol is undefined at link time for shared libraries, but resolved by the dynamic linker at runtime. Therefore, allow the symbol to be undefined when creating a shared library, by using the `--allow-shlib-undefined` linker flag, instead of `-z defs` (a.k.a `--no-undefined`). Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D107698	2021-08-08 13:52:44 +02:00
Dimitry Andric	71ae2e0221	[libomptarget][amdgpu] don't declare Elf_Note on FreeBSD On FreeBSD, the system `<libelf.h>` already declares `struct Elf_Note` indirectly (via `<sys/elf_common.h>`). This results in compile errors when building the libomptarget amdgpu plugin. Avoid redeclaring `struct Elf_Note` on FreeBSD to fix the errors. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D107661	2021-08-06 21:45:26 +02:00
Shilei Tian	28939b6ae5	[NFC] Clean up and clang-format openmp/libomptarget/plugins/cuda/src/rtl.cpp	2021-08-05 22:32:28 -04:00
Lechen Yu	3bc8ce5dd7	[openmp] Add OMPT initialization in libomptarget When loading libomptarget, the init function in libomptarget/src/rtl.cpp will search for the libomptarget_start_tool function using libdl. libomptarget_start_tool will pass those OMPT callbacks related to target constructs to libomptarget Differential Revision: https://reviews.llvm.org/D99803	2021-08-04 18:00:11 +02:00
Jon Chesterfield	567c8c7bfd	[libomptarget][nfc] Only set cuda-path for nvptx tests Remove --cuda-path=CUDA_TOOLKIT_ROOT_DIR-NOTFOUND from the invocation of non-nvptx test cases. Better signal to noise ratio on other architectures. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D107074	2021-07-30 23:01:09 +01:00
Jose M Monsalve Diaz	5424ceeda0	[OpenMP] Fixing llvm-omp-device-info compilation with runtimes When using `-DLLVM_ENABLED_RUNTIMES` instead of `-DLLVM_ENABLED_PROJECTS` the `llvm-omp-device-info` tool is not compiled or installed. In general, no llvm tool would be build on runtimes, because the -DLLVM_BUILD_TOOLS flag is removed by the way runtimes compilation calls cmake again. This patch is simple. Just forward the value of this flag to the runtime cmake command. I'm also removing an unnecessary comment in the compilation of the tool Differential Revision: https://reviews.llvm.org/D107177	2021-07-30 13:09:08 -05:00
Shilei Tian	36d53af4a9	[OpenMP][Offloading] Remove task wait in nowait interfaces All `nowait` series of interfaces in `libomptarget` accept four more arguments (`int32_t depNum, void depList, int32_t noAliasDepNum, void noAliasDepList`) compared with their counterparts w/o `nowait`. These extra arguments were expected for dependence resolution, potentially lowered to device side. Current implementation calls `libomp` function `__kmpc_omp_taskwait`. However, the front end simply ignores them, that these four arguments are not emitted at all. As a consequence, the `depNum` and `noAliasDepNum` are garbage, which could lead to unnecessary task wait. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107164	2021-07-30 11:39:46 -04:00
Joachim Protze	4ffa1478fd	[libomptarget][amdcgn] Add build dependency for opt This patch should fix the build we observe when building LLVM from scratch. Differential Revision: https://reviews.llvm.org/D107156	2021-07-30 15:45:13 +02:00
Jon Chesterfield	a90da62adb	[libomptarget][amdgpu] Update printed plugin name	2021-07-29 14:46:42 +01:00
Jose M Monsalve Diaz	88e66fa60a	[OpenMP] Fixing missing variables when CUDA SDK not in system This patch fixes the error reported in D106751. When there is no CUDA SDK installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE` variables. Using @zsrkmyn sugested fix Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106933	2021-07-27 23:46:15 -05:00
Jose M Monsalve Diaz	313c523995	[OpenMP][Tool] Introducing the `llvm-omp-device-info` tool This patch introduces the `llvm-omp-device-info` tool, which uses the omptarget library and interface to query the device info from all the available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo` Since omptarget usually requires a description structure with executable kernels, I split the initialization of the RTLs and Devices to be able to initialize all possible devices and query each of them. This revision relies on the patch that introduces the print device info. A limitation is that the order in which the devices are initialized, and the corresponding device ID is not necesarily the one seen by OpenMP. The changes are as follows: 1. Separate the RTL initialization that was performed in `RegisterLib` to its own `initRTLonce` function 2. Create an `initAllRTLs` method that initializes all available RTLs at runtime 3. Created the `llvm-deviceinfo.cpp` tool that uses `omptarget` to query each device and prints its information. Example Output: ``` Device (0): print_device_info not implemented Device (1): print_device_info not implemented Device (2): print_device_info not implemented Device (3): print_device_info not implemented Device (4): CUDA Driver Version: 11000 CUDA Device Number: 0 Device Name: Quadro P1000 Global Memory Size: 4236312576 bytes Number of Multiprocessors: 5 Concurrent Copy and Execution: Yes Total Constant Memory: 65536 bytes Max Shared Memory per Block: 49152 bytes Registers per Block: 65536 Warp Size: 32 Threads Maximum Threads per Block: 1024 Maximum Block Dimensions: 1024, 1024, 64 Maximum Grid Dimensions: 2147483647 x 65535 x 65535 Maximum Memory Pitch: 2147483647 bytes Texture Alignment: 512 bytes Clock Rate: 1480500 kHz Execution Timeout: Yes Integrated Device: No Can Map Host Memory: Yes Compute Mode: DEFAULT Concurrent Kernels: Yes ECC Enabled: No Memory Clock Rate: 2505000 kHz Memory Bus Width: 128 bits L2 Cache Size: 1048576 bytes Max Threads Per SMP: 2048 Async Engines: Yes (2) Unified Addressing: Yes Managed Memory: Yes Concurrent Managed Memory: Yes Preemption Supported: Yes Cooperative Launch: Yes Multi-Device Boars: No Compute Capabilities: 61 ``` Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106752	2021-07-27 22:38:35 -04:00
Jose M Monsalve Diaz	d2f85d0910	[OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget` This patch introduces a function in the device's plugin to print the device information. This patch relates to another patch that introduces a CLI tool to obtain the device information from the omplibrary directly. It is inspired by PGI's pgaccelinfo. The modifications are as follows: 1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL. 2. Introduce the `bool __tgt_print_device_info(devID)` function into `omptarget` interface. Returns false if the RTL is not implemented 3. Added `bool printDeviceInfo(RTLDevID)` to the `DeviceTy` 4. Implement the `__tgt_rtl_print_device_info` for CUDA. Added additional CUDA Runtime calls. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106751	2021-07-27 21:47:57 -04:00
Jose M Monsalve Diaz	5ab6aedda9	[OpenMP] Folding threadLimit and numThreads when single value in kernels The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block` and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant, these calls can be folded to the constant value. In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and `NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions. The code checks all the kernels, and if their attributes match, the functions are folded. In the future we will explore specializing for multiple values of NumThreads and NumTeams. Depends on D106390 Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D106033	2021-07-27 21:47:12 -04:00
Johannes Doerfert	ed7ec860f0	[OpenMP] Improve alignment handling in the new device runtime	2021-07-27 17:50:27 -05:00
Joseph Huber	e3ee76245e	[Libomptarget] Revert new variable sharing to use the old method The new method of sharing variables introduces a `__kmpc_alloc_shared` call that cannot be removed in the middle end because of its non-constant argument and unconnected free. This patch reverts this to the old method that used a static amount of shared memory for sharing variables. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106905	2021-07-27 18:14:01 -04:00
Johannes Doerfert	67ab875ff5	[OpenMP] Prototype opt-in new GPU device RTL The "old" OpenMP GPU device runtime (D14254) has served us well for many years but modernizing it has caused some pain recently. This patch introduces an alternative which is mostly written from scratch embracing OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual interfaces. This new runtime is opt-in through a clang flag (D106793). The new runtime is currently only build for nvptx and has "-new" in its name. The design is tailored towards middle-end optimizations rather than front-end code generation choices, a trend we already started in the old runtime a while back. In contrast to the old one, state is organized in a simple manner rather than a "smart" one. While this can induce costs it helps optimizations. Our expectation is that the majority of codes can be optimized and a "simple" design is therefore preferable. The new runtime does also avoid users to pay for things they do not use, especially wrt. memory. The unlikely case of nested parallelism is supported but costly to make the more likely case use less resources. The worksharing and reduction implementation have been taken from the old runtime and will be rewritten in the future if necessary. Documentation and debug features are still mostly missing and will be added over time. All external symbols start with `__kmpc` for legacy reasons but should be renamed once we switch over to a single runtime. All internal symbols are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name clashes with user symbols. Differential Revision: https://reviews.llvm.org/D106803	2021-07-27 00:56:05 -05:00
Shilei Tian	e97e0a4fad	[AbstractAttributor] Fold __kmpc_parallel_level if possible Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible. Note that `__kmpc_parallel_level` doesn't take activeness into consideration, based on current `deviceRTLs`, its return value can be such as 0, 1, 2, instead of 0, 129, 130, etc. that also indicate activeness. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106154	2021-07-26 22:46:19 -04:00
Jon Chesterfield	2a613a7790	[libomptarget] Build amdgpu plugin without hsa Default to building the amdgpu plugin to use dlopen when hsa is not found instead of disabling it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106600	2021-07-26 09:54:51 +01:00
Jon Chesterfield	93fe84d32f	[libomptarget][nfc] Squash unused variable warning Suppress only current warning on openmp-clang-x86_64-linux-debian Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106777	2021-07-26 09:54:31 +01:00
Jon Chesterfield	dd0b463dd9	[libomptarget][amdgpu] More robust handling of failure to init HSA If hsa_init fails, subsequent calls into hsa are not safe. Except for hsa_init, but we don't retry on failure. This patch: - deletes a print that called into hsa to ask why it can't call into hsa - drops a merge conflict block next to that print - reliably initializes number of devices to zero - skips the plugin destructor contents if the constructor failed to init hsa Tested by making hsa_init return error, and by forcing the dynamic library use which was then deleted from disk. Before this patch, both segv. After it, friendly message about offloading being unavailable. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106774	2021-07-25 23:15:58 +01:00
Jon Chesterfield	e3251f2ec4	Revert "[libomptarget] Build amdgpu plugin without hsa" Inaccurate error handling around hsa_init This reverts commit `e30b3b23a4`.	2021-07-25 21:03:51 +01:00
Jon Chesterfield	e30b3b23a4	[libomptarget] Build amdgpu plugin without hsa Default to building the amdgpu plugin to use dlopen when hsa is not found instead of disabling it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106600	2021-07-25 19:33:36 +01:00
Shilei Tian	f1b8fa55d0	[OpenMP][NVPTX] Disable OpenMPOpt when building deviceRTLs We build `deviceRTLs` with `-O1` by default, which also triggers OpenMPOpt. When the info cache is created, some attributes are removed. As a result, although we mark a few functions `noinline`, they are still inlined when the bitcode library is generated. This can cause an issue in middle end optimization. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106710	2021-07-25 10:38:27 -04:00
Shilei Tian	c2c43132f6	[OpenMP] Fix bug 50022 Bug 50022 [0] reports target nowait fails in certain case, which is added in this patch. The root cause of the failure is, when the second task is created, its parent's `td_incomplete_child_tasks` will not be incremented because there is no parallel region here thus its team is serialized. Therefore, when the initial thread is waiting for its unfinished children tasks, it thought there is only one, the first task, because it is hidden helper task, so it is tracked. The second task will only be pushed to the queue when the first task is finished. However, when the first task finishes, it first decrements the counter of its parent, and then release dependences. Once the counter is decremented, the thread will move on because its counter is reset, but actually, the second task has not been executed at all. As a result, since in this case, the main function finishes, then `libomp` starts to destroy. When the second task is pushed somewhere, all some of the structures might already have already been destroyed, then anything could happen. This patch simply moves `__kmp_release_deps` ahead of decrement of the counter. In this way, we can make sure that the initial thread is aware of the existence of another task(s) so it will not move on. In addition, in order to tackle dependence chain starting with hidden helper thread, when hidden helper task is encountered, we force the task to release dependences. Reference: [0] https://bugs.llvm.org/show_bug.cgi?id=50022 Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D106519	2021-07-23 16:54:11 -04:00
Joseph Huber	e1dedecaa6	[Libomptarget] Add unroll flag to shared variables loop Unrolling this loop provides better performance in practice because it is executed on the device and is likely to be very small. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106692	2021-07-23 16:45:27 -04:00
Shilei Tian	18ce3d3f2c	[OpenMP][Offloading] Fix data race in data mapping by using two locks This patch tries to partially fix one of the two data race issues reported in [1] by introducing a per-entry mutex. Additional discussion can also be found in D104418, which will also be refined to fix another data race problem. Here is how it works. Like before, `DataMapMtx` is still being used for mapping table lookup and update. In any case, we will get a table entry. If we need to make a data transfer (update the data on the device), we need to lock the entry right before releasing `DataMapMtx`, and the issue of data transfer should be after releasing `DataMapMtx`, and the entry is unlocked afterwards. This can guarantee that: 1) issue of data movement is not in critical region, which will not affect performance too much, and also will not affect other threads that don't touch the same entry; 2) if another thread accesses the same entry, the state of data movement is consistent (which requires that a thread must first get the update lock before getting data movement information). For a target that doesn't support async data transfer, issue of data movement is data transfer. This two-lock design can potentially improve concurrency compared with the design that guards data movement with `DataMapMtx` as well. For a target that supports async data movement, we could simply attach the event between the issue of data movement and unlock the entry. For a thread that wants to get the event, it must first get the lock. This can also get rid of the busy wait until the event pointer is valid. Reference: [1] https://bugs.llvm.org/show_bug.cgi?id=49940 Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104555	2021-07-23 16:10:51 -04:00
Abhinav Gaba	f7c92995c0	[OpenMP] Fix CUDA plugin build after `3817ba13ae`. The build was broken on machines that don't have Cuda SDK installed. See https://reviews.llvm.org/D106627 for the original discussion.	2021-07-23 16:50:00 +08:00
Johannes Doerfert	d12ee28e2e	[OpenMP] Simplify the ThreadStackTy for globalization fallback With D106496 we can make the globalization fallback stack much simpler and this version doesn't seem to experience the spurious failures and deadlocks we have seen before. Differential Revision: https://reviews.llvm.org/D106576	2021-07-22 23:57:46 -05:00
Joseph Huber	76c0c0ca86	[OpenMP][NFC] Fix formatting in CUDA plugin	2021-07-22 21:50:40 -04:00
Joseph Huber	3817ba13ae	[OpenMP] Add environment variables to change stack / heap size in the CUDA plugin This patch adds support for two environment variables to configure the device. ``LIBOMPTARGET_STACK_SIZE`` sets the amount of memory in bytes that each thread has for its stack. ``LIBOMPTARGET_HEAP_SIZE`` sets the amount of heap memory that can be allocated using malloc / free on the device. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106627	2021-07-22 21:40:02 -04:00
Jose M Monsalve Diaz	68d6278a6e	[OpenMP] Renaming RT functions `GetNumberOfBlocksInKernel` and `GetNumberOfThreadsInBlock` These functions should follow the camel case convention. These are really easy to change and are needed for D106033. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D106390	2021-07-22 18:17:49 -04:00
Jon Chesterfield	9e05c084e5	[libomptarget][amdgpu][nfc] Normalise license headers Reviewed By: gregrodgers, jdoerfert Differential Revision: https://reviews.llvm.org/D106581	2021-07-22 20:23:41 +01:00
Jon Chesterfield	14e34a83b0	[libomptarget][amdgpu][nfc] Replace use of gelf.h with libelf.h AMDGPU can assume Elf64 so doesn't need to abstract over Elf32 Drop a few other unused headers at the same time. Now only llvm elf and libelf are used by the plugin. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106579	2021-07-22 20:04:13 +01:00
Jon Chesterfield	1a96570621	[libomptarget][amdgpu] Implement dlopen of libhsa AMDGPU plugin equivalent of D95155, build without HSA installed locally Compiles a new file, plugins/amdgpu/dynamic_hsa/hsa.cpp, to an object file that exposes the same symbols that the plugin presently uses from hsa. The object file contains dlopen of hsa and cached dlsym calls. Also provides header files corresponding to the subset that is used. This is behind a feature flag, LIBOMPTARGET_FORCE_DLOPEN_LIBHSA, default off. That allows developers to build against the dlopen/dlsym implementation, e.g. while testing this mode. Enabling by default will cause this plugin to build on a wider variety of machines than it does at present so may break some CI builds. That risk can be minimised by reviewing the header dependencies of the library and ensuring it doesn't use any libraries that are not already used by libomptarget. Separating the implementation from enabling by default in case the latter needs to be rolled back after wider CI results. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106559	2021-07-22 16:54:10 +01:00
Jon Chesterfield	6e9cd3e9f1	[libomptarget][nfc] Improve static assert message in dlwrap Revision of D102858. Raise dlwrap arity argument to template argument so the correct value is given in the error message. E.g. '2 == 1' instead of '2 == trait<>::nargs'. Arity higher than it should be: Before diff ``` $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: error: static_assert failed due to requirement '2 == trait<cudaError_enum (*)(unsigned int)>::nargs' "Arity Error" DLWRAP_INTERNAL(cuInit, 2); ^~~~~~~~~~~~~~~~~~~~~~~~~~ ... $/include/dlwrap.h:166:3: note: expanded from macro 'DLWRAP_COMMON' static_assert(ARITY == trait<decltype(&SYMBOL)>::nargs, "Arity Error"); \ ``` After diff In file included from $/plugins/cuda/dynamic_cuda/cuda.cpp:16: ``` $/include/dlwrap.h:131:3: error: static_assert failed due to requirement '2UL == 1UL' "Arity Error" static_assert(Requested == Required, "Arity Error"); ^ ~~~~~~~~~~~~~~~~~~~~~ $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: in instantiation of function template specialization 'dlwrap::verboseAssert<2UL, 1UL>' requested here DLWRAP_INTERNAL(cuInit, 2); ``` Arity lower than it should be: Before diff ``` $/plugins/cuda/dynamic_cuda/cuda.cpp:131:10: error: no matching function for call to 'dlwrap_cuInit' return dlwrap_cuInit(X); ^~~~~~~~~~~~~ $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: candidate function not viable: requires 0 arguments, but 1 was provided DLWRAP_INTERNAL(cuInit, 0); ``` After diff In file included from $/plugins/cuda/dynamic_cuda/cuda.cpp:16: ``` $/include/dlwrap.h:131:3: error: static_assert failed due to requirement '0UL == 1UL' "Arity Error" static_assert(Requested == Required, "Arity Error"); ^ ~~~~~~~~~~~~~~~~~~~~~ $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: in instantiation of function template specialization 'dlwrap::verboseAssert<0UL, 1UL>' requested here DLWRAP_INTERNAL(cuInit, 0); ``` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106543	2021-07-22 15:24:20 +01:00
Joseph Huber	a158d3663f	[OpenMP] Fix warnings for uninitialized block counts Summary: Fixes some warning given for uninitialized block counts if the exection mode is not recognized. This shouldn't happen in practice because the execution mode is checked when it's read from the device.	2021-07-22 09:24:07 -04:00
Jon Chesterfield	dc1f6f8b92	[libomptarget][amdgpu][nfc] Drop dead signal pool setup This class is instantiated once in rtl.cpp before hsa_init is called. The hsa_signal_create call therefore fails leaving the pool empty. This signal pool is a legacy from ATMI where it was constructed after hsa_init. Moving the state into the rtl.cpp global class disabled the initial populating of the pool without noticeably changing performance. Just rechecked with a fix that allocates the signals after hsa_init and that also doesn't noticeably change performance. This patch therefore drops the initialisation. Only change from main is to drop a DEBUG_PRINT statement that would say the pool initial size is zero. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106515	2021-07-22 10:29:32 +01:00
Joseph Huber	4a66860424	[OpenMP] Add an option to disable function internalization Function internalization can sometimes occur in situations where we want to keep the call sites intact. This patch adds an option to disable function internalization and prevents the device runtime from being internalized while creating the bitcode library. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106438	2021-07-21 21:18:18 -04:00
Joseph Huber	1684012a47	[Libomptarget] Introduce new main thread ID runtime function This patch introduces `__kmpc_is_generic_main_thread_id` which splits the old comparison into its own runtime function. The purpose of this is so we can fold this part independently, so when both this and `is_spmd_mode` are folded the final function will be folded as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106437	2021-07-21 21:18:14 -04:00
Joseph Huber	7d57639264	[OpenMP] Add new execution mode for SPMD execution with Generic semantics Qualified kernels can be transformed from generic-mode to SPMD mode using an optimization in OpenMPOpt. This patch introduces a new execution mode to indicate kernels that have been transformed from generic-mode to SPMD-mode. These kernels have SPMD-mode execution, but need generic-mode semantics for scheduling the blocks and threads. Without this far too few blocks will be scheduled for a generic region as SPMD mode expects the trip count to be divided by the number of threads. Reviewed By: ggeorgakoudis Differential Revision: https://reviews.llvm.org/D106460	2021-07-21 20:57:28 -04:00
Joseph Huber	754eb1c210	[OpenMP] Change `__kmpc_free_shared` to include the paired allocation size This patch changes `__kmpc_free_shared` to take an additional argument corresponding to the associated allocation's size. This makes it easier to implement the allocator in the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106496	2021-07-21 20:56:21 -04:00
Giorgis Georgakoudis	5a682d9b91	[OpenMP] Expose libomptarget function to get HW thread id The patch exposes the libomptarget runtime function that gets the hardware thread id through the kmpc API. This is to be used in SPMDization for checking the thread id to execute regions by a single thread in a block. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106323	2021-07-21 10:26:04 -07:00
Jon Chesterfield	a733bbbd17	[libomptarget][amdgpu][nfc] Refactor #includes Create a hsa_api.h header that includes the ROCr headers in use Drop some unused headers and _cplusplus macros Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106455	2021-07-21 17:28:07 +01:00
Shilei Tian	55c65884a4	[OpenMP][deviceRTLs] Update return type of function __kmpc_parallel_level In `deviceRTLs`, the parallel level is stored in a shared variable of type `uint8_t`. `__kmpc_parallel_level` currently returns a 16-bit interger. This patch first changes the return type of the function to `uint8_t`, same as the shared variable, and then corrects function type which was updated in D105955. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106384	2021-07-20 15:45:43 -04:00
Shilei Tian	02dff78983	[NFC][OpenMP] Fix an issue that no CHECK in test cases This fixes the complaint from FileCheck. Reviewed By: abhinavgaba, jdoerfert Differential Revision: https://reviews.llvm.org/D106387	2021-07-20 15:39:18 -04:00
Shilei Tian	996baa58a4	[OpenMP] Fixed a segmentation fault when using taskloop and target nowait The synchronization of task loop misses hidden helper tasks, causing segmentation fault reported in https://bugs.llvm.org/show_bug.cgi?id=50002. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D106220	2021-07-19 21:09:05 -04:00
Joseph Huber	762badb0ab	[Libomptarget] Remove volatile from NVPTX work function Currently the NPVTX work function is marked volatile. This prevents some optimizations from using this value. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106310	2021-07-19 20:03:25 -04:00
Giorgis Georgakoudis	fb0cf01795	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `e9c7291cb2`. Fix failing tests	2021-07-19 07:54:26 -07:00
Shilei Tian	4504e1134c	[OpenMP][CMake] Fix an issue when there is space in the argument LIBOMPTARGET_LIT_ARGS D106236 added a new CMake argument for `libomptarget` test, but when user's input contains white spaces, CMake will add escape char to the final lit command, which leads to an error. This patch converts the user's input `LIBOMPTARGET_LIT_ARGS` into a local array, and then passes the array to the function. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D106247	2021-07-18 21:54:14 -04:00
Shilei Tian	954711ed8f	[OpenMP][Offloading] Add a CMake argument LIBOMPTARGET_LIT_ARGS to control behavior of libomptarget lit test By default, `lit` uses all threads to invoke tests, which can easily cause out of memory on GPUs because most of OpenMP offloading test usually take about 1GB GPU memory, but a typical GPU only has 4-8GB memory. This patch introduce a CMake argument `LIBOMPTARGET_LIT_ARGS` to allow users to control the behavior of `libomptarget` tests, similar to `LLVM_LIT_ARGS`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D106236	2021-07-18 13:16:10 -04:00
Shilei Tian	4357cfc792	[OpenMP][Offloading] Add -g when compiling deviceRTLs in debug mode Currently when we compile the project in debug mode, `-g` will not be added to compilation flag. The bc files generated in different mode are of different size. When using GPU debuggers like `cuda-gdb`, it is expected to provide more info with a debug version of bc lib. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D106229	2021-07-18 09:34:54 -04:00
Giorgis Georgakoudis	e9c7291cb2	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102107	2021-07-16 23:27:44 -07:00
Shilei Tian	97c8f60bba	[NFC][OpenMP][Offloading] Replaced explicit parallel level computation with function `__kmpc_parallel_level` There are two places in current deviceRTLs where it computes parallel level explicitly, which is basically the functionality of `__kmpc_parallel_level`. Starting from D105787, we plan to introduce a series of function call folding based on information that can be deducted during compilation time. Computation of parallel level is the next target. This patch makes steps for the optimization. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105955	2021-07-15 22:21:06 -04:00
George Rokos	0c7a4870c5	[libomptarget] Keep the Shadow Pointer Map up-to-date D105812 introduced a regression where if a PTR_AND_OBJ entry was mapped on the device, then the OBJ was deallocated and then reallocated at a different address, the Shadow Pointer Map would still contain an entry for the PTR but pointing to the old address. This caused test `env/base_ptr_ref_count.c` to fail. Differential Revision: https://reviews.llvm.org/D105947	2021-07-14 15:19:58 -07:00
George Rokos	bb0166dc72	[libomptarget] Update device pointer only if needed Currently, libomptarget will always perform a host-to-device memory transfer in order to update the device pointer of a PTR_AND_OBJ entry. This is not always necessary because the device pointer may have been set to the correct pointee address already, so we can eliminate the redundant memory transfer.	2021-07-13 04:18:55 -07:00
Jon Chesterfield	b6b53ffef4	[libomptarget][devicertl] Remove branches around setting parallelLevel Simplifies control flow to allow store/load forwarding This change folds two basic blocks into one, leaving a single store to parallelLevel. This is a step towards spmd kernels with sufficiently aggressive inlining folding the loads from parallelLevel and thus discarding the nested parallel handling when it is unused. Transform: ``` int threadId = GetThreadIdInBlock(); if (threadId == 0) { parallelLevel[0] = expr; } else if (GetLaneId() == 0) { parallelLevel[GetWarpId()] = expr; } // => if (GetLaneId() == 0) { parallelLevel[GetWarpId()] = expr; } // because unsigned GetLaneId() { return GetThreadIdInBlock() & (WARPSIZE - 1);} // so whenever threadId == 0, GetLaneId() is also 0. ``` That replaces a store in two distinct basic blocks with as single store. A more aggressive follow up is possible if the threads in the warp/wave race to write the same value to the same address. This is not done as part of this change. ``` if (GetLaneId() == 0) { parallelLevel[GetWarpId()] = expr; } // => parallelLevel[GetWarpId()] = expr; // because unsigned GetWarpId() { return GetThreadIdInBlock() / WARPSIZE; } // so GetWarpId will index the same element for every thread in the warp // and, because expr is lane-invariant in this case, every lane stores the // same value to this unique address ``` Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D105699	2021-07-13 12:06:57 +01:00
Johannes Doerfert	a7b7b5dfe5	[OpenMP] Create and use `__kmpc_is_generic_main_thread` In order to fold calls based on high-level knowledge and control flow tracking it helps to expose the information as a runtime call. The logic: `!SPMD && getTID() == getMasterTID()` was used in various places and is now encapsulated in `__kmpc_is_generic_main_thread`. As part of this rewrite we replaced eager computation of arguments with on-demand computation, especially helpful if the calls can be folded and arguments don't need to be computed consequently. Differential Revision: https://reviews.llvm.org/D105768	2021-07-11 19:18:03 -05:00
Johannes Doerfert	1ab1f04a2b	[OpenMP] Simplify variable sharing and increase shared memory size In order to avoid malloc/free, up to NUM_SHARED_VARIABLES_IN_SHARED_MEM (=64) variables are communicated in dedicated shared memory instead. The simplification does avoid the need for an "init" and requires "deinit" only if we ever communicate more than NUM_SHARED_VARIABLES_IN_SHARED_MEM variables. Differential Revision: https://reviews.llvm.org/D105767	2021-07-11 19:18:03 -05:00
Johannes Doerfert	0a223827de	[OpenMP] Remove checkXXXX device runtime functions We had multiple functions to determine the execution mode (SPMD/Generic) and runtime status (initialized/uninitialized) but that just increased complexity without a real benefit. Especially with D102307 in mind it is helpful to reduce the dependence on the `ident_t` flags. Differential Revision: https://reviews.llvm.org/D105586	2021-07-10 18:20:40 -05:00
Johannes Doerfert	e2cfbfcc0c	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 17:53:56 -05:00
Nico Weber	d3e7491333	Revert Attributor patch series Broke check-clang, see https://reviews.llvm.org/D102307#2869065 Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`	2021-07-10 16:15:55 -04:00
Johannes Doerfert	e603ca0306	[OpenMP] Remove checkXXXX device runtime functions We had multiple functions to determine the execution mode (SPMD/Generic) and runtime status (initialized/uninitialized) but that just increased complexity without a real benefit. Especially with D102307 in mind it is helpful to reduce the dependence on the `ident_t` flags. Differential Revision: https://reviews.llvm.org/D105586	2021-07-10 12:32:51 -05:00
Johannes Doerfert	1d5711c3ee	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 12:32:50 -05:00
Joel E. Denny	d99f65de2a	[OpenMP] Avoid checking parent reference count in targetDataBegin This patch is an attempt to do for `targetDataBegin` what D104924 does for `targetDataEnd`: * Eliminates a lock/unlock of the data mapping table. * Clarifies the logic that determines whether a struct member's host-to-device transfer occurs. The old logic, which checks the parent struct's reference count, is a leftover from back when we had a different map interface (as pointed out at <https://reviews.llvm.org/D104924#2846972>). Additionally, it eliminates the `DeviceTy::getMapEntryRefCnt`, which is no longer used after this patch. While D104924 does not change the computation of `IsLast`, I found I needed to change the computation of `IsNew` for this patch. As far as I can tell, the change is correct, and this patch does not cause any additional `openmp` tests to fail. However, I'm not sure I've thought of all use cases. Please advise. Reviewed By: jdoerfert, jhuber6, protze.joachim, tianshilei1992, grokos, RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D105121	2021-07-10 12:15:04 -04:00
Joel E. Denny	1d0456361a	[OpenMP] Avoid checking parent reference count in targetDataEnd The patch has the following benefits: * Eliminates a lock/unlock of the data mapping table. * Clarifies the logic that determines whether a struct member's device-to-host transfer occurs. The old logic, which checks the parent struct's reference count, is a leftover from back when we had a different map interface (as pointed out at <https://reviews.llvm.org/D104924#2846972>). Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104924	2021-07-10 12:15:04 -04:00
Alexey Bataev	ab8989ab87	[OPENMP]Fix overlapped mapping for dereferenced pointer members. If the base is used in a map clause and later we have a memberexpr with this base, and the member is a pointer, and this pointer is dereferenced anyhow (subscript, array section, dereference, etc.), such components should be considered as overlapped, otherwise it may lead to incorrect size computations, since we try to map a pointee as a part of the whole struct, which is not true for the pointer members. Differential Revision: https://reviews.llvm.org/D105562	2021-07-09 12:51:26 -07:00
Jon Chesterfield	ddfb074a80	[libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global [libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global Folds some duplicates logic into a helper function, passes the new environment struct into getLaunchVals which no longer reads the DeviceInfo global. Implemented on top of D105237 Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D105239	2021-07-06 17:06:38 +01:00
Atmn Patel	21e92612c0	[Libomptarget] Experimental Remote Plugin Fixes D97883 introduced a compile-time error in the experimental remote offloading libomptarget plugin, this patch fixes it and resolves a number of inconsistencies in the plugin as well: 1. Non-functional Asynchronous API 2. Unnecessarily verbose debug printing 3. Misc. code clean ups This is not intended to make any functional changes to the plugin. Differential Revision: https://reviews.llvm.org/D105325	2021-07-02 12:38:34 -04:00
Shilei Tian	369216ab31	[OpenMP][Offloading] Refined return value of `DeviceTy::getOrAllocTgtPtr` `DeviceTy::getOrAllocTgtPtr` just returns a target pointer. In addition, two bool values (`IsNew` and `IsHostPtr`) are passed by reference to make the change in the function available in callee. In this patch, a struct, which contains the target pointer, two flags, and an iterator to the map table entry corresponding to the queried host pointer, will be returned. In addition to make the logic clearer regarding the two bool values, this paves the way for the next patch to fix the data race in `bug49334.cpp` by attaching an event to the map table entry (and that's why we need the iterator). Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104382	2021-07-01 12:32:03 -04:00
Jon Chesterfield	db89414da4	[libomptarget][nfc] Move grid size computation Change getLaunchVals to return the integers used for launch Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D105237	2021-07-01 12:53:04 +01:00
Dhruva Chakrabarti	98c36f0079	Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size" This reverts commit `2240b41ee4`. A value of 0 for KernDescVal WG_Size implies it is unknown, so it should be set to the default. The above change was made without this assumption. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105250	2021-06-30 17:15:00 -07:00
Jon Chesterfield	4b0926b044	[libomptarget][nfc] Replace out arguments with struct return A step towards making this function adequately self contained that it can be tested easily. No functional change intended here, left variable names unchanged. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D105229	2021-06-30 22:40:07 +01:00
Jon Chesterfield	d86b0073cf	[libomptarget][amdgpu][nfc] Fix build warnings, drop some headers Removes stdarg header, drops uses of iostream, fix some format string errors. Also changes a C style struct to C++ style to avoid a warning from clang/ Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D104923	2021-06-30 22:23:36 +01:00
Shilei Tian	24a36ce58b	[OpenMP][Offloading] Replace all calls to `isSPMDMode` with `__kmpc_is_spmd_exec_mode` In our ongoing work, we are using `AbstractAttributor` to deduct execution model of device functions, and potententially remove unnecessary function calls to `__kmpc_is_spmd_exec_mode`. In current device runtime, we have mixed use of `isSPMDMode` and `__kmpc_is_spmd_exec_mode`, but in fact in `__kmpc_is_spmd_exec_mode` it simply calls `isSPMDMode`. Since all functions starting with `__kmpc` is C function, which doesn't have things like name mangling. It is more optimization friendly. In this patch, we simply replaced all calls to `isSPMDMode` with `__kmpc_is_spmd_exec_mode` to pave the way for the optimization. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105211	2021-06-30 15:39:57 -04:00
Dhruva Chakrabarti	e0b713a035	[libomptarget] [amdgpu] Change default number of teams per computation unit This patch is related to https://reviews.llvm.org/D98832. Based on discussions there, I decided to separate out the teams default as this patch. This change is to increase the number of teams per computation unit so as to provide more wavefronts for hiding latency. This change improves performance for some programs, including 20-50% for some Stream benchmarks. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D99003	2021-06-29 15:34:35 -07:00
Dhruva Chakrabarti	2240b41ee4	[libomptarget] [amdgpu] Fix default setting of max flat workgroup size When max flat workgroup size is not specified, it is set to the default workgroup size. This prevents kernel launch with a workgroup size larger than the default. The fix is to ignore a size of 0 and treat it as unspecified. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105073	2021-06-29 13:47:24 -07:00
Pushpinder Singh	20df2c7052	[AMDGPU][Libomptarget] Collect allocatable memory pools using HSA The logic is almost similar to that of system.cpp with one change that instead of adding all the memory pools to a device struct it only keeps a single pool. The existing approach also always allocated memory on the first HSA pool found for a GPU. This depends on D104691. The goal of this series of patches is to remove _atl_machine global. The next patch will drop g_atl_machine entirely. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104695	2021-06-28 11:28:04 +00:00
Jon Chesterfield	f66b8fdc0a	[libomptarget][amdgpu] Build openmp for two more targets [libomptarget][amdgpu] Build openmp for two more targets The 4800U APU is a gfx902 and the MI100 accelerator is a gfx908. Both numbers are listed in ROCT topology.c Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D104922	2021-06-25 19:02:03 +01:00
Jon Chesterfield	96f6873dff	[OpenMP][NFC] Drop unused headers from amdgpu plugin	2021-06-25 12:08:56 +01:00
Aakanksha Patil	3453f3dd46	[AMDGPU] Add gfx1035 target Differential Revision: https://reviews.llvm.org/D104804	2021-06-24 14:32:41 -04:00
Joel E. Denny	9fa5e3280d	[OpenMP] Fix delete map type in ref count debug messages For example, without this patch: ``` $ cat test.c int main() { int x; #pragma omp target enter data map(alloc: x) #pragma omp target enter data map(alloc: x) #pragma omp target enter data map(alloc: x) #pragma omp target exit data map(delete: x) ; return 0; } $ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c $ LIBOMPTARGET_DEBUG=1 ./a.out \|& grep 'Creating\\|Mapping exists\\|last' Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=1, Name=unknown Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (incremented), Name=unknown Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=3 (incremented), Name=unknown Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (decremented) Libomptarget --> There are 4 bytes allocated at target address 0x00000000013bb040 - is not last ``` `RefCount` is reported as decremented to 2, but it ought to be reset because of the `delete` map type, and `is not last` is incorrect. This patch migrates the reset of reference counts from `DeviceTy::deallocTgtPtr` to `DeviceTy::getTgtPtrBegin`, which then correctly reports the reset. Based on the `IsLast` result from `DeviceTy::getTgtPtrBegin`, `targetDataEnd` then correctly reports `is last` for any deletion. `DeviceTy::deallocTgtPtr` is responsible only for the final reference count decrement and mapping removal. An obscure side effect of this patch is that a `delete` map type when the reference count is infinite yields `DelEntry=IsLast=false` in `targetDataEnd` and so no longer results in a `DeviceTy::deallocTgtPtr` call. Without this patch, that call is a no-op anyway besides some unnecessary locking and mapping table lookups. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104560	2021-06-23 09:57:19 -04:00
Joel E. Denny	48421ac441	[OpenMP] Improve ref count debug messages For example, without this patch: ``` $ cat test.c int main() { int x; #pragma omp target enter data map(alloc: x) #pragma omp target exit data map(release: x) ; return 0; } $ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c $ LIBOMPTARGET_DEBUG=1 ./a.out \|& grep 'Creating\\|Mapping exists' Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, Name=unknown Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, updated RefCount=1 ``` There are two problems in this example: * `RefCount` is not reported when a mapping is created, but it might be 1 or infinite. In this case, because it's created by `omp target enter data`, it's 1. Seeing that would make later `RefCount` messages easier to understand. * `RefCount` is still 1 at the `omp target exit data`, but it's reported as `updated`. The reason it's still 1 is that, upon deletions, the reference count is generally not updated in `DeviceTy::getTgtPtrBegin`, where the report is produced. Instead, it's zeroed later in `DeviceTy::deallocTgtPtr`, where it's actually removed from the mapping table. This patch makes the following changes: * Report the reference count when creating a mapping. * Where an existing mapping is reported, always report a reference count action: * `update suppressed` when `UpdateRefCount=false` * `incremented` * `decremented` * `deferred final decrement`, which replaces the misleading `updated` in the above example * Add comments to `DeviceTy::getTgtPtrBegin` to explain why it does not zero the reference count. (Please advise if these comments miss the point.) * For unified shared memory, don't report confusing messages like `RefCount=` or `RefCount= updated` given that reference counts are irrelevant in this case. Instead, just report `for unified shared memory`. * Use `INFO` not `DP` consistently for `Mapping exists` messages. * Fix device table dumps to print `INF` instead of `-1` for an infinite reference count. Reviewed By: jhuber6, grokos Differential Revision: https://reviews.llvm.org/D104559	2021-06-23 09:57:19 -04:00
Joseph Huber	422adaa879	[OpenMP] Add thread limit environment variable support to plugins The OpenMP 5.1 standard defines the environment variable `OMP_TEAMS_THREAD_LIMIT` to limit the number of threads that will be run in a single block. This patch adds support for this into the AMDGPU and CUDA plugins. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103923	2021-06-22 16:25:40 -04:00
Shilei Tian	0029059074	[NFC][OpenMP][Offloading] Unified the construction of mapping table entry This patch unifies construction of mapping table entry to use `emplace`. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104580	2021-06-22 12:38:47 -04:00
Joseph Huber	244e98ff48	[Libomptarget] Improve device runtime implementation for globalized variables. Currently the runtime implementation of `__kmpc_alloc_shared` is extremely slow because it allocated memory for each thread individually. This patch adds a small buffer for the threads to share data and will greatly improve performance for builds where all globalization could not be optimized out. If the shared buffer is full, then memory will not only be allocated per-warp rather than per-thread. Depends on D97680 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104666	2021-06-22 11:52:49 -04:00
Joseph Huber	952a0f2385	[Libomptarget] Introduce new globalization runtime calls Summary: This patch introduces the new globalization runtime to be used by D97680. These runtime calls will replace the __kmpc_data_sharing_push_stack and __kmpc_data_sharing_pop_stack functions. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102532	2021-06-22 10:05:42 -04:00
Pushpinder Singh	9d110f9159	[AMDGPU][Libomptarget] Move allow_access_to_all_gpu_agents to rtl.cpp Moving this method helps eliminate a use of g_atl_machine. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104691	2021-06-22 11:44:52 +00:00
Vyacheslav Zakharin	aad9e48c5f	[NFC][libomptarget] Remove redundant libelf dependency for elf_common. Differential Revision: https://reviews.llvm.org/D104549	2021-06-21 07:19:55 -07:00
Pushpinder Singh	7a97cd9da7	[AMDGPU][Libomptarget] Remove redundant functions There does not seem to be any use of these functions. They just put the value to a local which is never used again. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104512	2021-06-21 06:13:24 +00:00
Shilei Tian	ec97866454	[OpenMP] Make bug49334.cpp more reproducible `bug49334.cpp` cannot detect data race in `libomptarget` efficiently. It is reported that with `N = 256` and `BS = 16`, the data race can be reproduced more steadily. The next coming pathces will fix it so this patch is expected to fail now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104552	2021-06-18 18:35:41 -04:00
Vyacheslav Zakharin	836992ab9a	[NFC][libomptarget] Build elf_common with PIC. Differential Revision: https://reviews.llvm.org/D104545	2021-06-18 09:20:10 -07:00
Vyacheslav Zakharin	c5b7c7c8f7	[NFC][libomptarget] Fixed -DLLVM_ENABLE_RUNTIMES="openmp" build. Differential Revision: https://reviews.llvm.org/D104535	2021-06-18 09:20:10 -07:00
Vyacheslav Zakharin	b5c4fc0f23	[NFC][libomptarget] Reduce the dependency on libelf This change-set removes libelf usage from elf_common part of the plugins. libelf is still used in x86_64 generic plugin code and in some plugins (e.g. amdgpu) - these will have to be cleaned up in separate checkins. Differential Revision: https://reviews.llvm.org/D103545	2021-06-16 08:34:23 -07:00
Pushpinder Singh	cadcaf3f46	[AMDGPU][Libomptarget] Drop dead code related to g_atl_machine This patch includes some changes which deletes the code accessing g_atl_machine global. Some accesses related to memory_pools are still remaining. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103813	2021-06-15 05:21:35 +00:00
Ron Lieberman	91f147792e	[libomptarget][amdgpu] Remove stray fprintf in rtl.cpp remove unintended fprintf in rtl.cpp Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D104003	2021-06-10 01:57:30 +00:00
Brendon Cahoon	294efbbd3e	Reland "[AMDGPU] Add gfx1013 target" This reverts commit `211e584fa2`. Fixed a use-after-free error that caused the sanitizers to fail.	2021-06-08 21:15:35 -04:00
Joseph Huber	df965513a9	[OpenMP] Add an information flag for device data transfers This patch adds an information flag that indicated when data is being copied to and from the device. This will be helpful for finding redundant or unnecessary data transfers in applications. Reviewed By: jdoerfert, grokos Differential Revision: https://reviews.llvm.org/D103927	2021-06-08 20:23:27 -04:00
Brendon Cahoon	211e584fa2	Revert "[AMDGPU] Add gfx1013 target" This reverts commit `ea10a86984`. A sanitizer buildbot reports an error.	2021-06-08 16:29:41 -04:00
Brendon Cahoon	ea10a86984	[AMDGPU] Add gfx1013 target Differential Revision: https://reviews.llvm.org/D103663	2021-06-08 12:49:49 -04:00
Pushpinder Singh	4f8bc7caf4	[AMDGPU][Libomptarget] Remove atlc global This global struct used to hold various flags for monitoring the initialization of hsa. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103795	2021-06-07 11:09:01 +00:00
Pushpinder Singh	f5f329a371	[AMDGPU][Libomptarget] Rework logic for locating kernarg pools Previous logic was to always use the first kernarg pool found to allocate kernel args. This patch changes this to use only the kernarg pool which has non-zero size. This logic is also reworked to not use any globals. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103600	2021-06-07 06:41:37 +00:00
Pushpinder Singh	b25546a4b4	[AMDGPU][Libomptarget][NFC] Remove bunch of dead structs Dropped structs are atmi_machine_t, atmi_device_t and atmi_memory_t Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103509	2021-06-02 10:40:51 +00:00
Pushpinder Singh	2368170a8d	[AMDGPU][Libomptarget][NFC] Remove atmi_place_t atmi_place_t has been replaced with int DeviceId. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103508	2021-06-02 10:35:28 +00:00
Pushpinder Singh	fb113264a8	[AMDGPU][Libomptarget] Remove g_atmi_machine global Turns out the only purpose of this class was verify if device ID was in range or not which could be done easily by using g_atl_machine. Still getting rid of g_atl_machine is pending which would be done in a later patch. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103443	2021-06-01 12:34:24 +00:00
Pushpinder Singh	4fc3286951	[AMDGPU][Libomptarget][NFC] Split host and device malloc This patch splits the code path for host and device malloc. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103389	2021-05-31 12:09:18 +00:00
Pushpinder Singh	8b79dfb302	[AMDGPU][Libomptarget][NFC] Remove atmi_mem_place_t This struct was used to specify the device on which memory was being allocated/free in atmi_malloc/free. It has now been replaced with int DeviceId. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103239	2021-05-27 11:53:18 +00:00
Jon Chesterfield	2fdf8bbd19	[libomptarget][nfc][amdgpu] Factor out setting upper bounds Refactor suggested in D103037 to help avoid similar copy-paste errors. Change is mechanical. Some parts of this would be more robust with unsigned. Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D103090	2021-05-26 19:57:49 +01:00
Jon Chesterfield	c5c1ec7945	[libomptarget][nfc][amdgpu] Refactor uses of KernelInfoTable Suggested in D103059. Use a single lookup instead of two, more const, less mutation. Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D103093	2021-05-26 19:25:25 +01:00
Jon Chesterfield	07f59baad6	[libomptarget][nfc][amdgpu] Remove atmi_status_t type ATMI_STATUS_UNKNOWN was unused, deleted references to it. Replaced ATMI_STATUS_{SUCCESS,ERROR} with HSA_STATUS_{SUCCESS,ERROR} Replaced atmi_status_t with hsa_status_t Otherwise no change. In particular, conversions between atmi_status_t and hsa_status_t will now be conversions between hsa_status_t and itself. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D103115	2021-05-26 17:02:19 +01:00
Pushpinder Singh	a2d6ef5876	[AMDGPU][Libomptarget] Inline atmi_init/atmi_finalize After D102847, these functions can be inlined. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103075	2021-05-26 10:50:08 +00:00
Pushpinder Singh	cc8661ac4a	[AMDGPU][Libomptarget] Delete g_atmi_initialized This patch drops g_atmi_initialized and inlines the Initialize & Finalize methods from Runtime class. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102847	2021-05-26 10:46:54 +00:00
Pushpinder Singh	7648b6978e	[AMDGPU][Libomptarget] Move Kernel/Symbol info tables to RTLDeviceInfoTy Two globals KernelInfoTable & SymbolInfoTable are moved into RTLDeviceInfoTy class. This builds on the top of D102691. [2/2] Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102692	2021-05-26 10:02:28 +00:00
Jon Chesterfield	df005fa364	[libomptarget][nfc] Move hostcall required test to rtl [libomptarget][nfc] Move hostcall required test to rtl Remove a global, fix minor race. First of N patches to bring up hostcall. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103058	2021-05-25 22:43:17 +01:00
Pushpinder Singh	b0d68c7141	[AMDGPU][Libomptarget] Mark lambda_by_value test as XFAIL Reason: Missing printf definition Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103078	2021-05-25 12:16:54 +00:00
Jon Chesterfield	75492e20fb	[libomptarget][nfc] Accept callable for hsa iterate_symbols [libomptarget][nfc] Accept callable for hsa iterate_symbols Candidate refactor to simplify D102692 Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D103030	2021-05-25 09:29:11 +01:00
Dhruva Chakrabarti	96d70f4d28	[libomptarget] [amdgpu] Added LDS usage to the kernel trace Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103059	2021-05-24 19:33:48 -07:00
Dhruva Chakrabarti	ca17b26d4d	[libomptarget] [amdgpu] Fix copy-paste error setting NumThreads for a corner case. Fix the case where NumTeams was set incorrectly instead of NumThreads Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103037	2021-05-24 15:23:15 -07:00
Pushpinder Singh	486110eb41	[AMDGPU][Libomptarget] Remove global KernelNameMap KernelNameMap contains entries like "key.kd" => key which clearly could be replaced by simple logic of removing suffix from the key. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102691	2021-05-24 08:46:08 +00:00
George Rokos	d0bc04d6b9	[libomptarget] Fix a bug whereby firstprivates are not copied over to the device The check for the TO flag when processing firstprivates is missing. As a result, sometimes the device copy of a firstprivate never gets initialized. Currectly we try to force lambda structs to be allocated immediately by marking them as a non-firstprivate, so that PrivateArgumentManagerTy::addArg allocates memory for them immediately. However, calling addArg with IsFirstPrivate=false makes the function skip initializing the device copy. Whether an argument is firstprivate and whether we need to allocate memory immediately are not synonyms, so this patch introduces one more control variable for immediate allocation and sets it apart from initialization. Differential Revision: https://reviews.llvm.org/D102890	2021-05-21 10:52:08 -07:00
Jon Chesterfield	d54712ab4d	[libomptarget][amdgpu] Mark alloc, free weak to facilitate local experimentation [libomptarget][amdgpu] Mark alloc, free weak to facilitate local experimentation There are a lot of different ways we might implement the devicertl local alloc and free functions. Via host, local buffers (stack or arena), specialising per kernel etc. It is not yet clear what the right design is. This change makes the alloc and free functions weak, so one can override them from local tests while comparing options. Not strictly necessary, as a comparable patch can be applied locally each time, but would be convenient for out of tree dev. Plan would be to drop the weak attribute at the same time as introducing a working allocator to trunk. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102499	2021-05-21 16:09:22 +01:00
Jon Chesterfield	68b88ae670	[libomptarget] Improve dlwrap compile time error diagnostic [libomptarget] Improve dlwrap compile time error diagnostic The dlwrap interface takes an explict arity, e.g. DLWRAP(cuAlloc, 2); This probably can't be eliminated as it controls the argument list of an external symbol, not an inline header function. If the arity given is too big, the error from clang referring to the line is in the middle of implementation details. /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1277:7: error: static_assert failed due to requirement '0UL < tuple_size<std::tuple<>>::value' "tuple index is in range" static_assert(__i < tuple_size<tuple<>>::value, ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1260:7: ... /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1260:7: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:93:27 ... /home/amd/llvm-project/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp:34:1: note: in instantiation of template class 'dlwrap::trait<cudaError_enum ()(unsigned long , unsigned long)>::arg<2>' requested here DLWRAP(cuMemAlloc, 3); ^ /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:51:31: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:166:3: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:133:3: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:186:37: ... If the arity is too small, the diagnostic is better: cuda/dynamic_cuda/cuda.cpp:34:1: error: too few arguments to function call, expected 2, have 1 DLWRAP(cuMemAlloc, 1); This patch changes the diagnostic to: cuda/dynamic_cuda/cuda.cpp:34:1: error: static_assert failed due to requirement '1 == trait<cudaError_enum ()(unsigned long , unsigned long)>::nargs' "Arity Error" DLWRAP(cuMemAlloc, 1); or cuda/dynamic_cuda/cuda.cpp:34:1: error: static_assert failed due to requirement '3 == trait<cudaError_enum ()(unsigned long , unsigned long)>::nargs' "Arity Error" DLWRAP(cuMemAlloc, 3); Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102858	2021-05-20 20:33:36 +01:00
Jon Chesterfield	d18fb09c69	[libomptarget][amdgpu] Remove majority of fatal errors [libomptarget][amdgpu] Remove majority of fatal errors Replaces most calls to exit() with returning an error to the library entry point. Minor changes to error handling for clear bugs, remove some dead code. Each exit() call site replaced is either in a library entry point or a function that already returns error codes on some paths. The existing handling is not well tested but replacing exit() with a fallback path should be a strict improvement. Remaining two early exit points are an abort() from a callback and exit() from within msgpack. Fixes for those are less obvious and left for a later patch. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D102346	2021-05-20 16:26:43 +01:00
Jon Chesterfield	ea68ad6e26	[libomptarget] Disable test bug49334 on amdgpu [libomptarget] Disable test bug49334 on amdgpu Hangs on amdgpu, do not know why. Disable to unblock build. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D102017	2021-05-20 15:46:56 +01:00
Pushpinder Singh	d7503c3bce	[AMDGPU][Libomptarget] Rename & move g_executables to private This patch moves g_executables to private member of Runtime class and is renamed to HSAExecutables following LLVM naming convention. This movement required making Runtime::Initialize and Runtime::Finalize non-static. Verified the correctness of this change by running libomptarget tests on gfx906. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102600	2021-05-18 05:43:23 +00:00
Pushpinder Singh	3bc2b97b34	[AMDGPU][libomptarget] Remove unused global variables This initial patch removes some unused variables from global namespace. There will more incoming patches for moving global variables to classes or static members. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102598	2021-05-18 05:40:49 +00:00
Aakanksha Patil	464e4dc50f	[AMDGPU] Add gfx1034 target Differential Revision: https://reviews.llvm.org/D102306	2021-05-13 14:25:18 -04:00
Jon Chesterfield	10de217209	[libomptarget][amdgpu] Fix truncation error for partial wavefront [libomptarget][amdgpu] Fix truncation error for partial wavefront The partial barrier implementation involves one wavefront resetting and N-1 waiting. This change future proofs against launching with a number of threads that is not a multiple of the wavefront size. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102407	2021-05-13 17:31:57 +01:00
Jon Chesterfield	b049870d3b	[libomptarget][amdgpu] Convert an assert to print and offload_fail [libomptarget][amdgpu] Convert an assert to print and offload_fail The kernel launched is supposed to be present in the binary, but a not yet diagnosed bug means it is missing for some of the qmcpack test cases. Changing from assert to print and offload_fail should help diagnose that and similar bugs. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102378	2021-05-13 17:31:36 +01:00
Michael Kruse	34ed3e6337	[OpenMP] Test unified shared memory tests only on systems that support it. Add a `REQUIRES: unified_shared_memory` option to tests that use `#pragma omp requires unified_shared_memory`. For CUDA, the feature tag is derived from LIBOMPTARGET_DEP_CUDA_ARCH which itself is derived using [[ https://cmake.org/cmake/help/latest/module/FindCUDA.html#commands \| cuda_select_nvcc_arch_flags ]]. The latter determines which compute capability the GPU in the system supports. To ensure that this is the CUDA arch being used, we could also set the `-Xopenmp-target -march=` flag. In the absence of an NVIDIA GPU, LIBOMPTARGET_DEP_CUDA_ARCH will be 35. That is, in that case we are assuming unified_shared_memory is not available. CUDA plugin testing could be disabled entirely in this case, but this currently depends on `LIBOMPTARGET_CAN_LINK_LIBCUDA OR LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA`, not on whether the hardware is actually available. For all other targets, nothing changes and we are assuming unified shared memory is available. This might need refinement if not the case. This tries to fix the [[ http://meinersbur.de:8011/#/builders/143 \| OpenMP Offloading Buildbot ]] that, although brand-new, only has a Pascal-generation (sm_61) GPU installed. Hence, tests that require unified shared memory are currently failing. I wish I had known in advance. Reviewed By: protze.joachim, tianshilei1992 Differential Revision: https://reviews.llvm.org/D101498	2021-05-13 11:08:04 -05:00
Jon Chesterfield	9934571eab	[libomptarget][amdgpu][nfc] Expand errorcheck macros [libomptarget][amdgpu][nfc] Expand errorcheck macros These macros expand to continue, which is confusing, or exit, which is incompatible with continuing execution on offloading fail. Expanding the macros in place makes the code look untidy but the control flow obvious and amenable to improving. In particular, exit becomes easier to eliminate. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D102230	2021-05-12 17:30:41 +01:00
Jon Chesterfield	72995a4bdf	[libomptarget][nfc] Add hook to easily disable building amdgcn bclib [libomptarget][nfc] Add hook to easily disable building amdgcn bclib This is useful when building LLVM with a toolchain that can't emit code for amdgcn, e.g. because it overrides the include search path with headers from another architecture, or the clang compiler is missing builtins. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102229	2021-05-11 17:23:09 +01:00
Jon Chesterfield	dedca78d48	[libomptarget][nfc] Drop stringify in macro [libomptarget][nfc] Drop stringify in macro A step towards deleting the macros entirely. Differential Revision: https://reviews.llvm.org/D102228	2021-05-11 12:19:55 +01:00
Jon Chesterfield	6da348569c	[libomptarget] Add support for target allocators to dynamic cuda RTL [libomptarget] Add support for target allocators to dynamic cuda RTL Follow on to D102000 which introduced new calls into libcuda. This patch adds the corresponding entry points to dynamic_cuda, fixing the build for systems that do not have the cuda toolkit installed. Function types and enum from https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D102169	2021-05-10 15:27:50 +01:00
Pushpinder Singh	9586937ef5	[AMDGPU][OpenMP] Disable tests when amdgpu-arch fails This patch prevents runtime tests running on systems without amdgpu. Reviewed By: protze.joachim, tianshilei1992 Differential Revision: https://reviews.llvm.org/D102054	2021-05-10 07:37:27 +00:00
Vyacheslav Zakharin	f2f88f3e7a	An attempt to abandon omptarget out-of-tree builds. I want to start using LLVM component libraries in libomptarget to stop duplicating implementations already available in LLVM (e.g. LLVMObject, LLVMSupport, etc.). Without relying on LLVM in all libomptarget builds one has to provide fallback implementation for each used LLVM feature. This is an attempt to stop supporting out-of-llvm-tree builds of libomptarget. I understand that I may need to revert this, if this affects downstream projects in a bad way. Differential Revision: https://reviews.llvm.org/D101509	2021-05-07 12:43:50 -07:00
Joseph Huber	a15f8589f4	[libomptarget] Add support for target memory allocators to cuda RTL Summary: The allocator interface added in D97883 allows the RTL to allocate shared and host-pinned memory from the cuda plugin. This patch adds support for these to the runtime. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D102000	2021-05-07 10:27:02 -04:00
Jon Chesterfield	44ee974e2f	[libomptarget][nfc] Refactor amdgpu partial barrier to simplify adding a second one [libomptarget][nfc] Refactor amdgpu partial barrier to simplify adding a second one D101976 would require a second barrier instance. This NFC to amdgpu makes it simpler to add one (an extra global, one more line in init). Also renames the current barrier to L0. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102016	2021-05-06 23:52:19 +01:00
Jon Chesterfield	7e9351b9de	[libomptarget][amdgpu][nfc] Remove dead code from amdgpu plugin [libomptarget][amdgpu][nfc] Remove dead code from amdgpu plugin Drops an enum that was identical to a HSA one, localises some functions where they were only called from one TU. Covers everything internalize + adce can identify as dead, except for msgpack::dump which is useful when debugging. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102014	2021-05-06 23:16:32 +01:00
Pushpinder Singh	ae845d6426	[AMDGPU][OpenMP] Enable Libomptarget runtime tests This enables the runtime tests on amdgpu targets. 10 tests have been marked as XFAIL on amdgcn currently mostly due to missing printf. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D99656	2021-05-03 05:56:42 +00:00
Michael Kruse	7308862ff5	[OpenMP][CMake] Use in-project clang as CUDA->IR compiler. If available, use the clang that is already built in the same project as CUDA compiler unless another executable is explicitly defined. This also ensures the generated deviceRTL IR will be consistent with the version of Clang. This patch is required to reliably test OpenMP offloading in a buildbot without either a two-stage build (e.g. with LLVM_ENABLE_RUNTIMES) or a separately installed clang on the worker that will eventually become outdated. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101265	2021-04-30 12:45:52 -05:00
Michael Kruse	3244a8b536	[OpenMP][CMake] Pass --cuda-path to regression tests. The OpenMP runtime can be compiled using a CUDA installed at non-default location with the -DCUDA_TOOLKIT_ROOT_DIR setting. However, check-openmp will fail afterwards because Clang needs to know where to find the CUDA headers. Fix by passing -cuda-path to Clang using the value of CUDA_TOOLKIT_ROOT_DIR which has been determined by CMake. Also set LD_LIBRARY_PATH such that it can find the cuda runtime when executing. This will ensure that the regression test do not depend on the current environment, but use the environment it was configured for. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101266	2021-04-27 16:27:40 -05:00
Joachim Protze	24f836e8fd	[OpenMP][libomptarget] Separate lit tests for different offloading targets (2/2) This patch fuses the RUN lines for most libomptarget tests. The previous patch D101315 created separate test targets for each supported offloading triple. This patch updates the RUN lines in libomptarget tests to use a generic run line independent of the offloading target selected for the lit instance. In cases, where no RUN line was defined for a specific offloading target, the corresponding target is declared as XFAIL. If it turns out that a test actually supports the target, the XFAIL line can be removed. Differential Revision: https://reviews.llvm.org/D101326	2021-04-27 15:54:32 +02:00
Joachim Protze	b845217b1d	[OpenMP][libomptarget] Separate lit tests for different offloading targets (1/2) This patch creates a separate test directory for each offloading target to be tested. This allows to test multiple architectures in one configuration, while still see all failing tests separately. The lit test names include the target triple, so that it will be easier to spot the failing target. This patch also allows to mark expected failing tests based on the target-triple, as the currently used triple is added to the lit "features": ``` // XFAIL: nvptx64-nvidia-cuda ``` Differential Revision: https://reviews.llvm.org/D101315	2021-04-27 12:30:01 +02:00
Jon Chesterfield	58f125493d	[libomptarget] Enable AMDGPU devicertl [libomptarget] Enable AMDGPU devicertl The amdgpu devicertl is written in freestanding openmp and compiles to a bitcode library (per listed gfx arch) with no unresolved symbols. It requires a recent clang, preferably the one from the same monorepo checkout. This is D98658, with printf explicitly stubbed out, after patching clang to no longer require an llvm with the amdgpu target enabled. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101213	2021-04-24 02:24:44 +01:00
Johannes Doerfert	17330a3cb1	[OpenMP] Avoid reading uninitialized parallel level values In a last minute change request for `a2dbfb6b72` we introduced a read of the uninitialized parallel level value in SPMD-mode. We go back to initializing the array early and checking for an adjusted level. Found by the miniqmc unit tests: https://cdash.qmcpack.org/CDash/viewTest.php?onlyfailed&buildid=203434 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D101123	2021-04-23 11:21:58 -05:00
Joseph Huber	59b6849012	[OpenMP] Replace global InfoLevel with a reference to an internal one. Summary: This patch improves the implementation of D100774 by replacing the global variable introduced with a function that returns a reference to an internal one. This removes the need to define the variable in every plugin that uses it. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D101102	2021-04-23 09:43:46 -04:00
Joseph Huber	2b6f20082e	[OpenMP] Add function for setting LIBOMPTARGET_INFO at runtime Summary: This patch adds a new runtime function __tgt_set_info_flag that allows the user to set the information level at runtime without using the environment variable. Using this will require an extern function, but will eventually be added into an auxilliary library for OpenMP support functions. This patch required moving the current InfoLevel to a global variable which must be instantiated by each plugin. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D100774	2021-04-22 12:48:11 -04:00
Alexey Bataev	ca70512099	[OPENMP]Mark test as unsupported to avoid possible unexpected passes, NFC.	2021-04-22 08:06:25 -07:00
Giorgis Georgakoudis	a2dbfb6b72	[OpenMP] Simplify offloading parallel call codegen This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions. Reviewed By: jdoerfert, Meinersbur Differential Revision: https://reviews.llvm.org/D95976	2021-04-21 18:46:07 -07:00
Alexey Bataev	079884225a	[OPENMP]Fix PR49698: OpenMP declare mapper causes segmentation fault. The implicitly generated mappings for allocation/deallocation in mappers runtime should be mapped as implicit, also no need to clear member_of flag to avoid ref counter increment. Also, the ref counter should not be incremented for the very first element that comes from the mapper function. Differential Revision: https://reviews.llvm.org/D100673	2021-04-21 10:38:31 -07:00
Hansang Bae	9b98497b44	[OpenMP] Add omp_target_is_accessible() to header files -- Added omp_target_is_accessible to the header files -- Added missing const qualifier to device memory routines Differential Revision: https://reviews.llvm.org/D100420	2021-04-16 07:54:15 -05:00
Joseph Huber	83d4b2e2e0	[OpenMP] Add info for device table changes Summary: This patch adds a feature to print information whenever the host-device pointer mapping table is changed by inserting or removing an entry. This introduces a new bit field for LIBOMPTARGET_INFO at position 0x8. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D100600	2021-04-15 18:39:48 -04:00
Hansang Bae	3da61ddae7	[OpenMP] Define omp_is_initial_device() variants in omp.h omp_is_initial_device() is marked as a built-in function in the current compiler, and user code guarded by this call may be optimized away, resulting in undesired behavior in some cases. This patch provides a possible fix for such cases by defining the routine as a variant function and removing it from builtin list. Differential Revision: https://reviews.llvm.org/D99447	2021-04-06 16:58:01 -05:00
Joseph Huber	0af4e74aef	[OpenMP][NFC] Fix typo in libomptarget error message Summary: There was a typo suggesting the user to use `LIBOMPTARGET_DEBUG` instead of `LIBOMPTARGET_INFO`	2021-04-01 12:45:28 -04:00
Joseph Huber	29338459fb	[OpenMP] Trim error messages in CUDA plugin Summary: Remove some of the error messages printed when the CUDA plugin fails. The current error messages can be confusing because they are the first error messages printed after the async stream finds an error. This means that the printed values aren't related to what caused the issue, but are simply the last asyncronous operation that succeeded on the device. Remove these as they can be misleading. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D99510	2021-03-29 12:20:19 -04:00
Alexey Bataev	0411b23319	[OPENMP]Map data field with l-value reference types. Added initial support dfor the mapping of the data members with l-value reference types. Differential Revision: https://reviews.llvm.org/D98812	2021-03-29 07:07:09 -07:00
Joseph Huber	16064e71e9	[OpenMP] Reset async stream properly upon failure Summary: If the call to `synchronize` fails, it will currently block the stream indefinitely if execution is continued from this point. Additionally, if the program exits it will trigger an assertion on the non-null value of the async queue and prevent the runtime from printing debugging information. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D99443	2021-03-26 19:05:06 -04:00
Jon Chesterfield	626a31de15	[libomptarget] Add register usage info to kernel metadata Add register usage information to the runtime metadata so that it can be used during kernel launch (that change will be in a different commit). Add this information to the kernel trace. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D98829	2021-03-18 17:00:42 +00:00
Jon Chesterfield	dbf8f2b089	Revert "[libomptarget] Build amdgcn devicertl by default" This reverts commit `e23f3502d9`. It broke the build of openmp for clang built without amdgcn support. D98746, under review, would allow this to reland.	2021-03-17 11:34:44 +00:00
Johannes Doerfert	0a954a528b	[OpenMP][FIX] Repair accidental replacement of _shfl_sync with _shfl This was broken accidentally in D95752. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D98677	2021-03-15 22:46:00 -05:00
Jon Chesterfield	e23f3502d9	[libomptarget] Build amdgcn devicertl by default [libomptarget] Build amdgcn devicertl by default The cmake for this looks for an llvm install and does the right thing when building as part of enable_runtimes. It will probably do the right thing in other settings - at least, it won't try to build this with gcc. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98658	2021-03-15 23:17:50 +00:00
Jon Chesterfield	bb38d7ff05	[libomptarget][nfc][amdgcn] Use precise triple for devicertl build	2021-03-15 20:24:13 +00:00
Jon Chesterfield	d0bc85f04a	[libomptarget][nfc] Drop unused DEVICE macro [libomptarget][nfc] Drop unused DEVICE macro Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98655	2021-03-15 20:12:50 +00:00
Jon Chesterfield	7da76aaaf4	[libomptarget] Build amdgpu plugin by default [libomptarget] Build amdgpu plugin by default This will build the amdgpu plugin if cmake is able to find the hsa runtime library, which will be the case if rocm is installed or if the hsa library has been installed somewhere cmake looks. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D98654	2021-03-15 20:12:01 +00:00
Jon Chesterfield	bcb3f0f867	[libomptarget] Fix devicertl build [libomptarget] Fix devicertl build The target specific functions in target_interface are extern C, but the implementations for nvptx were mostly C++ mangling. That worked out as a quirk of DEVICE macro expanding to nothing, except for shuffle.h which only forward declared the functions with C++ linkage. Also implements GetWarpSize, as used by shuffle, and includes target_interface in nvptx target_impl.cu to help catch future divergence between interface and implementation. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98651	2021-03-15 19:50:22 +00:00
Jon Chesterfield	f675b3df48	[libomptarget] Drop assert.h, use freestanding for amdgcn devicertl [libomptarget] Drop assert.h, use freestanding for amdgcn devicertl Promotes the runtime assert to a link time error for the unimplemented fallback functions. Enables amdgcn to build with only clang provided headers, which makes it less likely to break other builds when enabled. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98649	2021-03-15 18:50:09 +00:00
Jon Chesterfield	156842937f	[libomptarget][amdgcn] Drop use of inttypes.h, moving closer to freestanding [libomptarget][amdgcn] Drop use of inttypes.h, moving closer to freestanding The glibc headers are a periodic source of problems compiling the devicertl. This patch resolves the following error run into while building llvm on a slightly different linux system. ``` In file included from .../lib/clang/13.0.0/include/inttypes.h:21: In file included from /usr/include/inttypes.h:25: /usr/include/features.h:461:12: fatal error: 'sys/cdefs.h' file not found # include <sys/cdefs.h> ^~~~~~~~~~~~~ ``` As a second patch, removing assert.h from shuffle will let amdgcn build as -ffreestanding, at which point only the headers that clang itself provides are used and interactions with the host glibc are eliminated. Doing the same for nvptx is complicated by printf handling but also seems worthwhile. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D98565	2021-03-15 16:54:58 +00:00
George Rokos	2468fdd9af	[libomptarget] Add allocator support for target memory This patch adds the infrastructure for allocator support for target memory. Three allocators are introduced for device, host and shared memory. The corresponding API functions have the llvm_ prefix temporarily, until they become part of the OpenMP standard. Differential Revision: https://reviews.llvm.org/D97883	2021-03-13 03:47:07 -08:00
Johannes Doerfert	5449fbb5d4	[OpenMP][NFC] Use `AsyncInfo` as the variable name for a `__tgt_async_info` Reviewed By: grokos, tianshilei1992 Differential Revision: https://reviews.llvm.org/D96444	2021-03-11 23:31:34 -06:00
Johannes Doerfert	66ba494b49	[OpenMP][DeviceRTL] Extract shuffle idiom and port it to declare variant The shuffle idiom is differently implemented in our supported targets. To reduce the "target_impl" file we now move the shuffle idiom in it's own self-contained header that provides the implementation for AMDGPU and NVPTX. A fallback can be added later on. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95752	2021-03-11 23:31:30 -06:00
Joseph Huber	807466ef28	[OpenMP] Restore backwards compatibility for libomptarget Summary: The changes introduced in D87946 changed the API for libomptarget functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x but was not given a backward-compatible interface. This change will require people using Clang 13.x or 12.x to recompile their offloading programs. Reviewed By: jdoerfert cchen Differential Revision: https://reviews.llvm.org/D98358	2021-03-11 09:52:11 -05:00
Shilei Tian	c41ae246ac	[OpenMP][Clang][NVPTX] Only build one bitcode library for each SM In D97003, CUDA 9.2 is the minimum requirement for OpenMP offloading on NVPTX target. We don't need to have macros in source code to select right functions based on CUDA version. we don't need to compile multiple bitcode libraries of different CUDA versions for each SM. We don't need to worry about future compatibility with newer CUDA version. `-target-feature +ptx61` is used in this patch, which corresponds to the highest PTX version that CUDA 9.2 can support. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97198	2021-03-08 12:03:04 -05:00
Joel E. Denny	d0eb25a643	[OpenMP] Encapsulate more in checkDeviceAndCtors This patch just encapsulates some repeated code. To do so, it relocates some functions from interface.cpp to omptarget.cpp. It also adjusts them to the LLVM coding style. This patch is almost NFC except some `DP` messages are a bit different. For example, messages like "Entering target region" are now emitted even if offload is disabled, but a subsequent "Offload is disabled" is then emitted. Reviewed By: jdoerfert, grokos Differential Revision: https://reviews.llvm.org/D97908	2021-03-04 12:03:42 -05:00
Joel E. Denny	bfe5452b93	[OpenMP] Fix lone target exit data Without this patch, an `omp target exit data` before the runtime is initialized produces a runtime error. This patch fixes that by changing `__tgt_target_data_end_mapper` to call `CheckDeviceAndCtors` like many other runtime routines. Discussed at <https://lists.llvm.org/pipermail/openmp-dev/2021-March/003920.html>. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D97907	2021-03-04 12:03:42 -05:00
Joel E. Denny	10c18c69f2	[OpenMP] Fix support for device as host Without this patch, when the offload device is set to `omp_get_initial_device()`, the runtime fails with an error diagnostic when entering target regions or target data regions. However, OpenMP 5.1, sec. 2.14.5 "target Construct", "Restrictions", p. 203, L3-5 states: > The device clause expression must evaluate to a non-negative integer > value that is less than or equal to the value of > omp_get_num_devices(). Sec. 3.7.7 "omp_get_initial_device", p. 412, L2-3 states: > The value of the device number is the value returned by the > omp_get_num_devices routine. Similarly, OpenMP 5.0, sec. 2.12.5 "target Construct", "Restrictions", p. 174 L30-32 states: > The device clause expression must evaluate to a non-negative integer > value less than the value of omp_get_num_devices() or to the value > of omp_get_initial_device(). This patch fixes this behavior by changing the runtime to behave as if offloading is disabled whenever it finds the offload device (either from a `device` clause or the default device) is set to the host device. In the case of mandatory offloading when `omp_get_num_devices() == 0`, it incorporates the behavior proposed for OpenMP 5.2 in OpenMP spec github issue 2669. Reviewed By: grokos, RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D97616	2021-03-04 12:03:42 -05:00
Alexey Bataev	0caf736d7e	[OPENMP50]Mapping of the subcomponents with the 'default' mappers. If the mapped structure has data members, which have 'default' mappers, need to map these members individually using their 'default' mappers. Differential Revision: https://reviews.llvm.org/D92195	2021-03-02 07:11:06 -08:00
Vyacheslav Zakharin	6baeeb9efa	[libomptarget] Fixed MSVC build fail caused by __attribute__((used)). Differential Revision: https://reviews.llvm.org/D97348	2021-02-24 09:59:39 -08:00
Shilei Tian	e5da63d5a9	[OpenMP] Fixed a crash when offloading to x86_64 with target nowait PR#49334 reports a crash when offloading to x86_64 with `target nowait`, which is caused by referencing a nullptr. The root cause of the issue is, when pushing a hidden helper task in `__kmp_push_task`, it also maps the gtid to its shadow gtid, which is wrong. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97329	2021-02-24 12:37:30 -05:00
Manoel Roemmer	542d9c2154	[libomptarget] Load images in order of registration This makes sure that images are loaded in the order in which they are registered with libomptarget. If a target can load multiple images and these images depend on each other (for example if one image contains the programs target regions and one image contains library code), then the order in which images are loaded can be important for symbol resolution (for example, in the VE plugin). In this case: because the same code exist in the host binaries, the order in which the host linker loads them (which is also the order in which images are registered with libomptarget) is the order in which the images have to be loaded onto the device. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95530	2021-02-24 18:15:41 +01:00
Shilei Tian	f6c2984a09	[OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported feature in release verion of LLVM `ptx71` is not supported in release version of LLVM yet. As a result, the support of CUDA 11.2 and CUDA 11.1 caused a compilation error as mentioned in D97004. Since the support in D97004 is just a WA for releease, and we'll not use it in the near future, using `ptx70` for CUDA 11 is feasible. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97195	2021-02-23 13:20:21 -05:00
Shilei Tian	309b00a42e	[OpenMP][NFC] clang-format the whole openmp project Same script as D95318. Test files are excluded. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D97088	2021-02-20 12:46:32 -05:00
Joel E. Denny	ef8b3b5ffd	[OpenMP] Fix nvptx CUDA_VERSION conversion As mentioned in PR#49250, without this patch, ptxas for CUDA 9.1 fails in the following two tests: - openmp/libomptarget/test/mapping/lambda_mapping.cpp - openmp/libomptarget/test/offloading/bug49021.cpp The error looks like: ``` ptxas /tmp/lambda_mapping-081ea9.s, line 828; error : Not a name of any known instruction: 'activemask' ``` The problem is that our cmake script converts CUDA version strings incorrectly: 9.1 becomes 9100, but it should be 9010, as shown in `getCudaVersion` in `clang/lib/Driver/ToolChains/Cuda.cpp`. Thus, `openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu` inadvertently enables `activemask` because it apparently becomes available in 9.2. This patch fixes the conversion. This patch does not fix the other two tests in PR#49250. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97012	2021-02-19 11:09:26 -05:00
Joel E. Denny	d2147b1a87	[OpenMP] Fix always,from and delete for data absent at exit Without this patch, there's a runtime error for those map types at exit from an "omp target data" or at "omp target exit data", but the spec says the list item should be ignored. This patch tests that fix in data_absent_at_exit.c, and it also improves other testing for data that is not fully present at exit. Reviewed By: grokos, RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D96999	2021-02-19 11:09:26 -05:00
Ron Lieberman	30c0d5b4c3	[OPENMP][AMDGCN] Improvements to print_kernel_trace (bit mask) allow bit masking to select various trace features. bit 0 => Launch tracing (stderr) bit 1 => timing of runtime (stdout) bit 2 => detailed launch tracing (stderr) bit 3 => timing goes to stdout instead of stderr example: LIBOMPTARGET_KERNEL_TRACE=7 does it all LIBOMPTARGET_KERNEL_TRACE=5 Launch + details LIBOMPTARGET_KERNEL_TRACE=2 timings + launch to stderr LIBOMPTARGET_KERNEL_TRACE=10 timings + launch to stdout Differential Revision: https://reviews.llvm.org/D96998	2021-02-19 06:47:22 -05:00
Shilei Tian	89827fd404	[OpenMP][NVPTX] Add the support for CUDA 11.2 and CUDA 11.1 CUDA 11.2 and CUDA 11.1 are all available now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97004	2021-02-18 21:04:39 -05:00
Jon Chesterfield	53d7fd3762	[libomptarget][amdgcn] Remove lookup of .language msgpack field	2021-02-17 23:02:16 +00:00
Alexey Bataev	60d71a286b	[OPENMP50]Allow overlapping mapping in target constructs. OpenMP 5.0 removed a lot of restriction for overlapped mapped items comparing to OpenMP 4.5. Patch restricts the checks for overlapped data mappings only for OpenMP 4.5 and less and reorders mapping of the arguments so, that present and alloc mappings are processed first and then all others. Differential Revision: https://reviews.llvm.org/D86119	2021-02-16 14:42:08 -08:00
Johannes Doerfert	2518cc65d2	[OpenMP][FIX] Avoid use of stack allocations in asynchronous calls As reported by Guilherme Valarini [0], we used to pass stack allocations to calls that can nowadays be asynchronous. This is arguably a problem and it will inevitably result in UB. To remedy the situation we allocate the locations as part of the AsyncInfoTy object. The lifetime of that object matches what we need for now. If the synchronization is not tied to the AsyncInfoTy object anymore we might need to have a different buffer construct in global space. This should be back-ported to LLVM 12 but needs slight modifications as it is based on refactoring patches we do not need to backport. [0] https://lists.llvm.org/pipermail/openmp-dev/2021-February/003867.html Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D96667	2021-02-16 15:38:11 -06:00
Johannes Doerfert	758b849931	[OpenMP] Unify omptarget API and usage wrt. `__tgt_async_info` This patch unifies our libomptarget API in two ways: - always pass a `__tgt_async_info` object, the Queue member decides if it is in use or not. - (almost) always synchronize in the interface layer and not in the omptarget layer. A side effect is that we now put all constructor and static initializer kernels in a stream too, if the device utilizes `__tgt_async_info`. The patch contains a TODO which can be addressed as we add support for asynchronous malloc and free in the plugin API. This is the only `synchronizeAsyncInfo` left in the omptarget layer. Site note: On a V100 system the GridMini performance for small sizes more than doubled. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96379	2021-02-16 15:38:06 -06:00
Johannes Doerfert	a2fc0d34db	[OpenMP] Move synchronization into `__tgt_async_info` The AsyncInfo should be passed everywhere and it should offer a way to ensure synchronization, given a libomptarget Device. This replaces D96431. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96438	2021-02-16 15:38:01 -06:00
Johannes Doerfert	942728763b	[OpenMP][NFC] Unify `target` API with other by passing a `__tgt_async_info` pointer Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96430	2021-02-16 15:37:56 -06:00
Johannes Doerfert	44f3022cdf	[OpenMP][NFC] Pass a DeviceTy, not the device number to `target` This unifies the API of `target` relative to `targetUpdateData` and such. Reviewed By: tianshilei1992, grokos Differential Revision: https://reviews.llvm.org/D96429	2021-02-16 15:37:51 -06:00
Johannes Doerfert	ea9395716e	[OpenMP][NFC] Clang format the libomptarget plugins Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96445	2021-02-16 15:37:46 -06:00
Johannes Doerfert	ad94fce845	[OpenMP][NFC] Eliminate sign comparison warning via explicit casts Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96812	2021-02-16 15:37:41 -06:00
Johannes Doerfert	9cd1e2228c	[OpenMP][NFC] Clang format libomptarget code (src & include) The struct and enum alignments are kept by disabling clang-format for that code region. Reviewed By: tianshilei1992, JonChesterfield, grokos Differential Revision: https://reviews.llvm.org/D96428	2021-02-16 15:37:35 -06:00

... 3 4 5 6 7 ...

897 Commits