Minimize the `impl` interface and clean up some uses of mapping
functions.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D112154
Before we had aligned barriers, `__kmpc_barrier_simple_spmd` could be used
in the custom state machine. Now that SPMD barriers are assumed to be
aligned, we need to use a "generic" barrier in places that are not aligned.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112893
When we pick thread 0 to initialize state but thread N is going to be the
"main thread" in generic mode, we would require extra synchronization.
Instead, we should pick the main thread to initialize state in generic
mode, and any thread in SPMD mode.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112874
The synchronization at the end of a parallel region cannot make sure all threads
exit the scope. As a result, the assertions right after it might be hit, and
furthermore the `state::assumeInitialState(IsSPMD)` in `__kmpc_target_deinit` may
not hold either. We can either add a synchronization right after the parallel region,
or remove the assertions and assumptions. Here we choose the first option as those
assertions and assumptions can help optimizations.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112861
Summary:
A previous patch changed the check and mistakenly used `!expr`, which in a
macro expansion could only apply to the left side of an expression.
This patch changes the `assert_assume` function used for internal
assumptions in the device runtime to use a more standard formatting for
the assumption message.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112842
A common problem is the device running out of global heap memory and
crashing due to a nullptr dereference when using the data sharing stack.
This explicitly checks that a nullptr was not returned by malloc when
debugging field 1 is enabled.
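As a rough sketch of the guarded allocation this describes (the flag name and helper below are placeholders, not the runtime's actual identifiers):
```
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-ins for the runtime's debug machinery; the real flag
// and allocation path live in the DeviceRTL.
static constexpr unsigned DebugKindAssertion = 1u << 0; // "debugging field 1"
static unsigned ConfiguredDebugKind = DebugKindAssertion;

// Allocate for the data sharing stack and, when the assertion debug bit is
// set, fail loudly instead of letting a later nullptr dereference crash.
void *allocGlobal(size_t Bytes) {
  void *Ptr = malloc(Bytes);
  if ((ConfiguredDebugKind & DebugKindAssertion) && !Ptr) {
    fprintf(stderr, "Out of global heap memory (requested %zu bytes)\n", Bytes);
    abort();
  }
  return Ptr;
}

int main() {
  free(allocGlobal(128));
  return 0;
}
```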
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112005
This patch adds support for using function tracing features to track the
execution of runtime functions in the device runtime library. This is
enabled by first compiling the new runtime with
`-fopenmp-target-debug=3` and running with
`LIBOMPTARGET_DEVICE_RTL_DEBUG=3`. The output only tracks team 0 and
thread 0 so there isn't much output when using a generic region.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112002
Passes same tests as the current deviceRTL. Includes cmake change from D111987.
CI is showing a different set of pass/fails than local, so committing this
without the tests enabled by default while debugging that difference.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112227
Passes same tests as the current deviceRTL. Includes cmake change from D111987.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112227
We do not generate _serialized_parallel calls in device mode, so there is no
need for an external API.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D112145
Exiting a data environment will reset all values, it is wrong to adjust
them afterwards.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112144
We will later use the fact that a barrier is aligned to reason about
thread divergence. For now we introduce the assumption and some more
documentation.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112153
The OpenMP thread ID is not the hardware thread ID if we have nesting.
We need to ask the runtime properly to ensure correct results.
Note that the loop interface is going to change soon so we do not adjust
it now but simply ignore the extra argument.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D111950
The team size could/should be an ICV but since we know it is either 1 or
a value we can leave it in the team state for now. However, we still
need to determine if the current level is nested before we use it.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D111949
The first thread state in the new GPU runtime doesn't have a previous
one and we should not dereference the nullptr placeholder.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D111946
Essentially moves the foreach over sm integers into a macro and instantiates it for nvptx.
NFC in that the macro is not presently instantiated for amdgpu as the corresponding code doesn't compile yet.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D111987
Implemented by patching python config instead of modifying all
the tests so that -generic and XFAIL work as usual. Expectation is for
this to be reverted once the old runtime is deleted.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D112225
Step towards building the DeviceRTL for amdgpu.
Mostly replaces cuda-specific toolchain finding logic with the
generic logic currently found in the amdgpu deviceRTL cmake. Also
deletes dead code and changes the default to build on systems
without cuda installed, as the library doesn't use cuda and the
amdgpu-only systems generally won't have cuda installed.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D111983
The plugin currently uses a macro to check if this is a debug build
before assigning the debug kind variable to the device environment
struct. This is being deprecated because the new device runtime does not
maintain separate debug builds and the variable should always be available.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D112083
D110279 introduced a bug to the device runtime. In `__kmpc_parallel_51`, we detect
whether we are already in a parallel region by `__kmpc_parallel_level() > __kmpc_is_spmd_exec_mode()`.
It is based on the assumption that:
- In SPMD mode, parallel level is initialized to 1.
- In generic mode, parallel level is initialized to 0.
- `__kmpc_is_spmd_exec_mode` returns `1` for SPMD mode, 0 otherwise.
Because the return value type of `__kmpc_is_spmd_exec_mode` is `int8_t`, there
was an implicit cast from `bool` to `int8_t`. We can make sure it is either 0 or
1 since C++14. In D110279, the return value is the result of an `and` operation,
which is 2 in SPMD mode. This breaks the assumption in `__kmpc_parallel_51`.
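A small standalone illustration of the mismatch; the two helpers below are stand-ins for the runtime functions, not their real implementations:
```
#include <cassert>
#include <cstdint>

// Correct behaviour: the bool is implicitly converted to int8_t, so the
// result is guaranteed (since C++14) to be 0 or 1.
int8_t isSPMDExecModeFixed(uint8_t ExecModeBits) {
  return ExecModeBits == 0x2; // bool -> int8_t, yields 0 or 1
}

// Behaviour after D110279: the raw result of the bitwise 'and' is returned,
// which is 2 in SPMD mode.
int8_t isSPMDExecModeBuggy(uint8_t ExecModeBits) {
  return ExecModeBits & 0x2; // yields 0 or 2
}

int main() {
  const uint8_t SPMDBits = 0x2;
  // Inside a parallel region in SPMD mode the parallel level is 2
  // (it is initialized to 1 for SPMD kernels).
  int8_t ParallelLevel = 2;

  // The "already in a parallel region" check used by __kmpc_parallel_51:
  assert(ParallelLevel > isSPMDExecModeFixed(SPMDBits));    // 2 > 1: detected
  assert(!(ParallelLevel > isSPMDExecModeBuggy(SPMDBits))); // 2 > 2: missed
  return 0;
}
```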
Reviewed By: carlo.bertolli, dpalermo
Differential Revision: https://reviews.llvm.org/D111905
This patch adds support for the
`__kmpc_get_hardware_num_threads_in_block` function that returns the
number of threads. This was missing in the new runtime and was used by
the AMDGPU plugin which prevented it from using the new runtime. This
patch also unifies the interface for getting the thread numbers in the
frontend.
Originally authored by jdoerfert.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D111475
Until we hit the first barrier we should not call `mapping::isSPMDMode`
with all threads. Instead, we now have (and use during initialization) a
`mapping::isMainThreadInGenericMode` overload that takes the known
SPMD-mode state and one that queries it.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D111381
This patch adds an external interface to access the dynamic shared
memory buffer in the device runtime. The function introduced is
``llvm_omp_get_dynamic_shared``. This includes a host-side
definition that only returns a null pointer so that it can be used when
host-fallback is enabled without crashing. Support for dynamic shared
memory was also ported to the old device runtime.
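A minimal usage sketch, assuming OpenMP offloading is enabled and the no-argument, pointer-returning signature; the null check covers the host-fallback definition described above:
```
// Extern interface added by this patch; the host-side definition just
// returns a null pointer so host fallback keeps working.
extern "C" void *llvm_omp_get_dynamic_shared();

int main() {
#pragma omp target
  {
    // On a device this points at the dynamic shared memory buffer (its size
    // is configured via an environment variable); under host fallback it is
    // simply nullptr, so guard the access.
    int *Buf = static_cast<int *>(llvm_omp_get_dynamic_shared());
    if (Buf)
      Buf[0] = 42;
  }
  return 0;
}
```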
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D110957
For NVPTX, `printf` can be used with just a function declaration. For AMDGCN, a
function definition is added, but it simply returns.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D109728
We need to synchronize the threads *before* we destroy the RAII objects
that hold the old values and not after to avoid threads executing the
parallel region but seeing an inconsistent state.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D111369
Follow on to D110006, related to D110957
Where implementations have diverged this resolves to match the new DeviceRTL
- replaces definitions of this struct in deviceRTL and plugins with include
- changes the dynamic_shared_size field from D110006 to 32 bits
- handles stdint being unavailable in DeviceRTL
- adds a zero initializer for the field to amdgpu
- moves the extern declaration for deviceRTL to target_interface
(omptarget.h is more natural, but doesn't work due to include order
with debug.h)
- Renames the fields everywhere to match the LLVM format used in DeviceRTL
- Makes debug_level uint32_t everywhere (previously sometimes int32_t)
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D111069
The hand-rolled linking logic in elf_common does not account for
the possibility of using LLVM dylib rather than a dozen static
libraries. Since it does not seem to be easily convertible
to add_llvm_library, just hand-roll support for LLVM_LINK_LLVM_DYLIB.
This is necessary to support stand-alone builds against installed LLVM.
Differential Revision: https://reviews.llvm.org/D111038
Fixes 51982. Adds a missing CreatePointerCast and allocates a global in
the correct address space.
Test case derived from https://github.com/ROCm-Developer-Tools/aomp/\
blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting
parts while checking the assertion failure still occurred.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110556
Use enum for execution mode.
This is partly a port from ROCm and partly a port from D110029. Attempted to
make the same choices as ROCm as far as comments etc go to reduce the merge
conflicts.
There is some cleanup warranted here - in particular I like the cuda patch
factoring out the comparisons into named variables - but I'd like to leave
that for a follow up patch, keeping this one minimal.
Reviewed By: carlo.bertolli
Differential Revision: https://reviews.llvm.org/D110845
Fixes: SWDEV-275232 (With contributions from Ammar Elwazir, Laurent Morichetti, and Tony Tye)
The current code is racy. After the packet is submitted, the GPU will increment the read index. If this wraps around before the memory is read from, it'll refer to a signal from an unrelated packet. The change avoids reading from the packet post-submission.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D110679
Fixes 51982. Minor refactor to remove `return x = y` construct.
Test case derived from https://github.com/ROCm-Developer-Tools/aomp/\
blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting
parts while checking the assertion failure still occurred.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110556
This patch defines the newly added `__kmpc_distribute_static_init`
functions in the device runtime library. These functions are currently
exact copies of the current worksharing method but can be tuned later.
Depends on D110429
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D110430
Use the in-project clang, llvm-link and opt if available and unless
CMake cache variables specify to use a different compiler. This applies
D101265 to the new DeviceRTL's CMakeLists.txt which was copied before
D101265 was applied.
Fixes the openmp-offloading-cuda-runtime builder which was failing
since D110006.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D110251
Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak which raises an assert in debug builds of hsa.
Reviewed By: pdhaliwal
Differential Revision: https://reviews.llvm.org/D109511
This patch fixes a data-race observed when using the new device runtime
library. The Internal control variable for the parallel level is read in
the `__kmpc_parallel_51` function while it could potentially be written
by other threads. This causes data corruption and will cause
nondeterministic behaviour in the runtime. This patch fixes this by adding
an explicit synchronization before the region starts.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110366
This is a follow-up of D110029, which uses a bitset to indicate the execution mode. This patch makes the corresponding changes in the function calls.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110279
This patch adds support for an RAII struct that will print function
traces when placed inside of a function declaration. Each successive
call will increase the indentation to make it easier to visually
inspect.
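A rough standalone sketch of the idea; the struct name and output format are illustrative, not the runtime's actual tracer:
```
#include <cstdio>

// Depth shared by all tracer instances on a thread; each nested call
// indents one level further.
static thread_local int TraceDepth = 0;

struct FunctionTracingRAII {
  const char *Name;
  FunctionTracingRAII(const char *Name) : Name(Name) {
    printf("%*s-> %s\n", 2 * TraceDepth++, "", Name);
  }
  ~FunctionTracingRAII() {
    printf("%*s<- %s\n", 2 * --TraceDepth, "", Name);
  }
};

void inner() { FunctionTracingRAII T(__func__); }

void outer() {
  FunctionTracingRAII T(__func__);
  inner(); // printed one indentation level deeper
}

int main() { outer(); }
```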
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110202
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - generic mode
- 2 - SPMD mode execution with generic mode semantics
We are going to add support for a SIMD execution mode. It will come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible.
This patch changes to a bitset-based solution to encode the execution mode. Each
position is:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)
In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future, after we add support for
SIMD mode, `0b1xx` will be SIMD mode.
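A sketch of the encoding described above; the enumerator names are illustrative, not necessarily the ones used in the patch:
```
#include <cstdint>

enum ExecModeFlags : int8_t {
  EXEC_MODE_GENERIC      = 1 << 0, // 0x1
  EXEC_MODE_SPMD         = 1 << 1, // 0x2
  EXEC_MODE_GENERIC_SPMD = EXEC_MODE_GENERIC | EXEC_MODE_SPMD, // 0x3
  // A future EXEC_MODE_SIMD would occupy bit 2 (0b1xx).
};

bool isSPMDMode(int8_t Mode) { return Mode & EXEC_MODE_SPMD; }
bool isGenericMode(int8_t Mode) { return Mode & EXEC_MODE_GENERIC; }

int main() {
  int8_t Mode = EXEC_MODE_GENERIC_SPMD; // SPMD execution, generic semantics
  return (isSPMDMode(Mode) && isGenericMode(Mode)) ? 0 : 1;
}
```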
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110029
Summary:
The thread ID function was reintroduced in D110195, but could
potentially be removed by the optimizer. Make the function noinline to
preserve the call sites and add it to the externalization RAII so its
definition is not removed by the attributor.
The new device runtime library currently lacks the
`__kmpc_get_hardware_thread_id_in_block` function, which is currently used
when doing the SPMDzation optimization. This call would be introduced
through the optimization and then cause a linking error because it was
not present. This patch adds support for this runtime call.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D110195
Parallel regions are outlined as functions with captured variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime:
(1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function,
(2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) on the number of arguments,
(3) forwarded arguments must be cast to pointer types, which complicates debugging.
This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call, as sketched below.
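A hedged sketch of the aggregation idea; the names and the stand-in fork function are illustrative, not the actual runtime ABI:
```
#include <cstdio>

// Captured variables are packed into one struct ...
struct CapturedArgs {
  int *A;
  int Scale;
  int N;
};

// ... so the outlined parallel region takes a single pointer instead of a
// variable-length parameter list.
static void outlinedRegion(void *Args) {
  auto *C = static_cast<CapturedArgs *>(Args);
  for (int I = 0; I < C->N; ++I)
    C->A[I] *= C->Scale;
}

// Stand-in for the runtime entry point: it forwards one pointer rather than
// being variadic and re-wrapping up to 16 arguments.
static void forkCall(void (*Fn)(void *), void *AggregatedArgs) {
  Fn(AggregatedArgs); // sequential stand-in for launching the team
}

int main() {
  int A[4] = {1, 2, 3, 4};
  CapturedArgs Args{A, 2, 4};
  forkCall(outlinedRegion, &Args);
  printf("%d %d %d %d\n", A[0], A[1], A[2], A[3]); // 2 4 6 8
  return 0;
}
```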
Reviewed By: jdoerfert, jhuber6
Differential Revision: https://reviews.llvm.org/D102107
This patch adds support for using dynamic shared memory in the new
device runtime. The new function `__kmpc_get_dynamic_shared` will return a
pointer to the buffer of dynamic shared memory. Currently the amount of memory
allocated is set by an environment variable.
In the future this amount will be added to the amount used for the smart stack
which will be configured in a similar way.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D110006
This patch adds fields for the device number and number of devices into
the device environment struct and debugging values.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110004
This patch implements the `__assert_fail` function in the new device
runtime. This allows users and developers to use the standard assert
function inside of the device.
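A minimal example of the kind of code this enables, assuming OpenMP offloading with the new runtime and the usual lowering of `assert` to `__assert_fail`:
```
#include <cassert>

int main(int argc, char **) {
#pragma omp target
  {
    // With __assert_fail provided by the device runtime, a failing assertion
    // aborts the kernel with the usual diagnostic instead of failing to link.
    assert(argc > 0 && "expected at least the program name in argc");
  }
  return 0;
}
```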
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D109886
The definitions of OFFLOAD_SUCCESS and OFFLOAD_FAIL used in plugin APIs and libomptarget public APIs are not consistent.
Create __tgt_target_return_t for libomptarget public APIs.
Differential Revision: https://reviews.llvm.org/D109304
The hsa library must be initialized before any calls into it and
shut down after the last call into it. There have been a number of bugs in
this area related to member variables that want to use RAII to manage
resources acquired from hsa.
This patch moves the init/shutdown of hsa into a class, such that when it is used as
the first member variable (it could be a base), the lifetimes of the other member
variables are reliably scoped within it. This will allow other classes to use
RAII reliably when used as member variables within the global.
Reviewed By: pdhaliwal
Differential Revision: https://reviews.llvm.org/D109512
Given D109057, change test runner to use the libomptarget-x-bc-path
argument instead of the LIBRARY_PATH environment variable to find the device
library.
Also drop the use of LIBRARY_PATH environment variable as it is far
too easy to pull in the device library from an unrelated toolchain by accident
with the current setup. No loss in flexibility to developers as the clang
commandline used here is still available.
Reviewed By: jdoerfert, tianshilei1992
Differential Revision: https://reviews.llvm.org/D109061
Using std::vector<DeviceTy> requires implementing a copy constructor and copy assignment operator for DeviceTy.
Indeed, DeviceTy should never be copied. After changing to std::vector<std::unique_ptr<DeviceTy>>,
all the unsafe copy constructor and copy assignment operator implementations can be removed.
Compilers mark them deleted due to the mutex and other underlying objects, and this is the desired behavior.
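A small illustration of why the container change works; the stripped-down type below is hypothetical, not the real DeviceTy:
```
#include <memory>
#include <mutex>
#include <vector>

// Stand-in type: the mutex member makes the implicit copy operations
// deleted, which is exactly the desired "never copy" behaviour.
struct Device {
  int DeviceID;
  std::mutex PendingGlobalsMtx;
  explicit Device(int ID) : DeviceID(ID) {}
};

int main() {
  // std::vector<Device> would require hand-written (and unsafe) copy
  // operations; unique_ptr elements avoid copying the devices entirely.
  std::vector<std::unique_ptr<Device>> Devices;
  Devices.emplace_back(std::make_unique<Device>(0));
  Devices.emplace_back(std::make_unique<Device>(1));
  return 0;
}
```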
Differential Revision: https://reviews.llvm.org/D109276
Use the same debug print as the rest of libomptarget plugins with
the same environment control. Also drop the max queue size debugging hook as
I don't believe it is still in use; it can be brought back near the rest of the env
handling in rtl.cpp if someone objects.
That makes most of rt.h and all of utils.cpp unused. Clean that up and simplify
control flow in a couple of places.
The behaviour change is that debug prints that used to use the old environment
variable now use the new one and print in a slightly different format, and that
the max queue size variable is removed.
Reviewed By: pdhaliwal
Differential Revision: https://reviews.llvm.org/D108784
Use unique_ptr to achieve the effect of mutable.
Remove mutable keyword of DynRefCount and HoldRefCount
Remove std::shared_ptr from UpdateMtx
Reviewed By: tianshilei1992, grokos
Differential Revision: https://reviews.llvm.org/D109007
As started in D107925, this patch replaces the remaining occurrences
of `UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin` in
`omptarget.cpp` with `IsHostPtr`. The former condition is broken in
the rare case that the device and host happen to use the same address
for their mapped allocations. I don't know how to write a test that's
likely to reveal this case.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D107928
As discussed in D105990, without this patch, `targetDataBegin`
determines whether to transfer data (as opposed to assuming it's in
shared memory) using the condition `!UseUSM || HasCloseModifier`.
However, this condition is broken if use of discrete memory was forced
by `omp_target_associate_ptr`. This patch extends
`unified_shared_memory/associate_ptr.c` to reveal this case, and it
fixes it using `!IsHostPtr` in `DeviceTy::getTargetPointer` to replace
this condition.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D107927
This patch is based on comments in D105990. It is NFC according to
the following observations:
1. `CopyMember` is computed as `!IsHostPtr && IsLast`.
2. `DelEntry` is true only if `IsLast` is true.
We apply those observations in order:
```
if ((DelEntry || Always || CopyMember) && !IsHostPtr)
if ((DelEntry || Always || IsLast) && !IsHostPtr)
if ((Always || IsLast) && !IsHostPtr)
```
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D107926
As discussed in D105990, without this patch, `targetDataEnd`
determines whether to transfer data or delete a device mapping (as
opposed to assuming it's in shared memory) using two different
conditions, each of which is broken for some cases:
1. `!(UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin)`: The
broken case is rare: the device and host might happen to use the
same address for their mapped allocations. I don't know how to
write a test that's likely to reveal this case, but this patch does
fix it, as discussed below.
2. `!UNIFIED_SHARED_MEMORY || HasCloseModifier`: There are at least
two broken cases:
1. The `close` modifier might have been specified on an `omp
target enter data` but not the corresponding `omp target exit
data`, which thus might falsely assume a mapping is in shared
memory. The test `unified_shared_memory/close_enter_exit.c`
already has a missing deletion as a result, and this patch adds
a check for that. This patch also adds the new test
`close_member.c` to reveal a missing transfer and deletion.
2. Use of discrete memory might have been forced by
`omp_target_associate_ptr`, as in the test
`unified_shared_memory/api.c`. In the current `targetDataEnd`
implementation, this condition turns out not to be used for this
case: because the reference count is infinite, a transfer is
possible only with an `always` modifier, and this condition is
never used in that case. To ensure it's never used for that
case in the future, this patch adds the test
`unified_shared_memory/associate_ptr.c`.
Fortunately, `DeviceTy::getTgtPtrBegin` already has a solution: it
reports whether the allocation was found in shared memory via the
variable `IsHostPtr`.
After this patch, `HasCloseModifier` is no longer used in
`targetDataEnd`, and I wonder if the `close` modifier is ever useful
on an `omp target data end`.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D107925
Given D109057, change test runner to use the libomptarget-x-bc-path
argument instead of the LIBRARY_PATH environment variable to find the device
library.
Also drop the use of LIBRARY_PATH environment variable as it is far
too easy to pull in the device library from an unrelated toolchain by accident
with the current setup. No loss in flexibility to developers as the clang
commandline used here is still available.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D109061
Using rpath instead of LD_LIBRARY_PATH to find libomp.so and
libomptarget.so lets one rerun the already built test executables without
setting environment variables and removes the risk of the test runner picking
up different libraries to the developer debugging the failure.
rpath usually means runpath, which is not transitive, so set runpath on
libomptarget itself so that it can find the plugins located next to it,
spelled $ORIGIN. This provides sufficient functionality to drop D102043.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D109071
This patch implements OpenMP runtime support for an original OpenMP
extension we have developed to support OpenACC: the `ompx_hold` map
type modifier. The previous patch in this series, D106509, implements
Clang support and documents the new functionality in detail.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D106510
In some build configurations, the target we depend on is not available for declaring the build dependency.
We only need to declare the build dependency if the build target is available in the same build.
Fixes the issue raised in https://reviews.llvm.org/D107156#2969862
This patch should go into release/13 together with D108404
Differential Revision: https://reviews.llvm.org/D108868
`CU_EVENT_DEFAULT` is defined in a CUDA header. It should be added to
`openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h` for a CUDA-free build.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D108878
This patch adds support for event-related interfaces, which will be used
later to fix a data race. See D104418 for more details.
Reviewed By: jdoerfert, ye-luo
Differential Revision: https://reviews.llvm.org/D108528
Lets wavefront size be 32 for amdgpu openmp, as well as 64.
Fixes up as little as possible to pass that through the libraries. This change
is end to end, as opposed to updating clang/devicertl/plugin separately. It can
be broken up for review/commit if preferred. Posting as-is so that others with
a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are
probably bugs remaining as well as the todo: for letting grid values vary more.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D108708
Lets the amdgpu plugin write to omptarget_device_environment
to enable debugging. Intend to use in the near future to record the
wavesize that a given deviceRTL was compiled with for running on hardware
that supports 32 or 64.
Patch sets all the attributes that are useful. Notably .data means the variable
is set by writing to host memory before copying to the GPU instead of launching
a kernel to update the image. Can simplify the plugin slightly to drop the
code for patching after load if this is used consistently.
NFC on nvptx, cuda plugin seems to work fine without any annotations.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108698
Move most debug printing in rtl.cpp behind DP() macro
Adjust the print output for gpu arch mismatch when the architectures match
Convert an assert into graceful failure
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108562
The use of `$<TARGET_FILE:clang>` was adapted too broadly from D101265.
Fixes llvm.org/PR51579
Also see discussion in D108534.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D108640
With uses of g_atl_machine gone, a significant portion of dead
code has been removed.
This patch depends on D104691 and D104695.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D104696
Compiled nvptx devicertl as freestanding, breaking the
dependency on host glibc and gcc-multilibs. Thus build it by default.
Comes at the cost of #defining out printf. Tried mapping it onto
__builtin_printf but that gets transformed back to printf instead
of hitting the cuda/openmp lowering transform.
Printf could be preserved by one of:
- dropping all the standard headers and ffreestanding
- providing a header only printf implementation
- changing the compiler handling of printf
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D108349
Add include path to the cmakefiles and set the target_impl enums
from the llvm constants instead of copying the values.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108391
D107156 and D107320 are not sufficient when OpenMP is built as llvm runtime
(LLVM_ENABLE_RUNTIMES=openmp) because dependencies only work within the same
cmake instance.
We could limit the dependency to cases where libomptarget/plugins are really
built. But compared to the whole llvm project, building openmp runtime is
negligible and postponing the build of OpenMP runtime after the dependencies
are ready seems reasonable.
The direct dependency introduced in D107156 and D107320 is necessary for the
case where OpenMP is built as llvm project (LLVM_ENABLE_PROJECTS=openmp).
Differential Revision: https://reviews.llvm.org/D108404
A new rule is added in 5.0:
If a list item appears in a reduction, lastprivate or linear clause
on a combined target construct then it is treated as if it also appears
in a map clause with a map-type of tofrom.
Currently, map clauses are added implicitly for all captured variables, but
they are missing for list items that are expressions, such as array elements
or array sections.
The change adds an implicit map clause for array elements used in a
reduction clause, and skips adding the map clause if the expression is not
mappable.
Note: for linear and lastprivate, since only a variable name is
accepted, the map is already added through the captured variables.
To do so, during the mappability check, if there is an error, the diagnostic
is ignored and the implicit map clause is not added.
The changes:
1> Add code to generate the implicit map in ActOnOpenMPExecutableDirective,
for OpenMP 5.0 and up.
2> Add an extra default parameter NoDiagnose to ActOnOpenMPMapClause;
use it to skip the error as well as skip adding the implicit map during the
mappability check.
Note: there are only two places that need to be checked for NoDiagnose. For the
rest, either the check is for < OpenMP 5.0 or the error is already generated for
the reduction clause.
Differential Revision: https://reviews.llvm.org/D108132
[nfc] Replaces enum indices into an array with a struct. Named the
fields to match the enum, leaves memory layout and initialization unchanged.
Motivation is to later safely remove dead fields and replace redundant ones
with (compile time) computation. It should also be possible to factor some
common fields into a base and introduce a gfx10 amdgpu instance with less
duplication than the arrays of integers require.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D108339
Use uint64_t for lanemask on all GPU architectures at the interface
with clang. Updates tests. The deviceRTL is always linked as IR so the zext
and trunc introduced for wave32 architectures will fold after inlining.
Simplification partly motivated by amdgpu gfx10 which will be wave32 and
is awkward to express in the current arch-dependent typedef interface.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108317
Currently, the runtime returns an error when the `exec_mode` global is
not present. The expected behaviour is that the region will default to
Generic. Returning this error prevents global constructors from being called
because they do not contain execution mode globals.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108255
Removed redundant assignment from condition which causes gcc to emit the following error:
error: operation on ‘MoveData’ may be undefined [-Werror=sequence-point]
targetDataEnd and targetDataBegin compute CopyMember/copy differently,
and I don't see why they should. This patch eliminates one of those
differences by making a simplifying NFC change to targetDataEnd.
The change is NFC as follows. The change only affects the case when
`!UNIFIED_SHARED_MEMORY || HasCloseModifier`. In that case, the
following points are always true:
* The value of CopyMember is relevant later only if DelEntry = false.
* DelEntry = false only if one of the following is true:
* IsLast = false. In this case, it's always true that CopyMember
= false = IsLast.
* `MEMBER_OF && !PTR_AND_OBJ` is true. In this case, CopyMember =
IsLast.
* Thus, if CopyMember is relevant, CopyMember = IsLast.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D105990
On FreeBSD, the `environ` symbol is undefined at link time for shared
libraries, but resolved by the dynamic linker at runtime. Therefore,
allow the symbol to be undefined when creating a shared library, by
using the `--allow-shlib-undefined` linker flag, instead of `-z defs`
(a.k.a `--no-undefined`).
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D107698
On FreeBSD, the system `<libelf.h>` already declares `struct Elf_Note`
indirectly (via `<sys/elf_common.h>`). This results in compile errors
when building the libomptarget amdgpu plugin. Avoid redeclaring `struct
Elf_Note` on FreeBSD to fix the errors.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D107661
When loading libomptarget, the init function in libomptarget/src/rtl.cpp
will search for the libomptarget_start_tool function using libdl.
libomptarget_start_tool will pass those OMPT callbacks related to target
constructs to libomptarget.
Differential Revision: https://reviews.llvm.org/D99803
Remove --cuda-path=CUDA_TOOLKIT_ROOT_DIR-NOTFOUND
from the invocation of non-nvptx test cases. Better signal
to noise ratio on other architectures.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D107074
When using `-DLLVM_ENABLED_RUNTIMES` instead of `-DLLVM_ENABLED_PROJECTS`
the `llvm-omp-device-info` tool is not compiled or installed.
In general, no llvm tool would be built in a runtimes build, because the
-DLLVM_BUILD_TOOLS flag is removed by the way runtimes compilation calls
cmake again.
This patch is simple. Just forward the value of this flag to the
runtime cmake command.
I'm also removing an unnecessary comment in the compilation of the tool.
Differential Revision: https://reviews.llvm.org/D107177
All `nowait` series of interfaces in `libomptarget` accept four more arguments (`int32_t depNum, void *depList, int32_t noAliasDepNum, void *noAliasDepList`) compared with their counterparts w/o `nowait`. These extra arguments were expected for dependence resolution, potentially lowered to the device side. The current implementation calls the `libomp` function `__kmpc_omp_taskwait`. However, the front end simply ignores them, so these four arguments are not emitted at all. As a consequence, `depNum` and `noAliasDepNum` are garbage, which could lead to unnecessary task waits.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D107164
This patch fixes the error reported in D106751. When there is no CUDA SDK
installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE`
variables.
Using the fix suggested by @zsrkmyn.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106933
This patch introduces the `llvm-omp-device-info` tool, which uses the
omptarget library and interface to query the device info from all the
available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo`.
Since omptarget usually requires a description structure with executable
kernels, I split the initialization of the RTLs and Devices to be able to
initialize all possible devices and query each of them.
This revision relies on the patch that introduces the print device info.
A limitation is that the order in which the devices are initialized, and the
corresponding device ID, is not necessarily the one seen by OpenMP.
The changes are as follows:
1. Separate the RTL initialization that was performed in `RegisterLib` to its own `initRTLonce` function
2. Create an `initAllRTLs` method that initializes all available RTLs at runtime
3. Created the `llvm-deviceinfo.cpp` tool that uses `omptarget` to query each device and prints its information.
Example Output:
```
Device (0):
print_device_info not implemented
Device (1):
print_device_info not implemented
Device (2):
print_device_info not implemented
Device (3):
print_device_info not implemented
Device (4):
CUDA Driver Version: 11000
CUDA Device Number: 0
Device Name: Quadro P1000
Global Memory Size: 4236312576 bytes
Number of Multiprocessors: 5
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536 bytes
Max Shared Memory per Block: 49152 bytes
Registers per Block: 65536
Warp Size: 32 Threads
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647 bytes
Texture Alignment: 512 bytes
Clock Rate: 1480500 kHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: DEFAULT
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 2505000 kHz
Memory Bus Width: 128 bits
L2 Cache Size: 1048576 bytes
Max Threads Per SMP: 2048
Async Engines: Yes (2)
Unified Addressing: Yes
Managed Memory: Yes
Concurrent Managed Memory: Yes
Preemption Supported: Yes
Cooperative Launch: Yes
Multi-Device Boars: No
Compute Capabilities: 61
```
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D106752
This patch introduces a function in the device's plugin to print the
device information. This patch relates to another patch that introduces
a CLI tool to obtain the device information from the omptarget library directly.
It is inspired by PGI's pgaccelinfo.
The modifications are as follows:
1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL.
2. Introduce the `bool __tgt_print_device_info(devID)` function into `omptarget` interface. Returns false if the RTL is not implemented
3. Added `bool printDeviceInfo(RTLDevID)` to the `DeviceTy`
4. Implement the `__tgt_rtl_print_device_info` for CUDA. Added additional CUDA Runtime calls.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106751
The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block`
and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.
In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and
`NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions.
The code checks all the kernels, and if their attributes match, the functions are folded.
In the future we will explore specializing for multiple values of NumThreads and NumTeams.
Depends on D106390
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D106033
The new method of sharing variables introduces a `__kmpc_alloc_shared` call
that cannot be removed in the middle end because of its non-constant argument
and unconnected free. This patch reverts this to the old method that used a
static amount of shared memory for sharing variables.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106905
The "old" OpenMP GPU device runtime (D14254) has served us well for many
years but modernizing it has caused some pain recently. This patch
introduces an alternative which is mostly written from scratch embracing
OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual
interfaces. This new runtime is opt-in through a clang flag (D106793).
The new runtime is currently only built for nvptx and has "-new" in its
name.
The design is tailored towards middle-end optimizations rather than
front-end code generation choices, a trend we already started in the old
runtime a while back. In contrast to the old one, state is organized in
a simple manner rather than a "smart" one. While this can induce costs
it helps optimizations. Our expectation is that the majority of codes
can be optimized and a "simple" design is therefore preferable. The new
runtime also avoids making users pay for things they do not use,
especially wrt. memory. The unlikely case of nested parallelism is
supported but costly, in order to make the more likely case use fewer resources.
The worksharing and reduction implementations have been taken from the
old runtime and will be rewritten in the future if necessary.
Documentation and debug features are still mostly missing and will be
added over time.
All external symbols start with `__kmpc` for legacy reasons but should
be renamed once we switch over to a single runtime. All internal symbols
are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name
clashes with user symbols.
Differential Revision: https://reviews.llvm.org/D106803
Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible.
Note that `__kmpc_parallel_level` doesn't take activeness into consideration;
based on the current `deviceRTLs`, its return value can be, for example, 0, 1, 2,
instead of 0, 129, 130, etc. that also encode activeness.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106154
Default to building the amdgpu plugin to use dlopen when hsa is
not found instead of disabling it.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106600
If hsa_init fails, subsequent calls into hsa are not safe. Except for
hsa_init, but we don't retry on failure.
This patch:
- deletes a print that called into hsa to ask why it can't call into hsa
- drops a merge conflict block next to that print
- reliably initializes number of devices to zero
- skips the plugin destructor contents if the constructor failed to init hsa
Tested by making hsa_init return error, and by forcing the dynamic library
use which was then deleted from disk. Before this patch, both segv. After it,
friendly message about offloading being unavailable.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106774
We build `deviceRTLs` with `-O1` by default, which also triggers OpenMPOpt. When
the info cache is created, some attributes are removed. As a result, although we
mark a few functions `noinline`, they are still inlined when the bitcode library
is generated. This can cause an issue in middle end optimization.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106710
Bug 50022 [0] reports that target nowait fails in a certain case, which is added as
a test in this patch. The root cause of the failure is that when the second task is
created, its parent's `td_incomplete_child_tasks` will not be incremented because
there is no parallel region here, and thus its team is serialized. Therefore, when
the initial thread is waiting for its unfinished child tasks, it thinks there is
only one, the first task, because that one is a hidden helper task and therefore
tracked. The second task will only be pushed to the queue when the first task is
finished. However, when the first task finishes, it first decrements the counter of
its parent and then releases dependences. Once the counter is decremented, the
waiting thread will move on because its counter is reset, but the second task has
not actually been executed at all. As a result, in this case the main function
finishes and `libomp` starts to shut down. By the time the second task is pushed
somewhere, some of the structures might have already been destroyed, and then
anything could happen.
This patch simply moves `__kmp_release_deps` ahead of the decrement of the counter.
In this way, we can make sure that the initial thread is aware of the existence
of the other task(s) and will not move on. In addition, in order to tackle
dependence chains starting with a hidden helper thread, when a hidden helper task is
encountered, we force the task to release dependences.
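A deliberately simplified model of the ordering hazard; the counter and helpers are stand-ins for libomp internals (`td_incomplete_child_tasks`, `__kmp_release_deps`), and the bookkeeping is assumed for illustration, not libomp's actual code:
```
#include <atomic>
#include <cassert>

struct ParentModel {
  std::atomic<int> IncompleteChildTasks{1}; // the first (hidden helper) task
};

// Assumed model: releasing dependences is what makes the dependent task
// visible to the waiting parent.
static void releaseDeps(ParentModel &Parent, int ReadyDependents) {
  Parent.IncompleteChildTasks += ReadyDependents;
}

// Old order: decrement first, then release. A parent polling the counter can
// observe zero before the dependent task was ever made visible.
static void finishTaskOld(ParentModel &Parent) {
  Parent.IncompleteChildTasks -= 1; // counter hits 0 here ...
  releaseDeps(Parent, 1);           // ... before the second task exists
}

// New order: release dependences first, so the counter never reaches zero
// while a dependent task is still unaccounted for.
static void finishTaskNew(ParentModel &Parent) {
  releaseDeps(Parent, 1);
  Parent.IncompleteChildTasks -= 1;
}

int main() {
  ParentModel P;
  finishTaskNew(P);
  assert(P.IncompleteChildTasks == 1); // the second task is still outstanding
  (void)finishTaskOld;
  return 0;
}
```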
Reference:
[0] https://bugs.llvm.org/show_bug.cgi?id=50022
Reviewed By: AndreyChurbanov
Differential Revision: https://reviews.llvm.org/D106519
Unrolling this loop provides better performance in practice because it is
executed on the device and is likely to be very small.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D106692
This patch tries to partially fix one of the two data race issues reported in
[1] by introducing a per-entry mutex. Additional discussion can also be found in
D104418, which will also be refined to fix another data race problem.
Here is how it works. Like before, `DataMapMtx` is still being used for mapping
table lookup and update. In any case, we will get a table entry. If we need to
make a data transfer (update the data on the device), we need to lock the entry
right before releasing `DataMapMtx`; the data transfer is then issued after
releasing `DataMapMtx`, and the entry is unlocked afterwards. This guarantees
that: 1) issuing the data movement is not in the critical region, so it will
not affect performance too much and will not affect other threads that don't
touch the same entry; 2) if another thread accesses the same entry, the state of
the data movement is consistent (which requires that a thread must first get the
update lock before getting data movement information).
For a target that doesn't support async data transfer, issuing the data movement
is the data transfer itself. This two-lock design can potentially improve
concurrency compared with the design that guards data movement with `DataMapMtx`
as well. For a target that supports async data movement, we can simply attach the
event between issuing the data movement and unlocking the entry. A thread that
wants to get the event must first get the lock. This also gets rid of the busy
wait until the event pointer is valid.
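A hedged sketch of the two-lock pattern described above; the types and names are illustrative, not libomptarget's actual data structures:
```
#include <mutex>

// Illustrative types only; the real entry and locks live in libomptarget.
struct MapEntryModel {
  std::mutex UpdateMtx;        // per-entry lock guarding data-movement state
  bool TransferIssued = false;
};

static std::mutex DataMapMtx;  // global lock for mapping-table lookup/update

// Look the entry up under DataMapMtx, but issue the transfer outside of it so
// threads touching other entries are never serialized behind the copy.
static void mapAndTransfer(MapEntryModel &Entry, bool NeedsTransfer) {
  DataMapMtx.lock();
  // ... mapping table lookup / update would happen here ...
  if (NeedsTransfer)
    Entry.UpdateMtx.lock();    // take the entry lock before dropping DataMapMtx
  DataMapMtx.unlock();

  if (NeedsTransfer) {
    Entry.TransferIssued = true; // issue the (possibly async) data movement
    Entry.UpdateMtx.unlock();    // or attach an event here for async targets
  }
}

int main() {
  MapEntryModel Entry;
  mapAndTransfer(Entry, /*NeedsTransfer=*/true);
  return 0;
}
```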
Reference:
[1] https://bugs.llvm.org/show_bug.cgi?id=49940
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D104555
With D106496 we can make the globalization fallback stack much simpler
and this version doesn't seem to experience the spurious failures and
deadlocks we have seen before.
Differential Revision: https://reviews.llvm.org/D106576
This patch adds support for two environment variables to configure the device.
``LIBOMPTARGET_STACK_SIZE`` sets the amount of memory in bytes that each thread
has for its stack. ``LIBOMPTARGET_HEAP_SIZE`` sets the amount of heap memory
that can be allocated using malloc / free on the device.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106627