llvm-project

Commit Graph

Author	SHA1	Message	Date
Jon Chesterfield	22bd75be70	[openmp] Fix a git misfire in `cf37a94c1e`	2021-10-28 01:35:25 +01:00
Jon Chesterfield	6c7b203d1d	Revert "[libomptarget] Build DeviceRTL for amdgpu" - more tests failing on CI than failed locally when writing this patch This reverts commit `33427fdb7b`.	2021-10-28 01:01:53 +01:00
Jon Chesterfield	cf37a94c1e	[openmp] Add amdgpu impl missed from D112153	2021-10-28 00:55:53 +01:00
Jon Chesterfield	33427fdb7b	[libomptarget] Build DeviceRTL for amdgpu Passes same tests as the current deviceRTL. Includes cmake change from D111987. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227	2021-10-28 00:41:45 +01:00
Johannes Doerfert	48877525cf	[OpenMP] Remove obsolete external interface for device RT We do not generate _serialized_parallel calls in device mode, no need for an external API. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D112145	2021-10-27 18:22:35 -05:00
Johannes Doerfert	5102c3c61e	[OpenMP][FIX] Do not adjust the level after the environment was popped Exiting a data environment will reset all values, it is wrong to adjust them afterwards. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112144	2021-10-27 18:22:33 -05:00
Johannes Doerfert	b16aadf0a7	[OpenMP] Introduce aligned synchronization into the new device RT We will later use the fact that a barrier is aligned to reason about thread divergence. For now we introduce the assumption and some more documentation. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112153	2021-10-27 18:22:31 -05:00
Johannes Doerfert	ef922c692f	[OpenMP][FIX] Query proper thread ID information to support nesting The OpenMP thread ID is not the hardware thread ID if we have nesting. We need to ask the runtime properly to ensure correct results. Note that the loop interface is going to change soon so we do not adjust it now but simply ignore the extra argument. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111950	2021-10-27 18:18:44 -05:00
Johannes Doerfert	4c88341d17	[OpenMP][FIX] Do check the level before return team size The team size could/should be an ICV but since we know it is either 1 or a value we can leave it in the team state for now. However, we still need to determine if the current level is nested before we use it. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D111949	2021-10-27 18:18:42 -05:00
Johannes Doerfert	dc72960967	[OpenMP][FIX] Do not dereference a potential nullptr The first thread state in the new GPU runtime doesn't have a previous one and we should not dereference the nullptr placeholder. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111946	2021-10-27 18:18:39 -05:00
Jon Chesterfield	e42e5785ad	[libomptarget][nfc]Generalise DeviceRTL cmake to allow building for amdgpu Essentially moves the foreach over sm integers into a macro and instantiates it for nvptx. NFC in that the macro is not presently instantiated for amdgpu as the corresponding code doesn't compile yet. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111987	2021-10-26 21:18:21 +01:00
Ron Lieberman	be03ef3ed1	[openmp][lit] Add support to OpenMP lit.cfg for ROCR_VISIBLE_DEVICES env-var add support for ROCR_VISIBLE_DEVICES similar to name and purpose as CUDA_VISIBLE_DEVICES Differential Revision: https://reviews.llvm.org/D112503	2021-10-26 13:46:42 +00:00
Georgios Rokos	2feafa2e46	[libomptarget][NFC] Add comment explaining why we pass argument bases and offsets as two separate entities to the plugins.	2021-10-25 14:51:14 -07:00
Shilei Tian	2a30c03c62	[OpenMP][Offloading] Only get trip count if team construct Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D112475	2021-10-25 17:16:14 -04:00
Kazu Hirata	d8e4170b0a	Ensure newlines at the end of files (NFC)	2021-10-23 08:45:29 -07:00
Jon Chesterfield	bf6f955f39	[libomptarget] Run GPU offloading tests on both new and old runtime Implemented by patching python config instead of modifying all the tests so that -generic and XFAIL work as usual. Expectation is for this to be reverted once the old runtime is deleted. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D112225	2021-10-22 23:28:44 +01:00
Jon Chesterfield	a602c2b51d	[libomptarget][DeviceRTL] Generalise and simplify cmakelists Step towards building the DeviceRTL for amdgpu. Mostly replaces cuda-specific toolchain finding logic with the generic logic currently found in the amdgpu deviceRTL cmake. Also deletes dead code and changes the default to build on systems without cuda installed, as the library doesn't use cuda and the amdgpu-only systems generally won't have cuda installed. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111983	2021-10-21 16:14:29 +01:00
Joseph Huber	b1ce454930	[OpenMP] Remove macro guards for device debugging The plugin currently uses a macro to check if this is a debug built before assigning the debug kind variable to the device environment struct. This is being deprecated because the new device runtime does not maintain separate debug builds and should always be availible. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112083	2021-10-19 12:21:43 -04:00
Jon Chesterfield	7272982e1d	[libomptarget] Refactor DeviceRTL prior to AMDGPU bringup Subset of D111993. Fix typos, rename read to load. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111999	2021-10-19 08:05:06 +01:00
Jon Chesterfield	251b1e7c25	[libomptarget] Pass OMP_TARGET_OFFLOAD env variable through to tests Useful for OMP_TARGET_OFFLOAD=MANDATORY when testing Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111995	2021-10-18 16:03:03 +01:00
Shilei Tian	2c941fa2f9	[OpenMP][deviceRTLs] Fix wrong return value of `__kmpc_is_spmd_exec_mode` D110279 introduced a bug to the device runtime. In `__kmpc_parallel_51`, we detect whether we are already in parallel region by `__kmpc_parallel_level() > __kmpc_is_spmd_exec_mode()`. It is based on the assumption that: - In SPMD mode, parallel level is initialized to 1. - In generic mode, parallel level is initialized to 0. - `__kmpc_is_spmd_exec_mode` returns `1` for SPMD mode, 0 otherwise. Because the return value type of `__kmpc_is_spmd_exec_mode` is `int8_t`, there was an implicit cast from `bool` to `int8_t`. We can make sure it is either 0 or 1 since C++14. In D110279, the return value is the result of an `and` operation, which is 2 in SPMD mode. This breaks the assumption in `__kmpc_parallel_51`. Reviewed By: carlo.bertolli, dpalermo Differential Revision: https://reviews.llvm.org/D111905	2021-10-16 12:58:29 -04:00
Ron Lieberman	d022f39d9f	[libomptarget][amdgpu][NFC] tweak a comment	2021-10-09 12:51:53 -04:00
Joseph Huber	bad44d5f39	[OpenMP] Add RTL function for getting number of threads in block. This patch adds support for the `__kmpc_get_hardware_num_threads_in_block` function that returns the number of threads. This was missing in the new runtime and was used by the AMDGPU plugin which prevented it from using the new runtime. This patchs also unified the interface for getting the thread numbers in the frontend. Originally authored by jdoerfert. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D111475	2021-10-08 22:21:59 -04:00
Joseph Huber	85ad566335	[OpenMP] Avoid calling `isSPMDMode` during RT initialization Until we hit the first barrier we should not call `mapping::isSPMDMode` with all threads. Instead, we now have (and use during initialization) a `mapping::isMainThreadInGenericMode` overload that takes the known SPMD-mode state and one that queries it. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111381	2021-10-08 22:00:41 -04:00
Joseph Huber	208f900527	[Libomptarget] Add an external interface to dynamic shared memory This patch adds an external interface to access the dynamic shared memory buffer in the device runtime. The function introduced is ``llvm_omp_get_dynamic_shared``. This includes a host-side definition that only returns a null pointer so that it can be used when host-fallback is enabled without crashing. Support for dynamic shared memory was also ported to the old device runtime. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110957	2021-10-08 15:36:57 -04:00
Shilei Tian	c060c634ef	[OpenMP][NVPTX] Fix an error in configuring #teams and #threads It must be a copy mistake. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D111407	2021-10-08 11:07:43 -04:00
Shilei Tian	af4599b8ab	[OpenMP][DeviceRTL] Add the support for printf in a freestanding way For NVPTX, `printf` can be used just with a function declaration. For AMDGCN, an function definition is added, but it simply returns. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109728	2021-10-07 22:15:37 -04:00
Johannes Doerfert	44710940af	[OpenMP][FIX] Data race in the SPMD execution of the new runtime We need to synchronize the threads before we destroy the RAII objects that hold the old values and not after to avoid threads executing the parallel region but seeing an inconsistent state. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111369	2021-10-07 21:01:24 -04:00
Jon Chesterfield	1bc3a6e41b	[libomptarget] Reapply `2bc4d48a78` which was accidentally reverted	2021-10-07 20:17:48 +01:00
Jon Chesterfield	0c554a4769	[libomptarget] Move device environment to shared header, remove divergence Follow on to D110006, related to D110957 Where implementations have diverged this resolves to match the new DeviceRTL - replaces definitions of this struct in deviceRTL and plugins with include - changes the dynamic_shared_size field from D110006 to 32 bits - handles stdint being unavailable in DeviceRTL - adds a zero initializer for the field to amdgpu - moves the extern declaration for deviceRTL to target_interface (omptarget.h is more natural, but doesn't work due to include order with debug.h) - Renames the fields everywhere to match the LLVM format used in DeviceRTL - Makes debug_level uint32_t everywhere (previously sometimes int32_t) Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D111069	2021-10-07 12:03:48 +01:00
Michał Górny	0873b9bef4	[openmp] [elf_common] Fix linking against LLVM dylib The hand-rolled linking logic in elf_common does not account for the possibility of using LLVM dylib rather than a dozen static libraries. Since it does not seem to be easily convertible to add_llvm_library, just hand-roll support for LLVM_LINK_LLVM_DYLIB. This is necessary to support stand-alone builds against installed LLVM. Differential Revision: https://reviews.llvm.org/D111038	2021-10-04 09:29:06 +02:00
Jon Chesterfield	05ba9ff6a6	[libomptarget][amdgpu] Refactor memory pool collection	2021-10-01 14:58:01 +01:00
Jon Chesterfield	3247329107	[openmp] Add addrspacecast to getOrCreateIdent Fixes 51982. Adds a missing CreatePointerCast and allocates a global in the correct address space. Test case derived from https://github.com/ROCm-Developer-Tools/aomp/\ blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting parts while checking the assertion failure still occurred. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110556	2021-09-30 21:36:31 +01:00
Jon Chesterfield	b75a7481ba	[libomptarget] Apply D110029 to amdgpu Use enum for execution mode. This is partly a port from ROCm and partly a port from D110029. Attempted to make the same choices as ROCm as far as comments etc go to reduce the merge conflicts. There is some cleanup warranted here - in particular I like the cuda patch factoring out the comparisons into named variables - but I'd like to leave that for a follow up patch, keeping this one minimal. Reviewed By: carlo.bertolli Differential Revision: https://reviews.llvm.org/D110845	2021-09-30 21:29:37 +01:00
Dhruva Chakrabarti	6226270253	[libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed. Fixes: SWDEV-275232 (With contributions from Ammar Elwazir, Laurent Morichetti, and Tony Tye) The current code is racy. After the packet is submitted, the GPU will increment the read index. If this wraps around before the memory is read from it'll refer to a signal from an unrelated packet. Change avoids reading from the packet post-submission. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110679	2021-09-29 09:22:07 -07:00
Jon Chesterfield	2bc4d48a78	[libomptarget][amdgpu] Follow on to D110513, empty kernarg pools are not fatal	2021-09-27 22:44:35 +01:00
Jon Chesterfield	738734f655	[libomptarget][amdgpu] Report zero devices if plugin construction fails, instead of segv	2021-09-27 22:13:12 +01:00
Jon Chesterfield	80fa43fe9a	Revert "[openmp] Add addrspacecast to getOrCreateIdent" This reverts commit `1a761e5b7b`. Failed CI, albeit with a different failure mode to BZ51982	2021-09-27 19:27:35 +01:00
Jon Chesterfield	1a761e5b7b	[openmp] Add addrspacecast to getOrCreateIdent Fixes 51982. Minor refactor to remove `return x = y` construct. Test case derived from https://github.com/ROCm-Developer-Tools/aomp/\ blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting parts while checking the assertion failure still occurred. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110556	2021-09-27 19:23:12 +01:00
Joseph Huber	74d622dea4	[OpenMP] Add new worksharing definitions into device RTL This path defines the newly added `__kmpc_disitrute_static_init` functions in the device runtime library. These functions are currently exact copies of the current worksharing method but can be tuned later. Depends on D110429 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110430	2021-09-27 11:36:41 -04:00
Pushpinder Singh	b1695c2eb8	[AMDGPU][OpenMP] Add memory pool size check to isValidMemoryPool Keeping all the checks in one place for future simplification. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110513	2021-09-27 12:29:00 +00:00
Michael Kruse	1b242dccff	[OpenMP][CMake] Use in-project clang as CUDA->IR compiler for new DeviceRTL. Use the in-project clang, llvm-link and opt if available and unless CMake cache variables specify to use a different compiler. This applies D101265 to the new DeviceRTL's CMakeLists.txt which was copied before D101265 was applied. Fixes the openmp-offloading-cuda-runtime builder which was failing since D110006. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110251	2021-09-27 07:14:19 -05:00
Pushpinder Singh	9d0eb440ff	[libomptarget][nfc][amdgpu] Reorder function to clarify review diff	2021-09-27 09:30:55 +00:00
Jon Chesterfield	726a34f063	[libomptarget][amdgpu] Replace dead exit call with returning error	2021-09-27 09:43:37 +01:00
Jon Chesterfield	8cf93a35d4	[libomptarget][amdgpu] Destruct HSA queues Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak which raises an assert in debug builds of hsa. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D109511	2021-09-26 15:34:21 +01:00
Joseph Huber	d83ca624a1	[OpenMP] Fix data-race in new device RTL This patch fixes a data-race observed when using the new device runtime library. The Internal control variable for the parallel level is read in the `__kmpc_parallel_51` function while it could potentially be written by other threads. This causes data corruption and will cause nondetermistic behaviour in the runtime. This patch fixes this by adding an explicit synchronization before the region starts. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110366	2021-09-23 17:28:07 -04:00
Shilei Tian	423d34f74a	[OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit` This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110279	2021-09-22 17:16:41 -04:00
Joseph Huber	60a40cf379	[OpenMP] Fix KeepAlive usage Summary: Functions were called the wrong way around, this didn't keep the symbol alive.	2021-09-22 14:38:19 -04:00
Joseph Huber	277b681ede	[OpenMP] Add function tracing debugging to device RTL This patch adds support for an RAII struct that will print function traces when placed inside of a function declaration. Each successive call will increase the indentation to make it easier to visually inspect. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110202	2021-09-22 12:25:29 -04:00
Shilei Tian	ca999f7191	[OpenMP][Offloading] Use bitset to indicate execution mode instead of value The execution mode of a kernel is stored in a global variable, whose value means: - 0 - SPMD mode - 1 - indicates generic mode - 2 - SPMD mode execution with generic mode semantics We are going to add support for SIMD execution mode. It will be come with another execution mode, such as SIMD-generic mode. As a result, this value-based indicator is not flexible. This patch changes to bitset based solution to encode execution mode. Each position is: [0] - generic mode [1] - SPMD mode [2] - SIMD mode (will be added later) In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode execution with generic mode semantics. In the future after we add the support for SIMD mode, `0b1xx` will be in SIMD mode. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110029	2021-09-22 11:40:52 -04:00
Joseph Huber	1cf86df883	[OpenMP] Make sure the Thread ID function is not removed Summary: The thread ID function was reintroduced in D110195, but could potentially be removed by the optimizer. Make the function noinline to preserve the call sites and add it to the externalization RAII so its definition is not removed by the attributor.	2021-09-22 10:13:18 -04:00
Joseph Huber	e95731cca7	[OpenMP] Add thread ID function into new RTL The new device runtime library currently lacks the `kmpc_get_hardware_thread_id_in_block` function which is currently used when doing the SPMDzation optimization. This call would be introduced through the optimization and then cause a linking error because it was not present. This patch adds support for this runtime call. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110195	2021-09-21 17:43:50 -04:00
Giorgis Georgakoudis	ac90dfc43a	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `1d66649adf`. Revert to fix AMG GPU issue.	2021-09-21 13:20:39 -07:00
Giorgis Georgakoudis	1d66649adf	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert, jhuber6 Differential Revision: https://reviews.llvm.org/D102107	2021-09-21 10:50:04 -07:00
Shilei Tian	49e976c934	[OpenMP][NVPTX] Fix a warning that data argument not used by format string Reviewed By: jhuber6, grokos Differential Revision: https://reviews.llvm.org/D110104	2021-09-20 17:22:14 -04:00
Joseph Huber	f1c821fa85	[OpenMP] Add support for dynamic shared memory in new RTL This patch adds support for using dynamic shared memory in the new device runtime. The new function `__kmpc_get_dynamic_shared` will return a pointer to the buffer of dynamic shared memory. Currently the amount of memory allocated is set by an environment variable. In the future this amount will be added to the amount used for the smart stack which will be configured in a similar way. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110006	2021-09-17 21:25:36 -04:00
Joseph Huber	ec02c34b6d	[OpenMP] Add additional fields to device environment This patch adds fields for the device number and number of devices into the device environment struct and debugging values. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110004	2021-09-17 21:25:32 -04:00
Joseph Huber	b266bcb135	[OpenMP] Implement __assert_fail in the new device runtime This patch implements the `__assert_fail` function in the new device runtime. This allows users and developers to use the standars assert function inside of the device. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D109886	2021-09-17 21:25:28 -04:00
Shilei Tian	81a1a91c62	[NFC] clang-format -i /openmp/libomptarget/deviceRTLs/interface.h	2021-09-17 12:55:02 -04:00
Hansang Bae	ae2a5facce	[OpenMP][libomptarget] Minor fix in x86_64 plugin Call to remove() was passing invalid address for the file name. Differential Revision: https://reviews.llvm.org/D109846	2021-09-15 15:57:06 -05:00
Ye Luo	2187cbf56f	[OpenMP][libomptarget] Add __tgt_target_return_t enum for __tgt_target_XXX return int The defintion of OFFLOAD_SUCCESS and OFFLOAD_FAIL used in plugin APIs and libomptarget public APIs are not consistent. Create __tgt_target_return_t for libomptarget public APIs. Differential Revision: https://reviews.llvm.org/D109304	2021-09-10 16:11:08 -05:00
Jon Chesterfield	6760234e8d	[libomptarget][amdgpu] Precisely manage hsa lifetime The hsa library must be initialized before any calls into it and destructed after the last call into it. There have been a number of bugs in this area related to member variables which would like to use raii to manage resources acquired from hsa. This patch moves the init/shutdown of hsa into a class, such that when used as the first member variable (could be a base), the lifetime of other member variables are reliably scoped within it. This will allow other classes to use raii reliably when used as member variables within the global. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D109512	2021-09-09 17:28:11 +01:00
Jon Chesterfield	2a581710c1	[openmp] No longer use LIBRARY_PATH to find devicertl Given D109057, change test runner to use the libomptarget-x-bc-path argument instead of the LIBRARY_PATH environment variable to find the device library. Also drop the use of LIBRARY_PATH environment variable as it is far too easy to pull in the device library from an unrelated toolchain by accident with the current setup. No loss in flexibility to developers as the clang commandline used here is still available. Reviewed By: jdoerfert, tianshilei1992 Differential Revision: https://reviews.llvm.org/D109061	2021-09-09 17:16:41 +01:00
Jon Chesterfield	d642156f8f	[libomptarget][nfc] Hoist hsa_init into rtl.cpp	2021-09-09 16:09:34 +01:00
Ye Luo	2cfe1a09d1	[OpenMP][libomptarget][NFC] Change checkDeviceAndCtors return type to bool. What is exactly needed is only a boolean. Pulling OFFLOAD_SUCCESS/FAIL only adds confusion. Differential Revision: https://reviews.llvm.org/D109303	2021-09-07 13:59:27 -05:00
Ye Luo	c3aecf87d5	[OpenMP][libomptarget] Change device vector elements to unique_ptr type Using std::vector<DeviceTy> requires implementing copy constructor and copied assign operator for DeviceTy. Indeed DeviceTy should never be copied. After changing to std::vector<std::unique_ptr<DeviceTy>>, All the unsafe copy constructor and copy assign operator implementations can be removed. Compilers mark them deleted due to mutex or underlying objects and this is the desired behavior. Differential Revision: https://reviews.llvm.org/D109276	2021-09-06 22:28:49 -05:00
Ye Luo	8e5c1b039e	[OpenMP][libomptarget] Change synchronize_ty return type to int32_t Plugins always return int32_t. Stay consistent with other functions which return error status. Differential Revision: https://reviews.llvm.org/D109341	2021-09-06 21:38:54 -05:00
Ron Lieberman	fdac5adee6	[openmp] NFC add bitcode comment	2021-09-02 18:21:39 -05:00
Jon Chesterfield	201e466eba	[libomptarget][amdgpu] Add gfx90a to build list	2021-09-02 18:11:02 +01:00
Jon Chesterfield	3153bdd547	[libomptarget][amdgpu] Drop env variables Use the same debug print as the rest of libomptarget plugins with the same environment control. Also drop the max queue size debugging hook as I don't believe it is still in use, can bring it back near the rest of the env handling in rtl.cpp if someone objects. That makes most of rt.h and all of utils.cpp unused. Clean that up and simplify control flow in a couple of places. Behaviour change is that debug prints that used to use the old environment variable now use the new one and print in slightly different format, and the removal of the max queue size variable. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D108784	2021-09-02 11:02:39 +01:00
Ye Luo	289a1089cd	[libomptarget] Move HostDataToTargetTy states into StatesTy Use unique_ptr to achieve the effect of mutable. Remove mutable keyword of DynRefCount and HoldRefCount Remove std::shared_ptr from UpdateMtx Reviewed By: tianshilei1992, grokos Differential Revision: https://reviews.llvm.org/D109007	2021-09-01 23:36:05 -05:00
Fangrui Song	4d5220faf9	[OpenMP] Fix -Wunused-but-set-parameter in -DLLVM_ENABLE_ASSERTIONS=off builds. NFC	2021-09-01 17:55:13 -07:00
Joel E. Denny	1f9e437065	[OpenMP][AMDGPU] Remove unneeded XFAILs	2021-09-01 18:00:25 -04:00
Joel E. Denny	786a140650	[OpenMP] Use IsHostPtr where needed in rest of omptarget.cpp As started in D107925, this patch replaces the remaining occurrences of `UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin` in `omptarget.cpp` with `IsHostPtr`. The former condition is broken in the rare case that the device and host happen to use the same address for their mapped allocations. I don't know how to write a test that's likely to reveal this case. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107928	2021-09-01 17:31:42 -04:00
Joel E. Denny	d11bab0b73	[OpenMP] Use IsHostPtr where needed for targetDataBegin As discussed in D105990, without this patch, `targetDataBegin` determines whether to transfer data (as opposed to assuming it's in shared memory) using the condition `!UseUSM \|\| HasCloseModifier`. However, this condition is broken if use of discrete memory was forced by `omp_target_associate_ptr`. This patch extends `unified_shared_memory/associate_ptr.c` to reveal this case, and it fixes it using `!IsHostPtr` in `DeviceTy::getTargetPointer` to replace this condition. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107927	2021-09-01 17:31:42 -04:00
Joel E. Denny	fa6c275505	[OpenMP][NFC] Eliminate CopyMember from targetDataEnd This patch is based on comments in D105990. It is NFC according to the following observations: 1. `CopyMember` is computed as `!IsHostPtr && IsLast`. 2. `DelEntry` is true only if `IsLast` is true. We apply those observations in order: ``` if ((DelEntry \|\| Always \|\| CopyMember) && !IsHostPtr) if ((DelEntry \|\| Always \|\| IsLast) && !IsHostPtr) if ((Always \|\| IsLast) && !IsHostPtr) ``` Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107926	2021-09-01 17:31:42 -04:00
Joel E. Denny	8e4836b2a2	[OpenMP] Use IsHostPtr where needed for targetDataEnd As discussed in D105990, without this patch, `targetDataEnd` determines whether to transfer data or delete a device mapping (as opposed to assuming it's in shared memory) using two different conditions, each of which is broken for some cases: 1. `!(UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin)`: The broken case is rare: the device and host might happen to use the same address for their mapped allocations. I don't know how to write a test that's likely to reveal this case, but this patch does fix it, as discussed below. 2. `!UNIFIED_SHARED_MEMORY \|\| HasCloseModifier`: There are at least two broken cases: 1. The `close` modifier might have been specified on an `omp target enter data` but not the corresponding `omp target exit data`, which thus might falsely assume a mapping is in shared memory. The test `unified_shared_memory/close_enter_exit.c` already has a missing deletion as a result, and this patch adds a check for that. This patch also adds the new test `close_member.c` to reveal a missing transfer and deletion. 2. Use of discrete memory might have been forced by `omp_target_associate_ptr`, as in the test `unified_shared_memory/api.c`. In the current `targetDataEnd` implementation, this condition turns out not be used for this case: because the reference count is infinite, a transfer is possible only with an `always` modifier, and this condition is never used in that case. To ensure it's never used for that case in the future, this patch adds the test `unified_shared_memory/associate_ptr.c`. Fortunately, `DeviceTy::getTgtPtrBegin` already has a solution: it reports whether the allocation was found in shared memory via the variable `IsHostPtr`. After this patch, `HasCloseModifier` is no longer used in `targetDataEnd`, and I wonder if the `close` modifier is ever useful on an `omp target data end`. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107925	2021-09-01 17:31:42 -04:00
Jon Chesterfield	cef1199686	Revert "[openmp] No longer use LIBRARY_PATH to find devicertl" This reverts commit `7a228f872f`. Failing test case under CI	2021-09-01 20:44:12 +01:00
Jon Chesterfield	7a228f872f	[openmp] No longer use LIBRARY_PATH to find devicertl Given D109057, change test runner to use the libomptarget-x-bc-path argument instead of the LIBRARY_PATH environment variable to find the device library. Also drop the use of LIBRARY_PATH environment variable as it is far too easy to pull in the device library from an unrelated toolchain by accident with the current setup. No loss in flexibility to developers as the clang commandline used here is still available. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109061	2021-09-01 20:24:34 +01:00
Jon Chesterfield	718e5a9883	[libomptarget] Set runpath on libomptarget, use that to drop LD_LIBRARY_PATH from test runner Using rpath instead of LD_LIBRARY_PATH to find libomp.so and libomptarget.so lets one rerun the already built test executables without setting environment variables and removes the risk of the test runner picking up different libraries to the developer debugging the failure. rpath usually means runpath, which is not transitive, so set runpath on libomptarget itself so that it can find the plugins located next to it, spelled $ORIGIN. This provides sufficient functionality to drop D102043 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D109071	2021-09-01 18:47:56 +01:00
Jon Chesterfield	f8bcbb82a7	[libomptarget] Normalise a cmake debug string, checking it triggers CI	2021-09-01 14:24:28 +01:00
Joel E. Denny	1688b4cf8e	[OpenMP][AMDGPU] XFAIL test where kernels call printf	2021-08-31 22:11:28 -04:00
Joel E. Denny	ec1ebcd302	[OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in runtime (2/2) This patch implements OpenMP runtime support for an original OpenMP extension we have developed to support OpenACC: the `ompx_hold` map type modifier. The previous patch in this series, D106509, implements Clang support and documents the new functionality in detail. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D106510	2021-08-31 16:13:49 -04:00
Joachim Protze	5ea1c37118	[libomptarget][amdcgn] Only add opt/llvm-link dependency if TARGET is available In some build configurations, the target we depend on is not available for declaring the build dependency. We only need to declare the build dependency, if the build target is available in the same build. Fixes the issue raised in https://reviews.llvm.org/D107156#2969862 This patch should go into release/13 together with D108404 Differential Revision: https://reviews.llvm.org/D108868	2021-08-30 17:32:11 +02:00
Shilei Tian	e8fdacfd81	[OpenMP][NVPTX] Fixed missing variables for CUDA free compilation in NVPTX plugin `CU_EVENT_DEFAULT` is defined in CUDA header. It should be added to `openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h` for CUDA free build. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108878	2021-08-28 18:08:10 -04:00
Shilei Tian	29df4ab3f3	[OpenMP][Offloading] Add support for event related interfaces This patch adds the support form event related interfaces, which will be used later to fix data race. See D104418 for more details. Reviewed By: jdoerfert, ye-luo Differential Revision: https://reviews.llvm.org/D108528	2021-08-28 16:24:14 -04:00
George Rokos	a2bd44089e	[libomptarget][NFC] Fixed tests which checked for obsolete string "getOrAllocTgtPtr"	2021-08-28 07:35:42 -07:00
Jon Chesterfield	78f92c3810	[openmp][amdgpu] Initial gfx10 offloading implementation Lets wavefront size be 32 for amdgpu openmp, as well as 64. Fixes up as little as possible to pass that through the libraries. This change is end to end, as opposed to updating clang/devicertl/plugin separately. It can be broken up for review/commit if preferred. Posting as-is so that others with a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are probably bugs remaining as well as the todo: for letting grid values vary more. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108708	2021-08-27 12:34:03 +01:00
George Rokos	3819aae6dd	[libomptarget][NFC] Replaced obsolete name "getOrAllocTgtPtr" with new "getTargetPointer" in debug messages.	2021-08-26 18:01:18 -07:00
Jon Chesterfield	3d85342982	[libomptarget][amdgpu][nfc] Rename variables, delete dead code	2021-08-26 19:58:38 +01:00
Jon Chesterfield	68ab93f4d7	[libomptarget][amdgpu][nfc] Rename source files	2021-08-26 18:29:44 +01:00
Jon Chesterfield	a5f4074d85	[libomptarget][amdgpu] Macro for accessing GPU variables from plugin Lets the amdgpu plugin write to omptarget_device_environment to enable debugging. Intend to use in the near future to record the wavesize that a given deviceRTL was compiled with for running on hardware that supports 32 or 64. Patch sets all the attributes that are useful. Notably .data means the variable is set by writing to host memory before copying to the GPU instead of launching a kernel to update the image. Can simplify the plugin slightly to drop the code for patching after load if this is used consistently. NFC on nvptx, cuda plugin seems to work fine without any annotations. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108698	2021-08-26 17:28:18 +01:00
Jon Chesterfield	ba0af885e7	[libomptarget][amdgpu][nfc] Make grid value access match devicertl	2021-08-25 15:11:19 +01:00
Jon Chesterfield	9b2c6c07b5	[libomptarget][amdgpu] Refactor debug printing Move most debug printing in rtl.cpp behind DP() macro Adjust the print output for gpu arch mismatch when the architectures match Convert an assert into graceful failure Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108562	2021-08-25 14:57:51 +01:00
Jon Chesterfield	ba8547775b	[libomptarget][amdgpu] Fix debug build from D104696	2021-08-25 01:27:51 +01:00
Michael Kruse	1275ee3041	[OpenMP][amdgcn] Don't use in-tree clang if not available. The use of `$<TARGET_FILE:clang>` was adapted too broadly from D101265. Fixes llvm.org/PR51579 Also see discussion in D108534. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D108640	2021-08-24 12:50:49 -05:00
Pushpinder Singh	9b8b7c1180	[AMDGPU][Libomptarget] Delete g_atl_machine global With uses of g_atl_machine gone, a significant portion of dead code has been removed. This patch depends on D104691 and D104695. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104696	2021-08-24 07:59:40 +00:00
Jon Chesterfield	d26000e4cc	[openmp][devicertl] Freestanding nvptx via stub printf Compiled nvptx devicertl as freestanding, breaking the dependency on host glibc and gcc-multilibs. Thus build it by default. Comes at the cost of #defining out printf. Tried mapping it onto __builtin_printf but that gets transformed back to printf instead of hitting the cuda/openmp lowering transform. Printf could be preserved by one of: - dropping all the standard headers and ffreestanding - providing a header only printf implementation - changing the compiler handling of printf Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D108349	2021-08-23 23:07:47 +01:00
Jon Chesterfield	842f875c8b	[openmp] Use llvm GridValues from devicertl Add include path to the cmakefiles and set the target_impl enums from the llvm constants instead of copying the values. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108391	2021-08-23 20:25:24 +01:00
Joachim Protze	4bb36df144	[libomptarget][amdcgn] Add build dependency for llvm-link and opt D107156 and D107320 are not sufficient when OpenMP is built as llvm runtime (LLVM_ENABLE_RUNTIMES=openmp) because dependencies only work within the same cmake instance. We could limit the dependency to cases where libomptarget/plugins are really built. But compared to the whole llvm project, building openmp runtime is negligible and postponing the build of OpenMP runtime after the dependencies are ready seems reasonable. The direct dependency introduced in D107156 and D107320 is necessary for the case where OpenMP is built as llvm project (LLVM_ENABLE_PROJECTS=openmp). Differential Revision: https://reviews.llvm.org/D108404	2021-08-20 01:57:58 +02:00

1 2 3 4 5 ...

802 Commits