llvm-project

Commit Graph

Author	SHA1	Message	Date
Jon Chesterfield	f52927c122	Revert "[OpenMP][FIX] Explicit barriers in SPMD mode are not aligned" This seems to be the root cause of hangs on amdgpu. Reverting while investigating. This reverts commit `7b9844cc8d`.	2022-02-01 14:56:59 +00:00
Jon Chesterfield	8b7e99c41d	[openmp] Disable tests that presently hang on CI	2022-02-01 13:01:35 +00:00
Johannes Doerfert	7b9844cc8d	[OpenMP][FIX] Explicit barriers in SPMD mode are not aligned Due to num_threads (probably also other reasons) we cannot assume explicit barriers are always executed by all threads in an aligned fashion. We can optimize them if that property can be proven but that is different.	2022-02-01 01:10:52 -06:00
Joseph Huber	4d4587d5b0	[OpenMP] Remove new driver tests for AMDGPU Some of the new driver tests are flaky on AMDGPU, remove for now.	2022-01-31 23:32:33 -05:00
Joseph Huber	0ac799b5c9	[Libomptarget] Run GPU offloading tests using the new driver This patch adds a new target to the tests to run using the new driver as the method for generating offloading code. Depends on D116541 Differential Revision: https://reviews.llvm.org/D118637	2022-01-31 23:11:43 -05:00
Joseph Huber	ad0a306a38	[OpenMP][NFC] Change error message on offloading failure to mention documentation This patch changes the error message to instead mention the documentation page for the debugging options provided by libomptarget and the bitcode runtimes. Add some extra information to the documentation to help users more quickly identify debugging resources. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118626	2022-01-31 15:19:52 -05:00
Joseph Huber	fd5853dae6	[Libomptarget] Reduce shared memory stack size to 512 and add a message when it is exceeded Reduces the shared memory size used for globalization from 2048 to 512 bytes to reduce the pressure on shared memory. This patch also adds a debug message to indicate when the shared memory was insufficient. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118625	2022-01-31 15:19:48 -05:00
Jon Chesterfield	9b9d08111b	Set rpath on openmp executables OpenMP executables need to find libomp and libomptarget at runtime. This currently requires LD_LIBRARY_PATH or the user to specify rpath. Change that to set the expected location of the OpenMP libraries in the install tree. Whether rpath means rpath or runpath is system dependent. The attached test shows that the -Wl,--disable-new-dtags control interacts correctly with this feature. The implicit rpath field is appended to any user-specified ones, which is ideal. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D118493	2022-01-31 16:35:00 +00:00
Jon Chesterfield	a841a3a579	Revert "Set rpath on openmp executables" Failed some buildbots, bad assumptions about structure of install path This reverts commit `a80d5c34e4`.	2022-01-31 16:18:03 +00:00
Jon Chesterfield	a80d5c34e4	Set rpath on openmp executables OpenMP executables need to find libomp and libomptarget at runtime. This currently requires LD_LIBRARY_PATH or the user to specify rpath. Change that to set the expected location of the OpenMP libraries in the install tree. Whether rpath means rpath or runpath is system dependent. The attached test shows that the -Wl,--disable-new-dtags control interacts correctly with this feature. The implicit rpath field is appended to any user-specified ones, which is ideal. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D118493	2022-01-31 16:01:08 +00:00
Ye Luo	bafb6f3e9c	[OpenMP] disable build of old nvptx device runtime Fully respect LIBOMPTARGET_BUILD_NVPTX_BCLIB. There is no CUDA toolchain dependency. Complement D118268. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118522	2022-01-28 21:25:48 -06:00
Ron Lieberman	619f44b0ed	Revert "[OpenMP] Ensure broken assumptions print once, not thousands of times." This reverts commit `27c799ecc9`.	2022-01-28 01:41:10 +00:00
Joseph Huber	27c799ecc9	[OpenMP] Ensure broken assumptions print once, not thousands of times. If we have a broken assumption we want to print a message to the user. If the assumption is broken by many threads in many teams this can become a problem. To avoid it we use a hash that tracks if a broken assumption has (likely) been printed and avoid printing it again. This is not foolproof and has some caveats that might cause problems in the future (see comment) but it should improve the situation considerably for now. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D112156	2022-01-27 18:43:45 -05:00
Johannes Doerfert	1e12156896	[OpenMP][NFCI] Pipe the IdentTy object through more new RT functions IdentTy objects are useful for debugging and profiling so we want to keep them around in more places, especially those that have a large impact on performance, e.g., everything related to state. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112494	2022-01-27 15:36:55 -05:00
Sri Hari Krishna Narayanan	f44e41af41	Runtime for Interop directive This implements the runtime portion of the interop directive. It expects the frontend and IRBuilder portions to be in place for proper execution. It currently works only for GPUs and has several TODOs that should be addressed going forward. Reviewed By: RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D106674	2022-01-27 15:16:24 -05:00
Jon Chesterfield	e08f3bfe58	[openmp] Disable build of old runtimes by default The old runtime is not tested by CI. Disable the build prior to the llvm-14 branch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118268	2022-01-26 19:17:31 +00:00
Joseph Huber	26feef0846	[Libomptarget] Change visibility to hidden for device RTL This patch changes the visibility for all constructs in the new device RTL to be hidden by default. This is done after the changes introduced in D117806 changed the visibility from being hidden by default for all device compilations. This asserts that the visibility for the device runtime library will be hidden except for the internal environment variable. This is done to aid optimization and linking of the device library. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D117807	2022-01-20 21:06:28 -05:00
Johannes Doerfert	b0789a1b12	[OpenMP] Avoid costly shadow map traversals whenever possible In the OpenMC app we saw `omp target update` spending an awful lot of time in the shadow map traversal without ever doing any update there. There are two cases that allow us to avoid the traversal completely. The simplest is that small updates cannot (reasonably) contain an attached pointer part. The other case requires tracking in the mapping table whether an entry might contain an attached pointer part. Given that shadow map entries are created in a single location, the latter is actually fairly easy as well. Differential Revision: https://reviews.llvm.org/D113124	2022-01-19 22:14:41 -06:00
Johannes Doerfert	1e447d03e2	[OpenMP] Introduce an environment variable to disable atomic map clauses Atomic handling of map clauses was introduced to comply with the OpenMP standard (see D104418). However, many apps won't need this feature which can be costly in certain situations. To allow for applications to opt-out we now introduce the `LIBOMPTARGET_MAP_FORCE_ATOMIC` environment flag that voids the atomicity guarantee of the standard for map clauses again, shifting the burden to the user. This patch also de-duplicates the code that introduces the events used to enforce atomicity as a cleanup. Differential Revision: https://reviews.llvm.org/D117627	2022-01-19 22:14:41 -06:00
Joseph Huber	28d718602a	[OpenMP] Expand short versions of OpenMP offloading triples The OpenMP offloading libraries are built with fixed triples and linked in during compile time. This would cause unhelpful errors if the user passed in the wrong expansion of the triple used for the bitcode library. Because we only support these triples for OpenMP offloading, we can normalize them to the full version used in the bitcode library. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D117634	2022-01-19 20:26:37 -05:00
Jon Chesterfield	ce8f365884	[openmp] Always pass valid triple to openmp-targets when using newRTL Previously, we sometimes passed -fopenmp-targets=nvptx64-nvidia-cuda-newRTL Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D117715	2022-01-19 22:07:22 +00:00
Jon Chesterfield	8baf4ba890	[openmp][amdgpu] Remove xfail from test using declare target variable	2022-01-19 15:55:37 +00:00
Jon Chesterfield	ca84c43d69	[openmp][amdgpu] Disable tests on old runtime, enable tests on new one	2022-01-19 15:49:47 +00:00
Jon Chesterfield	e35c8f541c	[openmp][amdgpu] Temporarily disable tests on old runtime	2022-01-19 15:39:00 +00:00
Joseph Huber	4863fed933	[Libomptarget] Fix external visibility for internal variables After the changes in D117362 made variables declared inside of a target declare directive visible outside the plugin, some variables inside the runtime were given visibility that conflicted with their address space type. This caused problems when shared or local memory was made externally visible. This patch fixes this issue by making these variables static within the module, therefore limiting their visibility to being internal. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D117526	2022-01-18 18:19:57 -05:00
Joseph Huber	138cc5a001	Revert "[Libomptarget] Fix external visibility for internal variables" Reverting to investigate break on AMDGPU. This reverts commit `0203ff1960`.	2022-01-18 14:44:11 -05:00
Joseph Huber	0203ff1960	[Libomptarget] Fix external visibility for internal variables After the changes in D117362 made variables declared inside of a target declare directive visible outside the plugin, some variables inside the runtime were given visibility that conflicted with their address space type. This caused problems when shared or local memory was made externally visible. This patch fixes this issue by making these variables static within the module, therefore limiting their visibility to being internal. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D117526	2022-01-18 12:53:24 -05:00
Joseph Huber	4869a22d1d	[Libomptarget] Add `cold` to KeepAlive attributes This patch adds the `cold` attribute to the keepAlive functions in the RTL. This dummy function exists to keep certain RTL calls alive without them being optimized out, but it is never called and can be declared cold. This also helps avoid some erroneous remarks being given on this function, because it has weak linkage and cannot be made internal. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D117513	2022-01-17 17:29:26 -05:00
Jon Chesterfield	d53b979596	[openmp][devicertl] Handle missing clang_tool Fixes github issues/52910 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D117230	2022-01-13 22:43:26 +00:00
Joseph Huber	4746e38f67	[Libomptarget] Fix multiply defined symbol during linking This patch adds the `weak` identifier to the openmp device environment variable. The changes introduced in https://reviews.llvm.org/D117211 result in multiply defined symbols. Because the symbol is potentially included multiple times for each offloading file, we will get symbol collisions, and because it needs to have external visibility it should be weak. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D117231	2022-01-13 11:57:33 -05:00
Jon Chesterfield	4395608939	[openmp] Mark used variables as retain as well D97446 changed the behaviour of 'used'. Compensate. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D117211	2022-01-13 13:57:32 +00:00
Jon Chesterfield	a74826d30a	[openmp][amdgpu] Replace unsigned long with uint64_t Some types need to be 64 bit. Unsigned long is a hazard there. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D116963	2022-01-10 22:19:30 +00:00
Shilei Tian	aab62aab04	[OpenMP][Offloading] Fixed a crash caused by dereferencing nullptr In function `DeviceTy::getTargetPointer`, `Entry` could be `nullptr` because of zero length array section. We need to check if it is a valid iterator before using it. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D116716	2022-01-05 23:04:29 -05:00
Shilei Tian	9584c6fa2f	[OpenMP][Offloading] Fixed data race in libomptarget caused by async data movement The async data movement can cause a data race if the target supports it. Details can be found in [1]. This patch tries to fix this problem by attaching an event to the entry of the data mapping table. Here are the details. For each issued data movement, a new event is generated and returned to `libomptarget` by calling `createEvent`. The event will be attached to the corresponding mapping table entry. For each data mapping lookup, if there is no need for a data movement, the attached event has to be inserted into the queue to guarantee that all following operations in the queue can only be executed if the event is fulfilled. This design is to avoid synchronization on the host side. Note that we are using CUDA terminology here. A similar mechanism is assumed to be supported by other targets. Even if the target doesn't support it, it can be easily implemented in the following fallback way: - `Event` can be any kind of flag that has at least two states, 0 and 1. - `waitEvent` can directly busy loop if `Event` is still 0. My local test shows that `bug49334.cpp` can pass. Reference: [1] https://bugs.llvm.org/show_bug.cgi?id=49940 Reviewed By: grokos, JonChesterfield, ye-luo Differential Revision: https://reviews.llvm.org/D104418	2022-01-05 20:20:04 -05:00
Shilei Tian	458db51c10	[OpenMP] Add missing `tt_hidden_helper_task_encountered` along with `tt_found_proxy_tasks` In most cases, hidden helper tasks behave similarly to detached tasks. That means, for example, if we have to wait for detached tasks, we have to do the same thing for hidden helper tasks as well. This patch adds the missing condition for hidden helper tasks alongside the one for detached tasks. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D107316	2021-12-29 23:22:53 -05:00
Johannes Doerfert	73104ad65b	[OpenMP][NFC] Move headers into include folder	2021-12-28 23:53:28 -06:00
Shilei Tian	943d1d83dd	[OpenMP][CUDA] Add resource pool for CUevent Following D111954, this patch adds the resource pool for CUevent. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D116315	2021-12-28 17:42:38 -05:00
Shilei Tian	357c8031ff	[OpenMP][Plugin] Minor adjustments to ResourcePool This patch makes some minor adjustments to `ResourcePool`: - Don't initialize the resources if `Size` is 0, which avoids an assertion. - Add a new interface function `clear` to release all held resources. - If the initial size is 0, resize to 1 when the first request is encountered. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D116340	2021-12-28 16:11:03 -05:00
Joseph Huber	7cdaa5a94e	[OpenMP][FIX] Change globalization alignment to 16 This patch changes the default alignment from 8 to 16, and encodes this information in the `__kmpc_alloc_shared` runtime call to communicate it to the HeapToStack pass. The previous alignment of 8 was not sufficient for the maximum size of primitive types on 64-bit systems, and needs to be increased. This reduces the amount of space available in the data sharing stack, so this implementation will need to be improved later to include the alignment requirements in the allocation call, and use it properly in the data sharing stack in the runtime. Depends on D115888 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115971	2021-12-27 16:58:25 -05:00
Shilei Tian	a697a0a4b6	[OpenMP][Plugin] Introduce generic resource pool Currently CUDA streams are managed by `StreamManagerTy`. It works very well. Now we need some resources, such as CUDA streams and events, to be held by `libomptarget`. It is always good to buffer those resources. More importantly, given the way that `libomptarget` and plugins are connected, we cannot be sure whether plugins are still alive when `libomptarget` is destroyed. As a result, resources held by `libomptarget` might not be released correctly. We therefore need a unified management of all the resources that can be shared between `libomptarget` and plugins. `ResourcePoolTy` is designed to manage one type of resource for one device. It has to work with an allocator which is supposed to provide `create` and `destroy`. In this way, when the plugin is destroyed, we can make sure that all resources allocated from the native runtime library will be released correctly, regardless of whether `libomptarget` has started its destruction. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D111954	2021-12-27 11:32:14 -05:00
Jon Chesterfield	38af5b4fd1	[libomptarget][nfc] Refactor dlwrap.h for easier reuse in D115966 and upcoming patches	2021-12-17 22:28:31 +00:00
Jon Chesterfield	91dfb32f2f	[openmp][amdgpu][nfc] Mark all external functions extern C to get type checking	2021-12-17 18:46:43 +00:00
Carlo Bertolli	d3abb04e14	[OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter I missed the async info parameter in the first version of this API. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115887	2021-12-17 15:58:18 +00:00
Carlo Bertolli	d83dc4c648	[OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple HSA queues per device in plugin This patch extends the AMDGPU plugin for OpenMP target offloading from using a single HSA queue to multiple queues (four in this patch) per device. This enables concurrent threads to submit kernel launches to the same GPU at the same time. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115771	2021-12-15 15:33:17 +00:00
Joseph Huber	8425bde82d	Revert "[OpenMP] Avoid costly shadow map traversals whenever possible" This reverts commit `7c8f4e7b85`. Fails a few OpenMP tests, causes a few updates to segfault.	2021-12-10 15:57:58 -05:00
Joseph Huber	7c8f4e7b85	[OpenMP] Avoid costly shadow map traversals whenever possible In the OpenMC app we saw `omp target update` spending an awful lot of time in the shadow map traversal without ever doing any update there. There are two cases that allow us to avoid the traversal completely. The simplest is that small updates cannot (reasonably) contain an attached pointer part. The other case requires tracking in the mapping table whether an entry might contain an attached pointer part. Given that shadow map entries are created in a single location, the latter is actually fairly easy as well. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D113124	2021-12-10 14:33:18 -05:00
Carlo Bertolli	28309c5436	[OpenMP] Part 2 of At present, amdgpu plugin merges both asynchronous and synchronous kernel launch implementations into a single synchronous version. This patch prepares the plugin for an asynchronous implementation by: - Privatizing actual kernel launch code (valid in both cases) into an anonymous namespace base function (submitted at D115267) - Separating the control flow path of asynchronous and synchronous kernel launch functions (this diff) Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115273	2021-12-10 19:21:05 +00:00
Joel E. Denny	51168ce8d5	[OpenMP] Add test for custom state machine if have reduction D113602 broke the custom state machine when a reduction is present, as revealed by the reproducer this patch adds to the test suite. In that case, openmp-opts changes the return value to undef in `__kmpc_get_warp_size` (which the custom state machine calls as of D113602). Later optimizations then optimize away the custom state machine code as if all threads are outside the thread block, so the target region does not execute. D114802 fixed that but didn't add a reproducer. This patch also adds a `__OMP_RTL_ATTRS` entry for `__kmpc_get_warp_size` to OMPKinds.def, which D113602 missed. This change does not seem to have any impact on the reduction problem. Reviewed By: JonChesterfield, jdoerfert Differential Revision: https://reviews.llvm.org/D113824	2021-12-10 12:53:54 -05:00
Joseph Huber	bc9c4d7216	[OpenMP][FIX] Pass the num_threads value directly to parallel_51 The problem with the old scheme is that we would need to keep track of the "next region" and reset the num_threads value after it. The new RT doesn't do it and an assertion is triggered. The old RT doesn't do it either, I haven't tested it but I assume a num_threads clause might impact multiple parallel regions "accidentally". Further, in SPMD mode num_threads was simply ignored, for some reason beyond me. In any case, parallel_51 is designed to take the clause value directly, so let's do that instead. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D113623	2021-12-09 16:30:29 -05:00
Carlo Bertolli	cc8dc5e28b	[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy. Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115279	2021-12-08 23:02:39 +00:00
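Commits a697a0a4b6 and 357c8031ff describe a per-device pool that buffers resources (such as CUDA streams and events) obtained from an allocator exposing `create` and `destroy`, skips initialization when the size is 0, grows to 1 on the first request, and offers `clear` to release everything it holds. The C++ below is a minimal sketch of that idea under stated assumptions: `ResourcePool`, `CountingAllocator`, their signatures, LIFO reuse, and doubling growth after the first resource are all hypothetical and are not taken from the libomptarget sources.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a plugin allocator. The commit only says the
// allocator provides `create` and `destroy`; these signatures are assumptions.
// A real allocator would wrap a native call, e.g. creating a stream or event.
struct CountingAllocator {
  int NextId = 0; // Handle value returned by the next create().
  int Live = 0;   // Resources created but not yet destroyed.
  int create() {
    ++Live;
    return NextId++;
  }
  void destroy(int) { --Live; }
};

// Illustrative pool for one resource type on one device (not the actual
// ResourcePoolTy). Slots [0, Next) are handed out, [Next, size()) are free.
template <typename T, typename AllocatorT> class ResourcePool {
public:
  // The allocator must outlive the pool.
  ResourcePool(AllocatorT &Alloc, std::size_t InitialSize) : Allocator(Alloc) {
    // As described in D116340: create nothing when the initial size is 0.
    if (InitialSize > 0)
      grow(InitialSize);
  }
  ~ResourcePool() { clear(); }
  ResourcePool(const ResourcePool &) = delete;
  ResourcePool &operator=(const ResourcePool &) = delete;

  // Hand out a free resource, growing the pool when every slot is in use.
  T acquire() {
    std::lock_guard<std::mutex> Lock(Mutex);
    if (Next == Resources.size())
      // Empty pool: grow to 1 (D116340). Otherwise double (an assumption).
      grow(Resources.empty() ? 1 : Resources.size());
    return Resources[Next++];
  }

  // Give a resource back; the most recently returned one is reused first.
  void release(T R) {
    std::lock_guard<std::mutex> Lock(Mutex);
    assert(Next > 0 && "release() without a matching acquire()");
    Resources[--Next] = R;
  }

  // Destroy every resource the pool holds, e.g. when the plugin shuts down.
  // Assumes all acquired resources have already been released.
  void clear() {
    std::lock_guard<std::mutex> Lock(Mutex);
    for (T &R : Resources)
      Allocator.destroy(R);
    Resources.clear();
    Next = 0;
  }

  std::size_t capacity() const {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Resources.size();
  }

private:
  // Caller must hold Mutex (or be the constructor).
  void grow(std::size_t N) {
    for (std::size_t I = 0; I < N; ++I)
      Resources.push_back(Allocator.create());
  }

  AllocatorT &Allocator;
  std::vector<T> Resources;
  std::size_t Next = 0;
  mutable std::mutex Mutex;
};
```

Because the pool, not its users, owns every created resource, tearing it down (explicitly via `clear` or through the destructor) releases all native handles in one place, which is the property the commit relies on when the plugin may outlive or predate `libomptarget`'s own teardown.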

845 Commits