llvm-project

Author	SHA1	Message	Date
Ron Lieberman	95eac47260	[libomptarget] x86 offloading fails map_back_race.cpp intermittently Differential Revision: https://reviews.llvm.org/D122658	2022-03-29 16:01:17 +00:00
Johannes Doerfert	b803f06901	[OpenMP] The test does not have check lines	2022-03-29 00:02:55 -05:00
Johannes Doerfert	b309bdb970	[OpenMP][FIX] Use clang++ for the C++ test case	2022-03-28 23:14:24 -05:00
Johannes Doerfert	b316126887	[OpenMP][FIX] Avoid races in the handling of to be deleted mapping entries If we decided to delete a mapping entry, we did not act on it right away but first issued and waited for memory copies. In the meantime some other thread might reuse the entry. While there was some logic to avoid colliding on the actual "deletion" part, there were two races happening: 1) The data transfer back of the thread deleting the entry and the data transfer back of the thread taking over the entry raced. 2) The update to the shadow map happened regardless of whether the entry was actually reused by another thread, which left the shadow map in an inconsistent state. To fix both issues we will now update the shadow map and delete the entry only if we are sure the thread is responsible for deletion, that is, no other thread took over the entry and reused it. We also wait for a potential earlier data transfer from the device to finish before we issue another one that would race with it. Fixes https://github.com/llvm/llvm-project/issues/54216 Differential Revision: https://reviews.llvm.org/D121058	2022-03-28 22:33:18 -05:00
Johannes Doerfert	ba93e4e33e	[OpenMP][NFC] Add missing virtual destructor to silence warning	2022-03-28 22:33:18 -05:00
Johannes Doerfert	7df2eba7fa	[Attributor][OpenMP] Add assumption for non-call assembly instructions Inline assembly is scary but we need to support it for the OpenMP GPU device runtime. The new assumption expresses the fact that it may not have call semantics, that is, it will not call another function but simply perform an operation or side-effect. This is important for reachability in the presence of inline assembly. Differential Revision: https://reviews.llvm.org/D109986	2022-03-28 20:57:52 -05:00
Shilei Tian	545fcc3d84	[OpenMP][CUDA] Fix potential program crash caused by double free resources As we mentioned in the code comments for function `ResourcePoolTy::release`, at some point there could be two identical resources on the two sides of the `Next` mark. This is usually not an issue, except in the following case: 1. Some resources are not returned. 2. We need to iterate the pool and free the elements. That will cause a double free, which is the case for the event pool. Since we don't release events held by the data map, the `Next` mark may not be reset, and we can end up with two identical items in the pool. When the pool is destroyed, we call `cuEventDestroy` twice on the same event. In the best case, we only observe CUDA errors. In the worst case, it can cause internal failures in CUDART and a subsequent crash. This patch fixes the issue by tracking all resources that have been handed out in an `unordered_set`. We don't remove a resource from the set when it is returned. When the pool is destroyed, we merge the pool (a `vector`) and the set. In this way, we can make sure that the set contains all resources allocated from the device. We just need to iterate the set and free each resource accordingly. For now, only the event pool uses it. The stream pool does not, because we can make sure all streams are returned when the plugin is destroyed. One might wonder why we don't release all events held in the data map. That is because plugins are destroyed before `libomptarget`. If we could somehow make the plugin outlast `libomptarget`, life would be much easier. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D122014	2022-03-25 22:49:32 -04:00
Joseph Huber	9d3550c517	[OpenMP] Add AMDGPU calling convention to ctor / dtor functions This patch adds the necessary AMDGPU calling convention to the ctor / dtor kernels. These are fundamentally device kernels called by the host on image load. Without this calling convention information the AMDGPU plugin is unable to identify them. Depends on D122504 Fixes #54091 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D122515	2022-03-25 22:44:20 -04:00
Johannes Doerfert	6c2be885ff	Revert "[OpenMP][NFC] Add missing virtual destructor to silence warning" This reverts commit `b9fd8f34ae` as it accidentally contained a unit test change that is not finished (and unrelated).	2022-03-25 16:07:11 -05:00
Johannes Doerfert	7dfad948f1	[OpenMP][FIX] Repair ExclusiveAccess move semantic snafu	2022-03-25 16:00:53 -05:00
Johannes Doerfert	b9fd8f34ae	[OpenMP][NFC] Add missing virtual destructor to silence warning	2022-03-25 16:00:53 -05:00
Johannes Doerfert	4e34f061d6	[OpenMP][FIX] Ensure exclusive access to the HDTT map This patch solves two problems with the `HostDataToTargetMap` (HDTT map) which caused races and crashes before: 1) Any access to the HDTT map needs to be exclusive. This was not the case for the "dump table" traversals that could collide with updates by other threads. The new `Accessor` and `ProtectedObject` wrappers will ensure we have a hard time introducing similar races in the future. Note that we could allow multiple concurrent read-accesses but that feature can be added to the `Accessor` API later. 2) The elements of the HDTT map were `HostDataToTargetTy` objects, which meant that they could be copied/moved/deleted as the map was changed. However, we sometimes kept pointers to these elements around after we gave up the map lock, which caused potential races again. The new indirection through `HostDataToTargetMapKeyTy` allows us to modify the map while keeping the (interesting part of the) entries valid. To offset potential cost we duplicate the ordering key of the entry, which avoids an additional indirect lookup. We should replace more objects with "protected objects" as we go. Differential Revision: https://reviews.llvm.org/D121057	2022-03-25 11:38:54 -05:00
Joseph Huber	a619072c61	[OpenMP] Manually unroll the argument copy loop The unroll pragma did not work properly because the loop bound was not known when we optimized the runtime, and we then added "unroll disable" metadata, which prevented unrolling later when the bounds were known. For now we manually unroll to make sure up to 16 elements are handled nicely. This helps optimizations to look through the argument passing. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109164	2022-03-21 20:54:11 -04:00
Stanislav Mekhanoshin	e0b9364b5c	[AMDGPU] Add gfx90a and gfx940 to get_elf_mach_gfx_name.cpp Differential Revision: https://reviews.llvm.org/D120849	2022-03-17 13:05:07 -07:00
Jon Chesterfield	75779435f3	[nfc][openmp] Swap arguments to remove noise from upcoming diff	2022-03-11 23:08:37 +00:00
Shilei Tian	f6639a424b	[OpenMP][CUDA] Fix the check of `setContext`	2022-03-09 18:48:44 -05:00
Shilei Tian	39d3283a08	[OpenMP][CUDA] Avoid calling `cuCtxSetCurrent` redundantly Currently we set the context everywhere it might be needed, but that causes many unnecessary function calls. For example, when the resource pool needs to be resized, it gets new resources from the allocator, and each call to allocate sets the current context once, which is unnecessary. In this patch, we set the context only in the entry interface functions, if needed. Ideally this would be implemented via RAII, but since `cuCtxSetCurrent` can return an error and we don't use exceptions, we cannot stop execution if the RAII constructor fails. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121322	2022-03-09 16:32:47 -05:00
Shilei Tian	5105c7cd78	[OpenMP][CUDA] Fix an issue where multiple `CUmodule`s could be overwritten This patch fixes the issue introduced in `14de0820e8` and D120089 where, if dynamic libraries are used, the `CUmodule` array could be overwritten. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121308	2022-03-09 14:55:20 -05:00
Johannes Doerfert	14de0820e8	[OpenMP][FIX] Ensure the modules vector is filled as others are The modules vector was for some reason special which could lead to it not being of the same size (=num devices). Easiest solution is to treat it like we do all the other vectors.	2022-03-08 23:45:43 -06:00
Johannes Doerfert	1660288b28	[OpenMP][CUDA] Use one event pool per device An event pool, similar to the stream pool, needs to be kept per device. For one, events are associated with cuda contexts which means we cannot destroy the former after the latter. Also, CUDA documentation states streams and events need to be associated with the same context, which we did not ensure at all. Differential Revision: https://reviews.llvm.org/D120142	2022-03-07 23:43:05 -06:00
Johannes Doerfert	10aa83ff74	[OpenMP] Allow to explicitly deinitialize device resources There are two problems this patch tries to address: 1) We currently free resources in a random order wrt. plugin and libomptarget destruction. This patch should make the CUDA plugin less fragile if something goes wrong during deinitialization. 2) We need to support (hard) pause runtime calls eventually. This patch allows us to free all associated resources, though we cannot reinitialize the device yet. A follow-up patch will associate one event pool per device/context. Differential Revision: https://reviews.llvm.org/D120089	2022-03-07 23:43:04 -06:00
Johannes Doerfert	307bbd3c82	[OpenMP][NFCI] Use RAII lock guards in libomptarget where possible Differential Revision: https://reviews.llvm.org/D121060	2022-03-07 23:43:04 -06:00
Johannes Doerfert	7ead7e90fc	Revert "[OpenMP][NFCI] Use RAII lock guards in libomptarget where possible" This reverts commit `ff50e81b50` as it broke the buildbots, see https://reviews.llvm.org/D121060#3362737.	2022-03-06 21:27:41 -06:00
Johannes Doerfert	ff50e81b50	[OpenMP][NFCI] Use RAII lock guards in libomptarget where possible Differential Revision: https://reviews.llvm.org/D121060	2022-03-06 19:59:23 -06:00
Shilei Tian	7f7c2c34b6	[OpenMP][CMake] Clean up the CMake variable `LIBOMPTARGET_LLVM_INCLUDE_DIRS` `LIBOMPTARGET_LLVM_INCLUDE_DIRS` is currently checked and included for multiple times redundantly. This patch is simply a clean up. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D121055	2022-03-05 22:37:59 -05:00
Joseph Huber	e2dcc2218c	[Libomptarget] Work around bug in initialization of libomptarget Libomptarget uses some shared variables to track certain internal state in the runtime. This causes problems when we have code that contains no OpenMP kernels. These variables are normally initialized upon kernel entry, but if there are no kernels we will see no initialization. Currently we load the runtime into each source file when not running in LTO mode, so these variables will be erroneously considered undefined or dead and removed, causing miscompiles. This patch temporarily works around the most obvious case, but others still exhibit this problem. We will need to fix this more soundly later. Fixes #54208. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121007	2022-03-04 13:13:31 -05:00
Aakanksha	840695814a	[AMDGPU] Add gfx1036 target Differential Revision: https://reviews.llvm.org/D120846	2022-03-02 23:26:38 +00:00
Stanislav Mekhanoshin	2e2e64df4a	[AMDGPU] Add gfx940 target This is target definition only. Differential Revision: https://reviews.llvm.org/D120688	2022-03-02 13:54:48 -08:00
Shilei Tian	75812e7704	[OpenMP][Offloading] Change N back to 256 in bug49334.cpp	2022-02-23 16:10:35 -05:00
Joseph Huber	5dd0c39638	[Libomptarget][NFC] Fix missing newline in error message	2022-02-23 08:10:16 -05:00
Carlo Bertolli	7b731f4d0b	[OpenMP][libomptarget] Delay restore of shadow pointers in structs to after H2D memory copies are completed When using asynchronous plugin calls, shadow pointer restore could happen before the D2H copy for the entire struct has completed, effectively leaving a device pointer in a host struct. This patch fixes the problem by delaying restores until after a synchronization happens (target regions) and by synchronizing early (target update). Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119968	2022-02-18 10:09:10 -06:00
Joseph Huber	0870a4f59a	[OpenMP] Add flag for disabling thread state in runtime The runtime uses thread state values to indicate when we use an ICV or are in nested parallelism. This is done for OpenMP correctness, but it is not needed in the majority of cases. The new flag added is `-fopenmp-assume-no-thread-state`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120106	2022-02-18 08:35:05 -05:00
Shilei Tian	092a5bb72b	[OpenMP][Offloading] Fix test case issues in bug49334.cpp `bug49334.cpp` has one issue that causes the flaky results reported in #53730. The root cause is that `BlockedC` is never initialized, but in `BlockMatMul_TargetNowait` it is directly read and written (via `+=`). Fixes #53730. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D119988	2022-02-17 10:22:48 -05:00
Johannes Doerfert	57b4c5267b	[OpenMP][FIX] Eliminate race on the IsSPMD global The `IsSPMD` global can only be read by threads other than the main thread after initialization is complete. To allow usage of `mapping::getBlockSize` before initialization is done, we can pass the `IsSPMD` state explicitly. This is similar to other APIs that take `IsSPMD` explicitly to avoid such a race, e.g., `mapping::isInitialThreadInLevel0(IsSPMD)` Fixes https://github.com/llvm/llvm-project/issues/53857	2022-02-16 14:44:20 -06:00
Joseph Huber	777039a51c	[Libomptarget] Run CPU offloading tests using the new driver This patch adds a new target to the OpenMP CPU offloading tests. This tests the usage of the new driver for CPU offloading. If this all works then we can transition to the new driver as the default. Depends on D119613 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119736	2022-02-15 15:05:32 -05:00
Joseph Huber	48e3dcecc4	[Libomptarget][NFC] Remove constexpr to hide warnings Currently, whenever we compile the device runtime we get the following 'Mapping.cpp:32:32: warning: inline function '_OMP::impl::getGridValue' is not defined [-Wundefined-inline]' warning. This can be silenced by removing the constexpr attribute for this function. Doing this doesn't change the generated bitcode at all but prevents the screen from getting filled with warnings whenever we build the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119747	2022-02-14 15:34:18 -05:00
Shilei Tian	c27f530d4c	[OpenMP][Offloading] Fix infinite loop in applyToShadowMapEntries This patch fixes the issue that the for loop in `applyToShadowMapEntries` is infinite because `Itr` is not incremented in `CB`. Fixes #53727. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119471	2022-02-12 22:02:53 -05:00
Shilei Tian	702a976c12	[OpenMP][Offloading] Change the way to compare floating point values in bug49334.cpp `bug49334.cpp` directly uses `!=` to compare two floating point values, which is almost always wrong. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D119485	2022-02-10 18:20:36 -05:00
Shilei Tian	aca33b0b37	[OpenMP][CUDA] Remove the hard team limit Currently we have a hard team limit, set to 65536. This means that regardless of whether the device supports more teams or the user requests more, any number larger than the hard limit is clamped to it when the kernel is launched. This is far below the actual hardware limit. For example, my workstation has a GTX2080, whose grid size limit is 2147483647, exactly the largest number an `int32_t` can represent. The spec mentions no such limitation. This patch simply removes it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119313	2022-02-10 18:07:46 -05:00
Ye Luo	59ad9650cf	[Libomptarget][AMDGCN] add gfx90c target Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D119478	2022-02-10 15:55:44 -06:00
Shilei Tian	f6685f7746	[OpenMP][CUDA] Refine the logic to determine grid size This patch refines the logic to determine grid size, as the previous method could bypass the check of whether `CudaBlocksPerGrid` is greater than the actual hardware limit. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119311	2022-02-10 14:13:32 -05:00
Joseph Huber	9582f09690	[Libomptarget] Increase stack size for bug49779 test The 'bug49779.cpp' test has been failing recently. This is because the runtime is sufficiently complex when using nested parallelism without optimizations that the CUDA tools cannot statically determine the stack size. Because of this the kernel can exceed the thread stack size and crash. Work around this using the 'LIBOMPTARGET_STACK_SIZE' environment variable and add an FAQ entry for this situation. Fixes #53670 Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D119357	2022-02-09 15:37:23 -05:00
Joseph Huber	99d72ebddf	[Libomptarget] Add header files as a dependency to CMake target This patch manually adds the runtime include files to the list of dependencies when we build the bitcode runtime library. Previously, if only a header was changed, we would not recompile the source files. The solution used here isn't optimal because every source file now has a dependency on each header file regardless of whether it is actually used by that file. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D119254	2022-02-08 12:09:59 -05:00
Joseph Huber	f8ffac5987	[OpenMP] Enable new driver tests for AMDGPU This patch enables running the new driver tests for AMDGPU. Previously this was disabled because some tests failed. This was only because the new driver tests hadn't been listed as unsupported or expected to fail. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D119240	2022-02-08 09:55:29 -05:00
Joseph Huber	d28051c4ab	[Libomptarget] Replace Value RAII with default value This patch replaces the ValueRAII pointer with a default 'nullptr' value. Previously this was initialized as a reference to an existing variable. The use of this variable caused overhead because the compiler could not look through the uses and determine that it was unused if 'Active' was not set. Because of this, accesses to the variable would be left in the runtime once compiled. Fixes #53641 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119187	2022-02-07 17:12:00 -05:00
Joseph Huber	034adaf5be	[OpenMP] Completely remove old device runtime This patch completely removes the old OpenMP device runtime. Previously, the new runtime had the prefix `libomptarget-new-` and the old runtime was simply called `libomptarget-`. This patch makes the formerly new runtime the only runtime available. The old runtime project has been deleted entirely, and all references to the `libomptarget-new` runtime have been replaced with `libomptarget-`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D118934	2022-02-04 15:31:33 -05:00
Joseph Huber	b4be18219e	[Libomptarget] Remove AMDGPU XFAIL from test Summary: This test should now pass on AMDGPU. Previously the symbols were hidden, so reading them failed.	2022-02-04 13:40:03 -05:00
Jon Chesterfield	f52927c122	Revert "[OpenMP][FIX] Explicit barriers in SPMD mode are not aligned" This seems to be the root cause of hangs on amdgpu. Reverting while investigating. This reverts commit `7b9844cc8d`.	2022-02-01 14:56:59 +00:00
Jon Chesterfield	8b7e99c41d	[openmp] Disable tests that presently hang on CI	2022-02-01 13:01:35 +00:00
Johannes Doerfert	7b9844cc8d	[OpenMP][FIX] Explicit barriers in SPMD mode are not aligned Due to num_threads (probably also other reasons) we cannot assume explicit barriers are always executed by all threads in an aligned fashion. We can optimize them if that property can be proven but that is different.	2022-02-01 01:10:52 -06:00

892 Commits
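The event-pool fix in `545fcc3d84` (D122014) comes down to remembering every resource ever handed out, so that destruction can free each one exactly once. The sketch below illustrates that idea under stated assumptions: `TrackingPool`, `acquire`, `release`, and `reserve` are hypothetical names (this is not the plugin's `ResourcePoolTy` API), and `Handle` stands in for something like a CUDA event.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical pool that remembers every handle it has ever handed out, so
// that destruction frees each handle exactly once even if some were never
// returned. `Handle` stands in for a device resource such as a CUDA event.
template <typename Handle> class TrackingPool {
public:
  using Creator = std::function<Handle()>;
  using Destroyer = std::function<void(Handle)>;

  TrackingPool(Creator C, Destroyer D)
      : Create(std::move(C)), Destroy(std::move(D)) {}
  // Copying would destroy the same handles twice, so forbid it.
  TrackingPool(const TrackingPool &) = delete;
  TrackingPool &operator=(const TrackingPool &) = delete;

  // Pre-create N handles that sit in the free list but are not handed out yet.
  void reserve(std::size_t N) {
    for (std::size_t I = 0; I < N; ++I)
      Free.push_back(Create());
  }

  Handle acquire() {
    Handle H;
    if (Free.empty()) {
      H = Create();
    } else {
      H = Free.back();
      Free.pop_back();
    }
    Given.insert(H); // Recorded here and never erased on release.
    return H;
  }

  // Returning a handle only puts it back on the free list.
  void release(Handle H) { Free.push_back(H); }

  ~TrackingPool() {
    // Merge the free list (which may hold never-handed-out handles) into the
    // set; duplicates collapse, so every handle is destroyed exactly once.
    for (Handle H : Free)
      Given.insert(H);
    for (Handle H : Given)
      Destroy(H);
  }

private:
  Creator Create;
  Destroyer Destroy;
  std::vector<Handle> Free;
  std::unordered_set<Handle> Given;
};
```

Because the set deduplicates handles, a resource that was handed out and later returned is still destroyed once, and one that was never returned (like events still held by the data map) is not leaked.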
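Commit `4e34f061d6` (D121057) guards the HDTT map with `Accessor` and `ProtectedObject` wrappers so that every access holds the lock. Here is a minimal sketch of that pattern. Only the two class names come from the commit message. The interface shown (`getExclusiveAccessor`, `operator->`, `operator*`) is a hypothetical illustration, not libomptarget's actual API.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <utility>

template <typename T> class ProtectedObject;

// Hypothetical accessor: holds the owning object's mutex for its entire
// lifetime, so the wrapped data can only be touched while locked.
template <typename T> class Accessor {
public:
  explicit Accessor(ProtectedObject<T> &PO) : Lock(PO.Mtx), Obj(PO.Obj) {}
  T &operator*() { return Obj; }
  T *operator->() { return &Obj; }

private:
  std::unique_lock<std::mutex> Lock; // Released when the Accessor is destroyed.
  T &Obj;
};

// Hypothetical wrapper: the object itself is private, so the only way to
// reach it is through an Accessor that has acquired the lock.
template <typename T> class ProtectedObject {
public:
  Accessor<T> getExclusiveAccessor() { return Accessor<T>(*this); }

private:
  friend class Accessor<T>;
  std::mutex Mtx;
  T Obj;
};
```

The design choice is that forgetting to lock becomes a compile error rather than a latent race: code such as a "dump table" traversal has no path to the map except through an accessor.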
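Commit `a619072c61` (D109164) replaces a loop with a trip count unknown at runtime-compile time by manually unrolled copies for up to 16 arguments. One way to express such an unrolled copy is a fallthrough `switch`, sketched below. The function name `copyArgs`, its signature, and the loop fallback for more than 16 arguments are assumptions for illustration. This is not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: copy NumArgs argument pointers. For up to 16 arguments
// every copy is straight-line code that the optimizer can see through; the
// generic loop is only an assumed fallback for larger counts.
inline void copyArgs(void **Dst, void *const *Src, std::size_t NumArgs) {
  switch (NumArgs) {
  case 16: Dst[15] = Src[15]; [[fallthrough]];
  case 15: Dst[14] = Src[14]; [[fallthrough]];
  case 14: Dst[13] = Src[13]; [[fallthrough]];
  case 13: Dst[12] = Src[12]; [[fallthrough]];
  case 12: Dst[11] = Src[11]; [[fallthrough]];
  case 11: Dst[10] = Src[10]; [[fallthrough]];
  case 10: Dst[9] = Src[9]; [[fallthrough]];
  case 9: Dst[8] = Src[8]; [[fallthrough]];
  case 8: Dst[7] = Src[7]; [[fallthrough]];
  case 7: Dst[6] = Src[6]; [[fallthrough]];
  case 6: Dst[5] = Src[5]; [[fallthrough]];
  case 5: Dst[4] = Src[4]; [[fallthrough]];
  case 4: Dst[3] = Src[3]; [[fallthrough]];
  case 3: Dst[2] = Src[2]; [[fallthrough]];
  case 2: Dst[1] = Src[1]; [[fallthrough]];
  case 1: Dst[0] = Src[0]; [[fallthrough]];
  case 0: break;
  default:
    for (std::size_t I = 0; I < NumArgs; ++I)
      Dst[I] = Src[I];
  }
}
```

Once inlined into a caller with a constant argument count, the `switch` folds to exactly the needed stores. A loop annotated with "unroll disable" metadata would keep its loop form.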
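Commit `702a976c12` (D119485) notes that comparing floating point results with `!=` is almost always wrong, because host and device may accumulate values in different orders. A common alternative is a combined absolute/relative tolerance check, sketched below. The helper name `almostEqual` and the default tolerances are assumptions, and the actual change to `bug49334.cpp` may use a different criterion.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: treat A and B as equal if they differ by at most an
// absolute tolerance (useful near zero) or by a relative tolerance scaled by
// the larger magnitude. Tolerance defaults are illustrative only.
inline bool almostEqual(float A, float B, float RelTol = 1e-5f,
                        float AbsTol = 1e-6f) {
  float Diff = std::fabs(A - B);
  return Diff <= AbsTol ||
         Diff <= RelTol * std::max(std::fabs(A), std::fabs(B));
}
```

For example, summing `0.1f` ten times need not produce exactly `1.0f`, but `almostEqual` still accepts it, while clearly different values such as `1.0f` and `1.01f` are rejected.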