llvm-project

Commit Graph

Author	SHA1	Message	Date
Jon Chesterfield	a74826d30a	[openmp][amdgpu] Replace unsigned long with uint64_t Some types need to be 64 bit. Unsigned long is a hazard there. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D116963	2022-01-10 22:19:30 +00:00
Shilei Tian	aab62aab04	[OpenMP][Offloading] Fixed a crash caused by dereferencing nullptr In function `DeviceTy::getTargetPointer`, `Entry` could be `nullptr` because of zero length array section. We need to check if it is a valid iterator before using it. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D116716	2022-01-05 23:04:29 -05:00
Shilei Tian	9584c6fa2f	[OpenMP][Offloading] Fixed data race in libomptarget caused by async data movement The async data movement can cause data race if the target supports it. Details can be found in [1]. This patch tries to fix this problem by attaching an event to the entry of data mapping table. Here are the details. For each issued data movement, a new event is generated and returned to `libomptarget` by calling `createEvent`. The event will be attached to the corresponding mapping table entry. For each data mapping lookup, if there is no need for a data movement, the attached event has to be inserted into the queue to gaurantee that all following operations in the queue can only be executed if the event is fulfilled. This design is to avoid synchronization on host side. Note that we are using CUDA terminolofy here. Similar mechanism is assumped to be supported by another targets. Even if the target doesn't support it, it can be easily implemented in the following fall back way: - `Event` can be any kind of flag that has at least two status, 0 and 1. - `waitEvent` can directly busy loop if `Event` is still 0. My local test shows that `bug49334.cpp` can pass. Reference: [1] https://bugs.llvm.org/show_bug.cgi?id=49940 Reviewed By: grokos, JonChesterfield, ye-luo Differential Revision: https://reviews.llvm.org/D104418	2022-01-05 20:20:04 -05:00
RitanyaB	378b0ac179	SIGSEGV in ompt_tsan_dependences with for-ordered Segmentation fault in ompt_tsan_dependences function due to an unchecked NULL pointer dereference is as follows: ``` ThreadSanitizer:DEADLYSIGNAL ==140865==ERROR: ThreadSanitizer: SEGV on unknown address 0x000000000050 (pc 0x7f217c2d3652 bp 0x7ffe8cfc7e00 sp 0x7ffe8cfc7d90 T140865) ==140865==The signal is caused by a READ memory access. ==140865==Hint: address points to the zero page. /usr/bin/addr2line: DWARF error: could not find variable specification at offset 1012a /usr/bin/addr2line: DWARF error: could not find variable specification at offset 133b5 /usr/bin/addr2line: DWARF error: could not find variable specification at offset 1371a /usr/bin/addr2line: DWARF error: could not find variable specification at offset 13a58 #0 ompt_tsan_dependences(ompt_data_t, ompt_dependence_t const, int) /ptmp/bhararit/llvm-project/openmp/tools/archer/ompt-tsan.cpp:1004 (libarcher.so+0x15652) #1 __kmpc_doacross_post /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_csupport.cpp:4280 (libomp.so+0x74d98) #2 .omp_outlined. for_ordered_01.c:? (for_ordered_01.exe+0x5186cb) #3 __kmp_invoke_microtask /ptmp/bhararit/llvm-project/openmp/runtime/src/z_Linux_asm.S:1166 (libomp.so+0x14e592) #4 __kmp_invoke_task_func /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_runtime.cpp:7556 (libomp.so+0x909ad) #5 __kmp_fork_call /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_runtime.cpp:2284 (libomp.so+0x8461a) #6 __kmpc_fork_call /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_csupport.cpp:308 (libomp.so+0x6db55) #7 main ??:? (for_ordered_01.exe+0x51828f) #8 __libc_start_main ??:? (libc.so.6+0x24349) #9 _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120 (for_ordered_01.exe+0x4214e9) ThreadSanitizer can not provide additional info. SUMMARY: ThreadSanitizer: SEGV /ptmp/bhararit/llvm-project/openmp/tools/archer/ompt-tsan.cpp:1004 in ompt_tsan_dependences(ompt_data_t, ompt_dependence_t const, int) ==140865==ABORTING ``` To reproduce the error, use the following openmp code snippet: ``` /* initialise testMatrixInt Matrix, cols, r and c */ #pragma omp parallel private(r,c) shared(testMatrixInt) { #pragma omp for ordered(2) for (r=1; r < rows; r++) { for (c=1; c < cols; c++) { #pragma omp ordered depend(sink:r-1, c+1) depend(sink:r-1,c-1) testMatrixInt[r][c] = (testMatrixInt[r-1][c] + testMatrixInt[r-1][c-1]) % cols ; #pragma omp ordered depend (source) } } } ``` Compilation: ``` clang -g -stdlib=libc++ -fsanitize=thread -fopenmp -larcher test_case.c ``` It seems like the changes introduced by the commit https://reviews.llvm.org/D114005 causes this particular SEGV while using Archer. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D115328	2022-01-03 11:23:57 -06:00
Shilei Tian	458db51c10	[OpenMP] Add missing `tt_hidden_helper_task_encountered` along with `tt_found_proxy_tasks` In most cases, hidden helper task behave similar as detached tasks. That means, for example, if we have to wait for detached tasks, we have to do the same thing for hidden helper tasks as well. This patch adds the missing condition for hidden helper task accordingly along with detached task. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D107316	2021-12-29 23:22:53 -05:00
Johannes Doerfert	73104ad65b	[OpenMP][NFC] Move headers into include folder	2021-12-28 23:53:28 -06:00
Shilei Tian	943d1d83dd	[OpenMP][CUDA] Add resource pool for CUevent Following D111954, this patch adds the resource pool for CUevent. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D116315	2021-12-28 17:42:38 -05:00
Shilei Tian	357c8031ff	[OpenMP][Plugin] Minor adjustments to ResourcePool This patch makes some minor adjustments to `ResourcePool`: - Don't initialize the resources if `Size` is 0 which can avoid assertion. - Add a new interface function `clear` to release all hold resources. - If initial size is 0, resize to 1 when the first request is encountered. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D116340	2021-12-28 16:11:03 -05:00
Joseph Huber	7cdaa5a94e	[OpenMP][FIX] Change globalization alignment to 16 This patch changes the default aligntment from 8 to 16, and encodes this information in the `__kmpc_alloc_shared` runtime call to communicate it to the HeapToStack pass. The previous alignment of 8 was not sufficient for the maximum size of primitive types on 64-bit systems, and needs to be increaesd. This reduces the amount of space availible in the data sharing stack, so this implementation will need to be improved later to include the alignment requirements in the allocation call, and use it properly in the data sharing stack in the runtime. Depends on D115888 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115971	2021-12-27 16:58:25 -05:00
Shilei Tian	a697a0a4b6	[OpenMP][Plugin] Introduce generic resource pool Currently CUDA streams are managed by `StreamManagerTy`. It works very well. Now we have the need that some resources, such as CUDA stream and event, will be hold by `libomptarget`. It is always good to buffer those resources. What's more important, given the way that `libomptarget` and plugins are connected, we cannot make sure whether plugins are still alive when `libomptarget` is destroyed. That leads to an issue that those resouces hold by `libomptarget` might not be released correctly. As a result, we need an unified management of all the resources that can be shared between `libomptarget` and plugins. `ResourcePoolTy` is designed to manage the type of resource for one device. It has to work with an allocator which is supposed to provide `create` and `destroy`. In this way, when the plugin is destroyed, we can make sure that all resources allocated from native runtime library will be released correctly, no matter whether `libomptarget` starts its destroy. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D111954	2021-12-27 11:32:14 -05:00
Jonathan Peyton	6a556ecaf4	[OpenMP][libomp] Add use-all syntax to KMP_HW_SUBSET This patch allows the user to request all resources of a particular layer (or core-attribute). The syntax of KMP_HW_SUBSET is modified so the number of units requested is optional or can be replaced with an '' character. e.g., KMP_HW_SUBSET=c:intel_atom@3 will use all the cores after offset 3 e.g., KMP_HW_SUBSET=c:intel_core will use all the big cores e.g., KMP_HW_SUBSET=s,c,1t will use all the sockets, all cores per each socket and 1 thread per core. Differential Revision: https://reviews.llvm.org/D115826	2021-12-20 13:45:21 -06:00
Jon Chesterfield	38af5b4fd1	[libomptarget][nfc] Refactor dlwrap.h for easier reuse in D115966 and upcoming patches	2021-12-17 22:28:31 +00:00
Jon Chesterfield	91dfb32f2f	[openmp][amdgpu][nfc] Mark all external functions extern C to get type checking	2021-12-17 18:46:43 +00:00
Carlo Bertolli	d3abb04e14	[OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter I missed the async info parameter in the first version of this API. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115887	2021-12-17 15:58:18 +00:00
Carlo Bertolli	d83dc4c648	[OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple hsa queue's per device in plugin This patch extends the AMDGPU plugin for OpenMP target offloading from using a single HSA queue to multiple queues (four in this patch) per device. This enables concurrent threads to concurrently submit kernel launches to the same GPU. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115771	2021-12-15 15:33:17 +00:00
Jonathan Peyton	9769340905	[OpenMP][libomp] Fix compile errors with new KMP_HW_SUBSET changes Add missing guards around x86-specific code. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D115664	2021-12-14 08:33:05 +01:00
John Ericson	ddcc02dbcc	Quote some more destination paths with variables Just defensive CMake-ing. I pulled this from D115544 and D99484 which are blocked on some lldb CI failures I don't yet understand. Hoping to land something smaller in the meantime. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D115566	2021-12-13 17:29:08 +00:00
Michael Kruse	77e019c233	[OpenMP] Add "not" to test dependencies. The `not` program is used to test executions prefixed with `%libomptarget-run-fail-`. Currently `not` is not used for libomp tests, but might be used in the future and its dependency does not add any additional burden over the already established `FileCheck` dependency. Required to add libomptarget testing to the Phabricator pre-merge check (see https://github.com/google/llvm-premerge-checks/issues/368) Reviewed By: jdenny, JonChesterfield Differential Revision: https://reviews.llvm.org/D115454	2021-12-12 10:52:53 -06:00
Med Ismail Bennani	30fc88bf1d	Revert "Revert "Revert "Use `GNUInstallDirs` to support custom installation dirs. -- LLVM""" This reverts commit `492de35df4`. I tried to apply John's changes in `8d897ec915` that were expected to fix his patch but that didn't work unfortunately. Reverting this again to fix the macOS bots and leave him more time to investigate the issue.	2021-12-10 17:33:54 -08:00
John Ericson	492de35df4	Revert "Revert "Use `GNUInstallDirs` to support custom installation dirs. -- LLVM"" This reverts commit `797b50d4be`. See the original D99484. @mib who noticed the original problem could not longer reproduce it, after I tried and also failed. We are threfore hoping it went away on its own! Reviewed By: mib Differential Revision: https://reviews.llvm.org/D115544	2021-12-10 20:59:43 +00:00
Joseph Huber	8425bde82d	Revert "[OpenMP] Avoid costly shadow map traversals whenever possible" This reverts commit `7c8f4e7b85`. Fails a few OpenMP tests, causes a few updates to segfault.	2021-12-10 15:57:58 -05:00
Jonathan Peyton	df20599597	[OpenMP][libomp] Add core attributes to KMP_HW_SUBSET Allow filtering of resources based on core attributes. There are two new attributes added: 1) Core Type (intel_atom, intel_core) 2) Core Efficiency (integer) where the higher the efficiency, the more performant the core On hybrid architectures , e.g., Alder Lake, users can specify KMP_HW_SUBSET=4c:intel_atom,4c:intel_core to select the first four Atom and first four Big cores. The can also use the efficiency syntax. e.g., KMP_HW_SUBSET=2c:eff0,2c:eff1 Differential Revision: https://reviews.llvm.org/D114901	2021-12-10 14:34:33 -06:00
Joseph Huber	7c8f4e7b85	[OpenMP] Avoid costly shadow map traversals whenever possible In the OpenMC app we saw `omp target update` spending an awful lot of time in the shadow map traversal without ever doing any update there. There are two cases that allow us to avoid the traversal completely. The simplest thing is that small updates cannot (reasonably) contain an attached pointer part. The other case requires to track in the mapping table if an entry might contain an attached pointer as part. Given that we have a single location shadow map entries are created, the latter is actually fairly easy as well. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D113124	2021-12-10 14:33:18 -05:00
Carlo Bertolli	28309c5436	[OpenMP] Part 2 of At present, amdgpu plugin merges both asynchronous and synchronous kernel launch implementations into a single synchronous version. This patch prepares the plugin for asynchronous implementation by: Privatizing actual kernel launch code (valid in both cases) into an anonymous namespace base function (submitted at D115267) - Separating the control flow path of asynchronous and synchronous kernel launch functions** (this diff) Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115273	2021-12-10 19:21:05 +00:00
Joel E. Denny	51168ce8d5	[OpenMP] Add test for custom state machine if have reduction D113602 broke the custom state machine when a reduction is present, as revealed by the reproducer this patch adds to the test suite. In that case, openmp-opts changes the return value to undef in `__kmpc_get_warp_size` (which the custom state machine calls as of D113602). Later optimizations then optimize away the custom state machine code as if all threads are outside the thread block, so the target region does not execute. D114802 fixed that but didn't add a reproducer. This patch also adds a `__OMP_RTL_ATTRS` entry for `__kmpc_get_warp_size` to OMPKinds.def, which D113602 missed. This change does not seem to have any impact on the reduction problem. Reviewed By: JonChesterfield, jdoerfert Differential Revision: https://reviews.llvm.org/D113824	2021-12-10 12:53:54 -05:00
AndreyChurbanov	1031e43052	[OpenMP] libomp: fix Fortran header: lines exceeded 72-char length Added line continuation to two long lines in Fortran header. Differential Revision: https://reviews.llvm.org/D114537	2021-12-10 16:23:21 +03:00
Joseph Huber	bc9c4d7216	[OpenMP][FIX] Pass the num_threads value directly to parallel_51 The problem with the old scheme is that we would need to keep track of the "next region" and reset the num_threads value after it. The new RT doesn't do it and an assertion is triggered. The old RT doesn't do it either, I haven't tested it but I assume a num_threads clause might impact multiple parallel regions "accidentally". Further, in SPMD mode num_threads was simply ignored, for some reason beyond me. In any case, parallel_51 is designed to take the clause value directly, so let's do that instead. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D113623	2021-12-09 16:30:29 -05:00
Carlo Bertolli	cc8dc5e28b	[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy. Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115279	2021-12-08 23:02:39 +00:00
AndreyChurbanov	4dd8fccb71	[OpenMP] libomp: Fix crash if application send us negative thread_limit value Regardless that specification requires thread_limit to be positive, it is better to warn user instead of crash in case the value is negative. Differential Revision: https://reviews.llvm.org/D115340	2021-12-08 19:02:57 +03:00
Jon Chesterfield	14ff611fe1	Revert "[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version" This reverts commit `6de698bf10`. It didn't build in the dynamic_hsa configuration	2021-12-08 08:23:12 +00:00
Carlo Bertolli	6de698bf10	[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy. Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115279	2021-12-07 23:05:23 +00:00
Carlo Bertolli	d9b1d827d2	[NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch At present, amdgpu plugin merges both asynchronous and synchronous kernel launch implementations into a single synchronous version. This patch prepares the plugin for asynchronous implementation by: - Privatizing actual kernel launch code (valid in both cases) into an anonymous namespace base function Actual separation of kernel launch code (async vs sync) is a following patch. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115267	2021-12-07 21:02:45 +00:00
Martin Storsjö	db32c4f456	[OpenMP] Disable libomptarget profiling by default if built via the "runtimes" setup In the "runtimes" setup, the runtime (e.g. OpenMP) can be built for a target entirely different from the current host build (where LLVM and Clang are built). If profiling is enabled, libomptarget links against LLVMSupport (which only has been built for the host). Thus, don't enable profiling by default in this setup. This should allow relanding D113253. Differential Revision: https://reviews.llvm.org/D114083	2021-12-07 22:23:50 +02:00
Ye Luo	21a51cebf1	[OpenMP][libomptarget] amdgpu plugin adds runpath for dependencies amdgpu plugin depends on libhsa-runtime64 library. Add runpath in case it is not on the LD_LIBRARY_PATH. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115198	2021-12-06 18:19:18 -06:00
Jon Chesterfield	a05a0c3c2f	[libomptarget] Add cmake variables to disable building the amdgpu or cuda plugins Analogous to the controls on building device runtimes Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115148	2021-12-06 16:42:26 +00:00
Jon Chesterfield	a2b3b4dadc	[openmp] Run tests on both runtimes, independent of the default Minor fix to the lit.cfg. Currently, nvptx runs the tests twice on the new runtime. Soon, amdgpu will run them on the new runtime as well as the old. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115150	2021-12-06 16:41:23 +00:00
Jon Chesterfield	9e08c2054a	[openmp] Enable tests on new devicertl on amdgpu Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D114891	2021-12-06 15:26:18 +00:00
Jon Chesterfield	1a87a18955	[openmp][amdgpu] Disable tests requiring USM on amdgcn These tests tend to hang or crash on hardware that doesn't support USM. Disabling them helps diagnose other issues. To safely enable we require a means of testing whether USM is expected to work. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115144	2021-12-06 13:25:23 +00:00
Matt Arsenault	90f914c870	OpenMP: Un-xfail tests that pass now `729bf9b26b` should have fixed these	2021-12-04 11:25:22 -05:00
Ron Lieberman	8f4013ad46	Restric xfail on openmp/libomptarget/test/mapping/reduction_implicit_map.cpp to amdgcn-amd-amdhsa	2021-12-02 20:58:26 +00:00
Ron Lieberman	f87c2c637e	xfail: libomptarget reduction_implicit_map.cpp after reapply of Start calling setTargetAttributes	2021-12-02 20:38:25 +00:00
Jon Chesterfield	fb9fc3c951	[openmp][amdgpu] Disable three tests in preparation for new runtime	2021-12-02 07:57:14 +00:00
Kazushi (Jam) Marukawa	5e2358c781	[runtimes][openmp] Change to not treat ARCH-unknown-linux-gnu as errors When OpenMP is compiled as a part runtimes for multiple targets, openmp is compiled under build/runtimes/runtimes-arch-unknown-linux-gnu-bins directory. Old implementation treats this directory name as errors. This patch adds a guard like "[Uu]known[^-]". Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D114346	2021-12-01 08:33:37 +09:00
Jonathan Peyton	618f8dc5e5	[OpenMP][libomp][doc] Add environment variables documentation Add documentation for the environment variables for libomp Differential Revision: https://reviews.llvm.org/D114269	2021-11-30 16:29:31 -06:00
Jon Chesterfield	3ab150f6e4	[openmp][devicertl] Add a missing loader_uninitialized attribute	2021-11-29 23:54:37 +00:00
Matt Arsenault	935abeaace	OpenMP: Correctly query location for amdgpu-arch This was trying to figure out the build path for amdgpu-arch, and making assumptions about where it is which were not working on my system. Whether a standalone build or not, we should have a proper imported target to get the location from.	2021-11-29 16:31:32 -05:00
Jon Chesterfield	ae5348a38e	[openmp][amdgpu] Make plugin robust to presence of explicit implicit arguments OpenMP (compiler) does not currently request any implicit kernel arguments. OpenMP (runtime) allocates and initialises a reasonable guess at the implicit kernel arguments anyway. This change makes the plugin check the number of explicit arguments, instead of all arguments, and puts the pointer to hostcall buffer in both the current location and at the offset expected when implicit arguments are added to the metadata by D113538. This is intended to keep things running while fixing the oversight in the compiler (in D113538). Once that patch lands, and a following one marks openmp kernels that use printf such that the backend emits an args element with the right type (instead of hidden_node), the over-allocation can be removed and the hardcoded 8*e+3 offset replaced with one read from the .offset of the corresponding metadata element. Reviewed By: estewart08 Differential Revision: https://reviews.llvm.org/D114274	2021-11-22 23:00:20 +00:00
Joseph Huber	fbfe8fcbc3	[Libomptarget] Remove undefined symbol in old runtime A function with no definition was left in the old runtime, causing linker errors when trying to compile. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D114264	2021-11-20 08:26:57 -05:00
Jon Chesterfield	04954824ee	[openmp][amdgpu][nfc] Simplify implicit args handling Removes a +x/-x pair on the only store/load of a variable and deletes some nearby dead code. Also reduces the size of the implicit struct to reflect the code currently emitted by clang. Differential Revision: https://reviews.llvm.org/D114270	2021-11-19 20:18:23 +00:00
Jon Chesterfield	9cdaf0b01b	[openmp][amdgpu][nfc] Inline interop_hsa_get_kernel_info into only caller	2021-11-19 18:45:17 +00:00

1 2 3 4 5 ...

2101 Commits