llvm-project

Commit Graph

Author	SHA1	Message	Date
Joachim Protze	3895148d7c	[OpenMP] Fix a race in task queue reallocation __kmp_realloc_task_deque implicitly assumes that the task queue is full (ntasks == size), therefore tail = size in line 319. An assertion is added to document this assumption. The first check for a full queue is before the locking and might not hold when the lock is taken. So, we need to check again for this condition when we have the lock. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D80480	2020-05-25 10:23:22 +02:00
AndreyChurbanov	57d8b8d6f0	[openmp] Fixed hang if detached task was serialized. The patch fixes https://bugs.llvm.org/show_bug.cgi?id=45904. Differential Revision: https://reviews.llvm.org/D79944	2020-05-18 15:32:13 +03:00
Joachim Protze	d23131a3c0	[OpenMP] Fix race condition in the completion/freeing of detached tasks Spurious assertion failures are symptoms of a race condition for the handling of detached tasks: Assertion failure at kmp_tasking.cpp(3744): taskdata->td_flags.complete == 1. Assertion failure at kmp_tasking.cpp(710): taskdata->td_flags.executing == 0. In the case of detach=true, all accesses to taskdata in __kmp_task_finish need to happen before (~line 873): taskdata->td_flags.proxy = TASK_PROXY; This assignment signals to __kmp_fulfill_event that the task will need to be freed there. So, conceptually the ownership of taskdata is moved. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D79702	2020-05-17 12:28:38 +02:00
Manoel Roemmer	6b9e43c67e	[Openmp][VE] Libomptarget plugin for NEC SX-Aurora This patch adds a libomptarget plugin for the NEC SX-Aurora TSUBASA Vector Engine (VE target). The code is largely based on the existing generic-elf plugin and uses the NEC VEO and VEOSINFO libraries for offloading. Differential Revision: https://reviews.llvm.org/D76843	2020-05-12 10:47:30 +02:00
Joel E. Denny	dd5ba4b585	[OpenMP][NFC] Fix `not` substitution in tests D78566 introduced a `\bnot\b` lit substitution in OpenMP test suites. However, that would corrupt a command like `FileCheck -implicit-check-not` or any file name like `%t.not`. We could use lookbehind/lookahead assertions to avoid such cases, but this patch switches to `%not` (suggested during the D78566 review) as a safer option. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D79529	2020-05-11 14:53:48 -04:00
Shilei Tian	cb038927ef	[OpenMP] Fix an issue of wrong return type of DeviceRTLTy::getNumOfDevices Summary: There is a typo in DeviceRTLTy::getNumOfDevices that the type of its return value is bool. It will lead to a problem of wrong device number returned from omp_get_num_devices. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D79255	2020-05-03 15:59:06 -04:00
Ron Lieberman	ee9c53d271	[libomptarget] Initialize reference parameter IsNew within Device::getOrAllocTgtPtr The two locals IsNew and Pointer_IsNew were uninitialized at declaration, and then passed by reference to Device.getOrAllocTgtPtr which in turn did not assign on all paths within the function. This resulted in occasional runtime failures in one application. Device::getOrAllocTgtPtr will now initialize IsNew to false on entry to function. Differential Revision: https://reviews.llvm.org/D78744	2020-04-24 15:33:37 -05:00
Joel E. Denny	5f6aa9680c	[OpenMP] target_data_begin: fail on device alloc fail Without this patch, target_data_begin continues after an illegal mapping or an out-of-memory error on the device. With this patch, it terminates the runtime with an error instead. The new test exercises only illegal mappings. I didn't think of a good way to exercise out-of-memory errors from the test suite. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D78170	2020-04-21 17:10:50 -04:00
Joel E. Denny	ba942610f6	[OpenMP] Add scaffolding for negative runtime tests Without this patch, the openmp project's test suites do not appear to have support for negative tests. However, D78170 needs to add a test that an expected runtime failure occurs. This patch makes `not` visible in all of the openmp project's test suites. In all but `libomptarget/test`, it should be possible for a test author to insert `not` before a use of the lit substitution for running a test program. In `libomptarget/test`, that substitution is target-specific, and its value is `echo` when the target is not available. In that case, inserting `not` before a lit substitution would expect an `echo` fail, so this patch instead defines a separate lit substitution for expected runtime fails. Reviewed By: jdoerfert, Hahnfeld Differential Revision: https://reviews.llvm.org/D78566	2020-04-21 17:10:50 -04:00
Bryan Chan	b86ff5f6ef	[OpenMP] Sync writes to child thread's data before reduction On systems with weak memory consistency, this patch fixes an intermittent crash in the reduction function called by __kmp_hyper_barrier_gather, which suffers from a race on a child thread's data. Reviewed-By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D77603	2020-04-14 14:31:06 -04:00
Shilei Tian	4031bb982b	[OpenMP] Refined CUDA plugin to put all CUDA operations into class Summary: Current implementation mixed everything up so that there is almost no encapsulation. In this patch, all CUDA related operations are put into a new class DeviceRTLTy and only necessary functions are exposed. In addition, all C++ code now conforms with LLVM code standard, keeping those API functions following C style. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: jfb, yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77951	2020-04-13 13:32:46 -04:00
Shilei Tian	feed674dec	[OpenMP] Introduce stream pool to make sure the correctness of device synchronization Summary: In previous patch, in order to optimize performance, we only synchronize once for each target region. The synchronization is via stream synchronization. However, in the extreme situation, the performance might be bad. Consider the following case: There is a task that requires transferring huge amount of data (call many times of data transferring function). It is scheduled to the first stream. And then we have 255 very light tasks scheduled to the remaining 255 streams (by default we have 256 streams). They can be finished before we do synchronization at the end of the first task. Next, we get another very huge task. It will be scheduled again to the first stream. Now the first task finishes its kernel launch and call stream synchronization. Right now, the stream already contains two kernels, and the synchronization will wait until the two kernels finish instead of just the first one for the first task. In this patch, we introduce stream pool. After each synchronization, the stream will be returned back to the pool to make sure that for each synchronization, only expected operations are waited. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: gregrodgers, yaxunl, lildmh, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77412	2020-04-11 07:08:56 -04:00
Shilei Tian	03ff643d2e	[OpenMP] Put old APIs back and added new _async series for backward compatibility Summary: According to comments on bi-weekly meeting, this patch put back old APIs and added new `_async` series Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77822	2020-04-09 22:40:58 -04:00
Shilei Tian	32ed29271f	[OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream Summary: This patch introduces two things for offloading: 1. Asynchronous data transferring: those functions are suffixed with `_async`. They have one more argument compared with their synchronous counterparts: `__tgt_async_info`, which is a new struct that only has one field, `void *Identifier`. This struct is for information exchange between different asynchronous operations. It can be used for stream selection, like in this case, or operation synchronization, which is also used. We may expect more usages in the future. 2. Optimization of stream selection for data mapping. Previous implementation was using asynchronous device memory transfer but synchronizing after each memory transfer. Actually, if we say kernel A needs four memory copy to device and two memory copy back to host, then we can schedule these seven operations (four H2D, two D2H, and one kernel launch) into a same stream and just need synchronization after memory copy from device to host. In this way, we can save a huge overhead compared with synchronization after each operation. Reviewers: jdoerfert, ye-luo Reviewed By: jdoerfert Subscribers: yaxunl, lildmh, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77005	2020-04-07 14:55:47 -04:00
Kazuaki Ishizaki	4201679110	[OpenMP] NFC: Fix trivial typo Differential Revision: https://reviews.llvm.org/D77430	2020-04-04 12:06:54 +09:00
Vitaly Buka	c9ae3c5e10	[openmp] Disable tests flaky on Debian https://bugs.llvm.org/show_bug.cgi?id=45397	2020-04-01 21:58:05 -07:00
JonChesterfield	09834f9761	[libomptarget][nfc] Move non-freestanding headers out of common Summary: [libomptarget][nfc] Move non-freestanding headers out of common Lowers the bar for building deviceRTL. Drops math.h entirely as it wasn't used and libm is a big dependency. Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77071	2020-03-31 23:43:18 +01:00
Alexey Bataev	0fca766458	[OPENMP50]Fix PR45117: Orphaned task reduction should be allowed. Add support for orphaned task reductions.	2020-03-27 17:47:30 -04:00
Henry Kao	236ac68fa5	[OpenMP] Add memory barrier to solve data race Data race occurs when acquiring lock for critical section triggering assertion failure. Added barrier to ensure all memory is committed before checking assertion. Reviewed By: Hahnfeld Differential Revision: https://reviews.llvm.org/D76780	2020-03-27 16:32:28 -04:00
Jon Chesterfield	856c995436	[libomptarget] Add missing elf_end call in elf_common.c Summary: [libomptarget] Add missing elf_end call in elf_common.c Noticed when reviewing D76843. Reviewers: simoll, jdoerfert, efocht, AndreyChurbanov, grokos, manorom Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D76874	2020-03-26 19:07:33 +00:00
JonChesterfield	0813f41005	[libomptarget][nfc] Explicitly static function scope shared variables Summary: [libomptarget][nfc] Explicitly static function scope shared variables `__shared__` in CUDA implies static in function scope. See e.g. D.2.1.1 in CUDA_C_Programming_Guide.pdf, http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/ This is surprising for non-cuda developers, see e.g. D73239 where I thought local variables would be thread local. Tested by IR diff of libomptarget.bc (no change), running in tree tests, and binary diff of the nvcc static archives (no significant change). Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D76713	2020-03-24 18:51:50 +00:00
AndreyChurbanov	ae044467ed	[openmp][runtime] Fixed hang for explicit task inside a taskloop. Added missed initialization of td_last_tied field for taskloop tasks. Differential Revision: https://reviews.llvm.org/D75673	2020-03-23 20:07:30 +03:00
Sylvestre Ledru	72fd1033ea	Doc: Links should use https	2020-03-22 22:49:33 +01:00
JonChesterfield	298527587c	[libomptarget][nfc] Disable amdgcn rtl build. The cmake logic for finding llvm is misbehaving.	2020-03-21 00:01:03 +00:00
George Rokos	0a42c9bfe4	Enable CUDA offloading on aarch64 host Differential Revision: https://reviews.llvm.org/D76469	2020-03-20 15:38:47 -07:00
Tom Scogland	a23d7282ca	openmp: fix memcpy memory leak Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D72637	2020-03-12 23:24:16 -05:00
Alexey Bataev	c422d69b1a	[LIBOMPTARGET]Fix PR45139: Bug in mixing Python and OpenMP target offload. Summary: Explicitly initialize data members of RTLsTy class upon construction. Reviewers: grokos Subscribers: guansong, openmp-commits, caomhin, kkwli0 Tags: #openmp Differential Revision: https://reviews.llvm.org/D75946	2020-03-11 09:12:02 -04:00
Jonas Hahnfeld	f0689d2e62	archer: Remove superfluous dot from warning message	2020-03-06 15:19:30 +01:00
Jon Chesterfield	221ada654b	[libomptarget] Implement locks for amdgcn Summary: [libomptarget] Implement locks for amdgcn The nvptx implementation deadlocks on amdgcn. atomic_cas with multiple active lanes can deadlock - if one lane succeeds, all the others are locked out. The set_lock implementation therefore runs on a single lane. Also uses a sleep intrinsic instead of the system clock for a probably minor performance improvement. The unset/test implementations may be revised later, based on code size / performance or similar concerns. This implements the lock at a per-wavefront scope. That's not strictly as specified, since openmp describes locks in terms of threads. I think the nvptx implementation provides true per-thread locking on volta and the same per-warp locking on other architectures. Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D75546	2020-03-05 20:25:31 +00:00
Jon Chesterfield	918a1065be	[libomptarget][nfc] Move GetWarp/LaneId functions into per arch code Summary: [libomptarget][nfc] Move GetWarp/LaneId functions into per arch code No code change for nvptx. Amdgcn currently has two implementations of GetLaneId, this patch keeps the one a colleague considered to be superior for our ISA. GetWarpId is currently the same function for amdgcn and nvptx, but I think it's cleaner to keep it grouped with all the others than to keep it in support.cu. Reviewers: jdoerfert, grokos, ABataev Reviewed By: jdoerfert Subscribers: jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D75587	2020-03-05 17:05:58 +00:00
Jon Chesterfield	84ac0dffd4	[libomptarget][nfc][amdgcn] Replace magic number with named intrinsic	2020-03-05 11:50:30 +00:00
Jon Chesterfield	133db44996	[libomptarget] Implement most hip atomic functions in terms of intrinsics Summary: [libomptarget] Implement hip atomic functions in terms of intrinsics All but atomicInc can be implemented using type generic clang intrinsics. There is not yet a corresponding intrinsic for atomicInc in clang, only one in LLVM. This patch leaves atomicInc as an unresolved symbol. Reviewers: jdoerfert, ABataev, hfinkel, grokos, arsenm Reviewed By: arsenm Subscribers: sri, saiislam, wdng, jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D73076	2020-03-04 17:56:40 +00:00
AndreyChurbanov	95df6747cf	[openmp] OpenMP 5.1 omp_display_env function implementation. Patch by Michael Klemm. Differential Revision: https://reviews.llvm.org/D74956	2020-03-04 18:15:05 +03:00
Jon Chesterfield	ad3d021b9e	[libomptarget][nfc][amdgcn] Simplify assert_fail implementation	2020-03-03 18:24:51 +00:00
Alexey Bataev	c4a9d976c1	[LIBOMPTARGET]Lower priority of global constructor/destructor to silence the warning from gcc. Summary: fixed the warning from gcc since prios 0-100 are reserved for the internal use. Reviewers: grokos Subscribers: kkwli0, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D75458	2020-03-02 15:15:11 -05:00
Alexey Bataev	63cef621f9	[LIBOMPTARGET]Fix PR44933: fix crash because of the too early deinitialization of libomptarget. Summary: Instead of using global variables with unpredicted time of deinitialization, use dynamically allocated variables with functions explicitly marked as global constructor/destructor and priority. This allows to prevent the crash because of the incorrect order of dynamic libraries deinitialization. Reviewers: grokos, hfinkel Subscribers: caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D74837	2020-02-25 15:54:37 -05:00
Kelvin Li	e16e267bb6	[OpenMP][cmake] ignore warning on unknown CUDA version Differential Revision: https://reviews.llvm.org/D75001	2020-02-25 09:29:07 -05:00
Shoaib Meenai	e34ddc09f4	[arcconfig] Delete subproject arcconfigs From https://secure.phabricator.com/book/phabricator/article/arcanist_new_project/: > An .arcconfig file is a JSON file which you check into your project's root. I've done some experimentation, and it looks like the subproject .arcconfigs just get ignored, as the documentation says. Given that we're fully on the monorepo now, it's safe to remove them. Differential Revision: https://reviews.llvm.org/D74996	2020-02-24 16:20:36 -08:00
serge-sans-paille	99b03c1c18	Detect and disable openmp tests that require multiple hardware processor to run Team tests seem to require at least two physical cores, and using the same trick as in https://reviews.llvm.org/D55598 doesn't work (why?) . Using lit configuration instead. Differential Revision: https://reviews.llvm.org/D74921	2020-02-21 14:02:12 +01:00
Yuanfang Chen	c2c4f1c120	[openmp][cmake] passing option argument correctly From the context, it looks like the test should not be run with `check-all`, but it does. It turns out option argument resolving to True/False which could not be passed down as is. There is one such example in AddLLVM.cmake.	2020-02-13 09:33:58 -08:00
Alexey Bataev	578c13d13c	[OPENMP]Fix the test, NFC.	2020-02-13 10:40:06 -05:00
Ethan Stewart	190a11148b	Changed omp_get_max_threads() implementation to more closely match spec description. Summary: The 5.0 spec states, "The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a num_threads clause were encountered after execution returns from this routine." The attached test shows Max Threads: 96, Num Threads: 128 without the proposed change. The number of threads should not exceed the (max) nthreads ICV, hence we should return the higher SPMD thread number even when omp_get_max_threads() is called in a generic kernel. This change does fail the api test, max_threads.c, because now it would return 64 instead of 32. Reviewers: jdoerfert, ABataev, grokos, JonChesterfield Reviewed By: jdoerfert Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D74092	2020-02-12 23:29:34 +00:00
JonChesterfield	c2ce9ea4e3	[libomptarget][nfc] Change enum values to match those in cuda/rtl Summary: [libomptarget][nfc] Change enum values to match those in cuda/rtl support.h and cuda/rtl.cpp (and downstream hsa/rtl.cpp) have enums for execution mode. These are actually independent - the numbers that are used within support, or within the plugin, are never passed across the boundary. Nevertheless, trying to work out why the values are different between the two has generated a reasonable amount of confusion. This patch changes support to match the values in plugin, on the basis that the plugin also has some comments which I'd have to update if I changed that one instead. Credit to Ron for working through this in our own fork. See rocm-developer-tools/aomp/issues/7 for that earlier diagnostic write up. Also happy with generic = 0, spmd = 1 - provided it's the same in both places. Reviewers: jdoerfert, grokos, ABataev, ronlieb Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D74503	2020-02-12 23:27:08 +00:00
Kelvin Li	4f1f2b7a5b	[OpenMP] update strings output of libomp.so [NFC] Change the string from "Intel(R) OMP" to "LLVM OMP" in libomp.so Differential Revision: https://reviews.llvm.org/D74462	2020-02-12 15:45:55 -05:00
Johannes Doerfert	a5153dbc36	[OpenMP][Offloading] Added support for multiple streams so that multiple kernels can be executed concurrently Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D74145	2020-02-11 22:07:14 -06:00
Johannes Doerfert	3ff4e2eee8	[OpenMP] Switch default C++ standard to C++ 14 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D74258	2020-02-11 17:11:54 -06:00
Jonas Devlieghere	4fe839ef3a	[CMake] Rename EXCLUDE_FROM_ALL and make it an argument to add_lit_testsuite EXCLUDE_FROM_ALL means something else for add_lit_testsuite as it does for something like add_executable. Distinguish between the two by renaming the variable and making it an argument to add_lit_testsuite. Differential revision: https://reviews.llvm.org/D74168	2020-02-06 15:33:18 -08:00
Jon Chesterfield	6a82f0f0b9	[libomptarget] Implement wavefront functions for amdgcn Summary: [libomptarget] Implement wavefront functions for amdgcn Reviewers: jdoerfert, ABataev, grokos, arsenm Reviewed By: arsenm Subscribers: saiislam, wdng, arsenm, jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D73077	2020-02-04 21:55:29 +00:00
protze@itc.rwth-aachen.de	90e4ebdce5	[OpenMP][OMPT] fix reduction test for 32-bit x86 Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=44733 \| TEST 'libomp :: ompt/synchronization/reduction/tree_reduce.c' FAILED on 32-bit x86 ]] For 32-bit we need at least 3 variables to avoid atomic reduction to be chosen by runtime function `__kmp_determine_reduction_method`. This patch adds reduction variables to the testcase. Reviewers: mgorny, Hahnfeld Differential Revision: https://reviews.llvm.org/D73850	2020-02-04 12:19:10 +01:00
Jon Chesterfield	ab9762a9f5	Revert "[nfc][libomptarget] Remove SHARED annotation from local variables" This reverts commit `0e9374e374`. Revert D73239. It fails some local testing, cause presently unknown	2020-01-27 20:05:17 +00:00


1201 Commits