llvm-project

Commit Graph

Author	SHA1	Message	Date
Han Zhu	1eaebe192f	[openmp] Use config.test_extra_flags in archer and multiplex tests Summary: `config.test_extra_flags` is passed in from `lit.site.cfg.in` files, but they're not used in the LIT configs. This variable can be useful for distros which don't have the standard c/c++ headers in the default search paths. Since the tests run clang on c/c++ source code, we rely on `test_extra_flags` to pass in the necessary header files. This is a similar setup that's also done in libomptarget https://github.com/llvm/llvm-project/blob/master/openmp/libomptarget/test/lit.cfg#L42 and openmp/runtime. Reviewers: jdoerfert, jdenny, protze.joachim Reviewed By: jdoerfert Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D82516	2020-06-25 11:58:52 -07:00
Ye Luo	6e5f64c44f	[OpenMP] Adopt std::set in HostDataToTargetMap Summary: lookupMapping took significant time due to linear complexity searching. This is bad for offloading from multiple host threads because lookupMapping is protected by mutex. Use std::set for logarithmic complexity searching. Before my change. libomptarget inclusive time 16.7 sec, exclusive time 8.6 sec. After the change libomptarget inclusive time 7.3 sec, exclusive time 0.4 sec. Most of the overhead of libomptarget (exclusive time) is gone. Reviewers: jdoerfert, grokos Reviewed By: grokos Subscribers: tianshilei1992, yaxunl, guansong, sstefan1 Tags: #openmp Differential Revision: https://reviews.llvm.org/D82264	2020-06-24 12:22:45 -04:00
Joachim Protze	73b7ff4e16	[OpenMP] NFC: Create OpenMP release notes file	2020-06-24 13:42:32 +02:00
Joachim Protze	63a3c5925d	[OpenMP][OMPT] Pass mutexinoutset to the tool Adds OMPT support for the mutexinoutset dependency Reviewed by: hbae Differential Revision: https://reviews.llvm.org/D81890	2020-06-19 12:51:18 +02:00
Shilei Tian	aaf50adb53	Revert "[OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info" This reverts commit `ee1bf45e1d`.	2020-06-17 15:01:16 -04:00
Shilei Tian	ee1bf45e1d	[OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info DeviceID is added for some cases that we only have the __tgt_async_info but do not know its corresponding device id. However, to communicate with target plugins, we need that information. Event is added for another way to synchronize.	2020-06-17 14:29:09 -04:00
Alexey Bataev	08029595ca	[OPENMP]Fix overflow during counting the number of iterations. Summary: The OpenMP loops are normalized and transformed into the loops from 0 to max number of iterations. In some cases, original scheme may lead to overflow during calculation of number of iterations. If it is unknown, if we can end up with overflow or not (the bounds are not constant and we cannot define if there is an overflow), cast original type to the unsigned. Reviewers: jdoerfert Subscribers: yaxunl, guansong, sstefan1, openmp-commits, cfe-commits, caomhin Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D81881	2020-06-17 08:47:01 -04:00
Joachim Protze	8580af3f7d	subdirectories should not use cmake project command	2020-06-17 09:38:56 +02:00
Joachim Protze	e9b8ed1fd7	[OpenMP][Tool] Header-only multiplexing of OMPT tools Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D76012	2020-06-17 09:16:46 +02:00
Joachim Protze	cbea36903e	[OpenMP][OMPT] Add callbacks for doacross loops Adds the callbacks for ordered with source/sink dependencies. The test for task dependencies changed, because callback.h now actually prints the passed dependencies and the test also checks for the address. Reviewed by: hbae Differential Revision: https://reviews.llvm.org/D81807	2020-06-16 16:53:40 +02:00
Joachim Protze	9e5aefc5f9	[OpenMP][Tests] fix data race in an OpenMP runtime test Reviewed by: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D81804	2020-06-15 18:48:35 +02:00
Joachim Protze	d056d7592a	[OpenMP][Tool] Extend reuse of OMPT testing This patch allows to specify a prefix (default:empty) to be included into print-out written by callback.h. Also adding a cmake target to find the header file from other tests. Reviewed by: jdoerfert Differential Revision: https://reviews.llvm.org/D76008	2020-06-14 15:55:32 +02:00
Joachim Protze	add8d90cb3	[OpenMP] support alloc of serialized tasks Reviewed by: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D81497	2020-06-14 15:55:32 +02:00
Joachim Protze	e7577d1d76	Remove mention of counter from Archer readme The feature was removed before upstreaming Archer, so the documentation is wrong	2020-06-05 14:31:03 +02:00
Shilei Tian	a014fbbc21	[OpenMP] Improve D2D memcpy to use more efficient driver API Summary: In current implementation, D2D memcpy is first to copy data back to host and then copy from host to device. This is very inefficient if the device supports D2D memcpy, like CUDA. In this patch, D2D memcpy will first try to use native supported driver API. If it fails, fall back to original way. It is worth noting that D2D memcpy in this scenario contains two ideas: - Same devices: this is the D2D memcpy in the CUDA context. - Different devices: this is the PeerToPeer memcpy in the CUDA context. My implementation merges these two parts. It chooses the best API according to the source device and destination device. Reviewers: jdoerfert, AndreyChurbanov, grokos Reviewed By: jdoerfert Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D80649	2020-06-04 16:59:06 -04:00
AndreyChurbanov	abe64360ae	[openmp] Fixed nonmonotonic schedule implementation. Differential Revision: https://reviews.llvm.org/D80942	2020-06-04 15:39:45 +03:00
Joachim Protze	10995c77b4	[OpenMP][OMPT] Fix and add event callbacks for detached tasks The OpenMP spec has the task-fulfill event for a call to omp_fulfill_event. If the task did not yet finish execution, ompt_task_early_fulfill is used, otherwise ompt_task_late_fulfill. If a task does not complete, when the execution finishes (i.e., the task goes in detached mode), ompt_task_detach instead of ompt_task_complete must be used, when the next task is scheduled. A test for both cases is included, which only work with clang-11+ Reviewed By: hbae Differential revision: https://reviews.llvm.org/D80843	2020-06-02 09:52:40 +02:00
AndreyChurbanov	5e111c5df8	[openmp] Fixed taskloop recursive splitting so that taskloop tasks have same parent tasks. Differential Revision: https://reviews.llvm.org/D80577	2020-06-01 17:51:02 +03:00
Joachim Protze	3895148d7c	[OpenMP] Fix a race in task queue reallocation __kmp_realloc_task_deque implicitly assumes, that the task queue is full (ntasks == size), therefore tail = size in line 319. An assertion is added to document this assumption. The first check for a full queue is before the locking and might not hold when the lock is taken. So, we need to check again for this condition when we have the lock. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D80480	2020-05-25 10:23:22 +02:00
AndreyChurbanov	57d8b8d6f0	[openmp] Fixed hang if detached task was serialized. The patch fixes https://bugs.llvm.org/show_bug.cgi?id=45904. Differential Revision: https://reviews.llvm.org/D79944	2020-05-18 15:32:13 +03:00
Joachim Protze	d23131a3c0	[OpenMP] Fix race condition in the completion/freeing of detached tasks Spurious assertion failures are symptoms of a race condition for the handling of detached tasks: Assertion failure at kmp_tasking.cpp(3744): taskdata->td_flags.complete == 1. Assertion failure at kmp_tasking.cpp(710): taskdata->td_flags.executing == 0. In the case of detach=true, all accesses to taskdata in __kmp_task_finish need to happen before (~line 873): taskdata->td_flags.proxy = TASK_PROXY; This assignment signals to __kmp_fulfill_event that the task will need to be freed there. So, conceptually the ownership of taskdata is moved. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D79702	2020-05-17 12:28:38 +02:00
Manoel Roemmer	6b9e43c67e	[Openmp][VE] Libomptarget plugin for NEC SX-Aurora This patch adds a libomptarget plugin for the NEC SX-Aurora TSUBASA Vector Engine (VE target). The code is largely based on the existing generic-elf plugin and uses the NEC VEO and VEOSINFO libraries for offloading. Differential Revision: https://reviews.llvm.org/D76843	2020-05-12 10:47:30 +02:00
Joel E. Denny	dd5ba4b585	[OpenMP][NFC] Fix `not` substitution in tests D78566 introduced a `\bnot\b` lit substitution in OpenMP test suites. However, that would corrupt a command like `FileCheck -implicit-check-not` or any file name like `%t.not`. We could use lookbehind/lookahead assertions to avoid such cases, but this patch switches to `%not` (suggested during the D78566 review) as a safer option. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D79529	2020-05-11 14:53:48 -04:00
Shilei Tian	cb038927ef	[OpenMP] Fix an issue of wrong return type of DeviceRTLTy::getNumOfDevices Summary: There is a typo in DeviceRTLTy::getNumOfDevices that the type of its return value is bool. It will lead to a problem of wrong device number returned from omp_get_num_devices. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D79255	2020-05-03 15:59:06 -04:00
Ron Lieberman	ee9c53d271	[libomptarget] Initialize reference parameter IsNew within Device::getOrAllocTgtPtr The two locals IsNew and Pointer_IsNew were uninitialized at declaration, and then passed by reference to Device.getOrAllocTgtPtr which in turn did not assign on all paths within the function. This resulted in occasional runtime failures in one application. Device::getOrAllocTgtPtr will now initialize IsNew to false on entry to function. Differential Revision: https://reviews.llvm.org/D78744	2020-04-24 15:33:37 -05:00
Joel E. Denny	5f6aa9680c	[OpenMP] target_data_begin: fail on device alloc fail Without this patch, target_data_begin continues after an illegal mapping or an out-of-memory error on the device. With this patch, it terminates the runtime with an error instead. The new test exercises only illegal mappings. I didn't think of a good way to exercise out-of-memory errors from the test suite. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D78170	2020-04-21 17:10:50 -04:00
Joel E. Denny	ba942610f6	[OpenMP] Add scaffolding for negative runtime tests Without this patch, the openmp project's test suites do not appear to have support for negative tests. However, D78170 needs to add a test that an expected runtime failure occurs. This patch makes `not` visible in all of the openmp project's test suites. In all but `libomptarget/test`, it should be possible for a test author to insert `not` before a use of the lit substitution for running a test program. In `libomptarget/test`, that substitution is target-specific, and its value is `echo` when the target is not available. In that case, inserting `not` before a lit substitution would expect an `echo` fail, so this patch instead defines a separate lit substitution for expected runtime fails. Reviewed By: jdoerfert, Hahnfeld Differential Revision: https://reviews.llvm.org/D78566	2020-04-21 17:10:50 -04:00
Bryan Chan	b86ff5f6ef	[OpenMP] Sync writes to child thread's data before reduction On systems with weak memory consistency, this patch fixes an intermittent crash in the reduction function called by __kmp_hyper_barrier_gather, which suffers from a race on a child thread's data. Reviewed-By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D77603	2020-04-14 14:31:06 -04:00
Shilei Tian	4031bb982b	[OpenMP] Refined CUDA plugin to put all CUDA operations into class Summary: Current implementation mixed everything up so that there is almost no encapsulation. In this patch, all CUDA related operations are put into a new class DeviceRTLTy and only necessary functions are exposed. In addition, all C++ code now conforms with LLVM code standard, keeping those API functions following C style. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: jfb, yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77951	2020-04-13 13:32:46 -04:00
Shilei Tian	feed674dec	[OpenMP] Introduce stream pool to make sure the correctness of device synchronization Summary: In previous patch, in order to optimize performance, we only synchronize once for each target region. The synchronization is via stream synchronization. However, in the extreme situation, the performance might be bad. Consider the following case: There is a task that requires transferring huge amount of data (call many times of data transferring function). It is scheduled to the first stream. And then we have 255 very light tasks scheduled to the remaining 255 streams (by default we have 256 streams). They can be finished before we do synchronization at the end of the first task. Next, we get another very huge task. It will be scheduled again to the first stream. Now the first task finishes its kernel launch and call stream synchronization. Right now, the stream already contains two kernels, and the synchronization will wait until the two kernels finish instead of just the first one for the first task. In this patch, we introduce stream pool. After each synchronization, the stream will be returned back to the pool to make sure that for each synchronization, only expected operations are waited. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: gregrodgers, yaxunl, lildmh, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77412	2020-04-11 07:08:56 -04:00
Shilei Tian	03ff643d2e	[OpenMP] Put old APIs back and added new _async series for backward compatibility Summary: According to comments on bi-weekly meeting, this patch put back old APIs and added new `_async` series Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77822	2020-04-09 22:40:58 -04:00
Shilei Tian	32ed29271f	[OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream Summary: This patch introduces two things for offloading: 1. Asynchronous data transferring: those functions are suffixed with `_async`. They have one more argument compared with their synchronous counterparts: `__tgt_async_info`, which is a new struct that only has one field, `void Identifier`. This struct is for information exchange between different asynchronous operations. It can be used for stream selection, like in this case, or operation synchronization, which is also used. We may expect more usages in the future. 2. Optimization of stream selection for data mapping. Previous implementation was using asynchronous device memory transfer but synchronizing after each memory transfer. Actually, if we say kernel A needs four memory copies to device and two memory copies back to host, then we can schedule these seven operations (four H2D, two D2H, and one kernel launch) into the same stream and just need synchronization after memory copy from device to host. In this way, we can save a huge overhead compared with synchronization after each operation. Reviewers: jdoerfert, ye-luo Reviewed By: jdoerfert Subscribers: yaxunl, lildmh, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77005	2020-04-07 14:55:47 -04:00
Kazuaki Ishizaki	4201679110	[OpenMP] NFC: Fix trivial typo Differential Revision: https://reviews.llvm.org/D77430	2020-04-04 12:06:54 +09:00
Vitaly Buka	c9ae3c5e10	[openmp] Disable tests flaky on Debian https://bugs.llvm.org/show_bug.cgi?id=45397	2020-04-01 21:58:05 -07:00
JonChesterfield	09834f9761	[libomptarget][nfc] Move non-freestanding headers out of common Summary: [libomptarget][nfc] Move non-freestanding headers out of common Lowers the bar for building deviceRTL. Drops math.h entirely as it wasn't used and libm is a big dependency. Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77071	2020-03-31 23:43:18 +01:00
Alexey Bataev	0fca766458	[OPENMP50]Fix PR45117: Orphaned task reduction should be allowed. Add support for orphaned task reductions.	2020-03-27 17:47:30 -04:00
Henry Kao	236ac68fa5	[OpenMP] Add memory barrier to solve data race Data race occurs when acquiring lock for critical section triggering assertion failure. Added barrier to ensure all memory is committed before checking assertion. Reviewed By: Hahnfeld Differential Revision: https://reviews.llvm.org/D76780	2020-03-27 16:32:28 -04:00
Jon Chesterfield	856c995436	[libomptarget] Add missing elf_end call in elf_common.c Summary: [libomptarget] Add missing elf_end call in elf_common.c Noticed when reviewing D76843. Reviewers: simoll, jdoerfert, efocht, AndreyChurbanov, grokos, manorom Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D76874	2020-03-26 19:07:33 +00:00
JonChesterfield	0813f41005	[libomptarget][nfc] Explicitly static function scope shared variables Summary: [libomptarget][nfc] Explicitly static function scope shared variables `__shared__` in CUDA implies static in function scope. See e.g. D.2.1.1 in CUDA_C_Programming_Guide.pdf, http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/ This is surprising for non-cuda developers, see e.g. D73239 where I thought local variables would be thread local. Tested by IR diff of libomptarget.bc (no change), running in tree tests, and binary diff of the nvcc static archives (no significant change). Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D76713	2020-03-24 18:51:50 +00:00
AndreyChurbanov	ae044467ed	[openmp][runtime] Fixed hang for explicit task inside a taskloop. Added missed initialization of td_last_tied field for taskloop tasks. Differential Revision: https://reviews.llvm.org/D75673	2020-03-23 20:07:30 +03:00
Sylvestre Ledru	72fd1033ea	Doc: Links should use https	2020-03-22 22:49:33 +01:00
JonChesterfield	298527587c	[libomptarget][nfc] Disable amdgcn rtl build. The cmake logic for finding llvm is misbehaving.	2020-03-21 00:01:03 +00:00
George Rokos	0a42c9bfe4	Enable CUDA offloading on aarch64 host Differential Revision: https://reviews.llvm.org/D76469	2020-03-20 15:38:47 -07:00
Tom Scogland	a23d7282ca	openmp: fix memcpy memory leak Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D72637	2020-03-12 23:24:16 -05:00
Alexey Bataev	c422d69b1a	[LIBOMPTARGET]Fix PR45139: Bug in mixing Python and OpenMP target offload. Summary: Explicitly initialize data members of RTLsTy class upon construction. Reviewers: grokos Subscribers: guansong, openmp-commits, caomhin, kkwli0 Tags: #openmp Differential Revision: https://reviews.llvm.org/D75946	2020-03-11 09:12:02 -04:00
Jonas Hahnfeld	f0689d2e62	archer: Remove superfluous dot from warning message	2020-03-06 15:19:30 +01:00
Jon Chesterfield	221ada654b	[libomptarget] Implement locks for amdgcn Summary: [libomptarget] Implement locks for amdgcn The nvptx implementation deadlocks on amdgcn. atomic_cas with multiple active lanes can deadlock - if one lane succeeds, all the others are locked out. The set_lock implementation therefore runs on a single lane. Also uses a sleep intrinsic instead of the system clock for a probably minor performance improvement. The unset/test implementations may be revised later, based on code size / performance or similar concerns. This implements the lock at a per-wavefront scope. That's not strictly as specified, since openmp describes locks in terms of threads. I think the nvptx implementation provides true per-thread locking on volta and the same per-warp locking on other architectures. Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D75546	2020-03-05 20:25:31 +00:00
Jon Chesterfield	918a1065be	[libomptarget][nfc] Move GetWarp/LaneId functions into per arch code Summary: [libomptarget][nfc] Move GetWarp/LaneId functions into per arch code No code change for nvptx. Amdgcn currently has two implementations of GetLaneId, this patch keeps the one a colleague considered to be superior for our ISA. GetWarpId is currently the same function for amdgcn and nvptx, but I think it's cleaner to keep it grouped with all the others than to keep it in support.cu. Reviewers: jdoerfert, grokos, ABataev Reviewed By: jdoerfert Subscribers: jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D75587	2020-03-05 17:05:58 +00:00
Jon Chesterfield	84ac0dffd4	[libomptarget][nfc][amdgcn] Replace magic number with named intrinsic	2020-03-05 11:50:30 +00:00
Jon Chesterfield	133db44996	[libomptarget] Implement most hip atomic functions in terms of intrinsics Summary: [libomptarget] Implement hip atomic functions in terms of intrinsics All but atomicInc can be implemented using type generic clang intrinsics. There is not yet a corresponding intrinsic for atomicInc in clang, only one in LLVM. This patch leaves atomicInc as an unresolved symbol. Reviewers: jdoerfert, ABataev, hfinkel, grokos, arsenm Reviewed By: arsenm Subscribers: sri, saiislam, wdng, jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D73076	2020-03-04 17:56:40 +00:00
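
Commit 6e5f64c44f above replaces a linear search in lookupMapping with a std::set so that lookups take logarithmic time. The following is only a minimal sketch of that idea, not the libomptarget code: the `MapEntry` struct, the `ByHstBegin` comparator, and the `lookup` function are hypothetical names, and the sketch assumes mapped host ranges never overlap.

```cpp
#include <cstdint>
#include <set>

// Hypothetical, simplified mapping entry (not the libomptarget type).
struct MapEntry {
  std::uintptr_t HstBegin; // first host address of the mapped range
  std::uintptr_t HstEnd;   // one past the last host address
  std::uintptr_t TgtBegin; // device address the range is mapped to
};

// Orders entries by host begin address. The is_transparent tag (C++14)
// lets the set be searched with a raw address instead of a full entry.
struct ByHstBegin {
  using is_transparent = void;
  bool operator()(const MapEntry &A, const MapEntry &B) const {
    return A.HstBegin < B.HstBegin;
  }
  bool operator()(std::uintptr_t Addr, const MapEntry &E) const {
    return Addr < E.HstBegin;
  }
  bool operator()(const MapEntry &E, std::uintptr_t Addr) const {
    return E.HstBegin < Addr;
  }
};

using MapSet = std::set<MapEntry, ByHstBegin>;

// Returns the entry whose [HstBegin, HstEnd) range contains Addr, or
// nullptr if there is none. Assumes the stored ranges do not overlap.
const MapEntry *lookup(const MapSet &Map, std::uintptr_t Addr) {
  auto It = Map.upper_bound(Addr); // first entry starting after Addr
  if (It == Map.begin())
    return nullptr;                // every entry starts after Addr
  --It;                            // last entry starting at or before Addr
  return Addr < It->HstEnd ? &*It : nullptr;
}
```

Because the set is ordered by begin address, `upper_bound` finds the only candidate range in O(log n), where a linear scan would visit every entry while the mutex is held.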
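
Commit 08029595ca above describes casting to an unsigned type when counting loop iterations could overflow. As a hedged illustration of why that helps (this is not the code Clang generates, and `tripCount` is a hypothetical helper), the sketch below counts the iterations of `for (i = lb; i < ub; i += st)` with a positive step, assuming 32-bit signed bounds.

```cpp
#include <cstdint>

// Number of iterations of `for (int32_t i = lb; i < ub; i += st)`,
// assuming st > 0. Computing ub - lb in int32_t can overflow (for example
// lb = INT32_MIN, ub = INT32_MAX); the unsigned subtraction is modular
// and yields the exact distance whenever ub > lb.
std::uint32_t tripCount(std::int32_t lb, std::int32_t ub, std::int32_t st) {
  if (ub <= lb)
    return 0; // loop body never runs
  std::uint32_t span =
      static_cast<std::uint32_t>(ub) - static_cast<std::uint32_t>(lb);
  std::uint32_t step = static_cast<std::uint32_t>(st);
  // Round up without computing span + step - 1, which could itself wrap.
  return span / step + (span % step != 0 ? 1u : 0u);
}
```

The division-plus-remainder form is a deliberate choice in this sketch: the more common `(span + step - 1) / step` would wrap around for spans close to the maximum unsigned value.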
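
Commit dd5ba4b585 above notes that a `\bnot\b` substitution also rewrites `-implicit-check-not`, which is why it was replaced by `%not`. The sketch below imitates a regex-based substitution step with `std::regex_replace` to show the difference; it is not lit's implementation, and the `expand` helper and the `/bin/not` replacement path are assumptions made for illustration.

```cpp
#include <regex>
#include <string>

// Stand-in for a test runner's substitution step: every match of Pattern
// in Cmd is replaced by Replacement.
std::string expand(const std::string &Cmd, const std::string &Pattern,
                   const std::string &Replacement) {
  return std::regex_replace(Cmd, std::regex(Pattern), Replacement);
}
```

With the pattern `\bnot\b`, the command `FileCheck -implicit-check-not=error %s` becomes `FileCheck -implicit-check-/bin/not=error %s`, because `-` and `=` both form word boundaries around `not`. With the pattern `%not`, the same command is left untouched and only an explicit `%not ./a.out` is expanded.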

1219 Commits