llvm-project

Commit Graph

Author	SHA1	Message	Date
Joseph Huber	ae95ceeb8f	[OpenMP] Consolidate error handling and debug messages in Libomptarget Summary: This patch consolidates the error handling and messaging routines to a single file omptargetmessage. The goal is to simplify the error handling interface prior to adding more error handling support Reviewers: jdoerfert grokos ABataev AndreyChurbanov ronlieb JonChesterfield ye-luo tianshilei1992 Subscribers: danielkiss guansong jvesely kerbowa nhaehnle openmp-commits sstefan1 yaxunl	2020-09-01 15:28:19 -04:00
Alexey Bataev	6aa7228a62	[LIBOMPTARGET]Do not try to optimize bases for the next parameters. PrivateArgumentManager shall immediately allocate firstprivates if they are bases for the next parameters and the next parameters rely on the fact that the base must be allocated already. Differential Revision: https://reviews.llvm.org/D86781	2020-08-28 15:46:31 -04:00
Shilei Tian	46e0ced762	[OpenMP] Fixed wrong test command in the test private_mapping.c The test command in `private_mapping.c` was set to expect failure by mistake. It is fixed in this patch. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D86758	2020-08-28 12:19:46 -04:00
Joseph Huber	7a5a74ea96	[OpenMP] Always emit debug messages that indicate offloading failure Summary: This patch changes the libomptarget runtime to always emit debug messages that occur before offloading failure. The goal is to provide users with information about why their application failed in the target region rather than a single failure message. This is only done in regions that precede offloading failure so this should not impact runtime performance. If the debug environment variable is set then the message is forwarded to the debug output as usual. A new environment variable was added for future use but does nothing in this current patch. LIBOMPTARGET_INFO will be used to report runtime information to the user if requested, such as grid size, SPMD usage, or data mapping. It will take an integer indicating the level of information verbosity and a value of 0 will disable it. Reviewers: jdoerfert Subscribers: guansong sstefan1 yaxunl ye-luo Tags: #OpenMP Differential Revision: https://reviews.llvm.org/D86483	2020-08-26 19:30:41 -04:00
JonChesterfield	5d989fb37d	[libomptarget][amdgpu] Improve thread safety, remove dead code	2020-08-26 22:04:03 +01:00
Jon Chesterfield	28fbf422f2	[libomptarget][amdgpu] Update plugin CMake to work with latest rocr library	2020-08-26 20:01:42 +01:00
Shilei Tian	0775c1dfbc	[OpenMP] Pack first-private arguments to improve efficiency of data transfer In this patch, we pack all small first-private arguments, then allocate and transfer them all at once to reduce the number of data transfers, which are very expensive. Let's take the test case as an example. ``` int main() { int data1[3] = {1}, data2[3] = {2}, data3[3] = {3}; int sum[16] = {0}; #pragma omp target teams distribute parallel for map(tofrom: sum) firstprivate(data1, data2, data3) for (int i = 0; i < 16; ++i) { for (int j = 0; j < 3; ++j) { sum[i] += data1[j]; sum[i] += data2[j]; sum[i] += data3[j]; } } } ``` Here `data1`, `data2`, and `data3` are three first-private arguments of the target region. In the previous `libomptarget`, it called data allocation and data transfer three times, each of which allocated and transferred 12 bytes. With this patch, it only calls allocation and transfer once. The size is `(12+4)*3=48` where 12 is the size of each array and 4 is the padding to keep the address aligned with 8. It is implemented in this way: 1. First collect all information for those first-private arguments. Private arguments are not included because they don't need to be mapped to the target device; they just need a data allocation. With the patch for memory manager, the data allocation could be very cheap, especially for the small size. For each qualified argument, push a place holder pointer `nullptr` to the `vector` for kernel arguments, and we will update them later. 2. After we have all information, create a buffer that can accommodate all arguments plus their paddings. Copy the arguments to the buffer at the right place, i.e. aligned address. 3. Allocate a target memory with the same size as the host buffer, transfer the host buffer to target device, and finally update all place holder pointers in the arguments `vector`. The reason we only consider small arguments is that the data transfer is asynchronous. Therefore, for a large argument, we can continue to do things on the host side while, hopefully, the data is being transferred. An argument is "small" if its size is less than a predefined value. Currently it is 1024. I'm not sure whether it is a good one, and that is an open question. Another question is, do we need to make it configurable via an environment variable? Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D86307	2020-08-25 16:06:29 -04:00
Shilei Tian	f93b42a629	[NFC][OpenMP] Remove outdated comments about potential issues The issue mentioned has been fixed in D84996	2020-08-24 01:21:06 +00:00
Shilei Tian	0289696751	[OpenMP] Introduce target memory manager Target memory manager is introduced in this patch which aims to manage target memory such that they will not be freed immediately when they are not used because the overhead of memory allocation and free is very large. For CUDA device, cuMemFree even blocks the context switch on device which affects concurrent kernel execution. The memory manager can be taken as a memory pool. It divides the pool into multiple buckets according to the size such that memory allocation/free distributed to different buckets will not affect each other. In this version, we use the exact-equality policy to find a free buffer. This is an open question: will best-fit work better here? IMO, best-fit is not good for target memory management because computation on GPU usually requires GBs of data. Best-fit might lead to a serious waste. For example, there is a free buffer of size 1960MB, and now we need a buffer of size 1200MB. If best-fit, the free buffer will be returned, leading to a 760MB waste. The allocation will happen when there is no free memory left, and the memory free on device will take place in the following two cases: 1. The program ends. Obviously. However, there is a little problem that plugin library is destroyed before the memory manager is destroyed, leading to a fact that the call to target plugin will not succeed. 2. Device is out of memory when we request a new memory. The manager will walk through all free buffers from the bucket with largest base size, pick up one buffer, free it, and try to allocate immediately. If it succeeds, it will return right away rather than freeing all buffers in free list. Update: A threshold (8KB by default) is set such that users could control what size of memory will be managed by the manager. It can also be configured by an environment variable `LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD`. Reviewed By: jdoerfert, ye-luo, JonChesterfield Differential Revision: https://reviews.llvm.org/D81054	2020-08-19 23:12:23 -04:00
Shilei Tian	83c3d07994	[OpenMP] Refactored the function `DeviceTy::data_exchange` This patch contains the following changes: 1. Renamed the function `DeviceTy::data_exchange` to `DeviceTy::dataExchange`; 2. Changed the second argument `DeviceTy DstDev` to `DeviceTy &DstDev`; 3. Renamed the last argument. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D86238	2020-08-19 16:08:14 -04:00
Jon Chesterfield	6e1b11087f	[libomptarget][amdgpu] Support building with static rocm libraries	2020-08-19 15:44:30 +01:00
George Rokos	32ebdc70f3	[libomptarget][NFC] Sort list of plugins in chronological order Differential Revision: https://reviews.llvm.org/D86082	2020-08-17 08:33:36 -07:00
Johannes Doerfert	5272d29e2c	[OpenMP][CUDA] Keep one kernel list per device, not globally. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D86039	2020-08-16 14:38:35 -05:00
Johannes Doerfert	aa27cfc1e7	[OpenMP][CUDA] Cache the maximal number of threads per block (per kernel) Instead of calling `cuFuncGetAttribute` with `CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation, we can do it for the first one and cache the result as part of the `KernelInfo` struct. The only functional change is that we now expect `cuFuncGetAttribute` to succeed and otherwise propagate the error. Ignoring any error seems like a slippery slope... Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D86038	2020-08-16 14:38:33 -05:00
Jon Chesterfield	d0b312955f	[libomptarget] Implement host plugin for amdgpu Replacement for D71384. Primary difference is inlining the dependency on atmi followed by extensive simplification and bugfixes. This is the latest version from https://github.com/ROCm-Developer-Tools/amd-llvm-project/tree/aomp12 with minor patches and a rename from hsa to amdgpu, on the basis that this can't be used by other implementations of hsa without additional work. This will not build unless the ROCM_DIR variable is passed so won't break other builds. That variable is used to locate two amdgpu specific libraries that ship as part of rocm: libhsakmt at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface libhsa-runtime64 at https://github.com/RadeonOpenCompute/ROCR-Runtime These libraries build from source. The build scripts in those repos are for shared libraries, but can be adapted to statically link both into this plugin. There are caveats. - This works well enough to run various tests and benchmarks, and will be used to support the current clang bring up - It is adequately thread safe for the above but there will be races remaining - It is not stylistically correct for llvm, though has had clang-format run - It has suboptimal memory management and locking strategies - The debug printing / error handling is inconsistent I would like to contribute this pretty much as-is and then improve it in-tree. This would be advantageous because the aomp12 branch that was in use for fixing this codebase has just been joined with the amd internal rocm dev process. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85742	2020-08-15 23:58:28 +01:00
Joel E. Denny	518a27e559	[OpenMP] Fix ref count dec for implicit map of partial data D85342 broke this case. The new test case presents an example. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D85369	2020-08-06 11:39:29 -04:00
Joel E. Denny	8c8bb128df	[OpenMP] Fix `target data` exit for array extension For example: ``` #pragma omp target data map(tofrom:arr[0:100]) { #pragma omp target exit data map(delete:arr[0:100]) #pragma omp target enter data map(alloc:arr[98:2]) } ``` Without this patch, the transfer at the end of the target data region is broken and fails depending on the target device. According to my read of the spec, the transfer shouldn't even be attempted because `arr[0:100]` isn't (fully) present there. To fix that, this patch makes `DeviceTy::getTgtPtrBegin` return null for this case. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D85342	2020-08-05 16:51:25 -04:00
Joel E. Denny	41b1aefecb	[OpenMP] Fix `present` diagnostic for array extension For example, without this patch, the following fails as expected with or without the `present` modifier, but the `present` modifier doesn't produce its usual diagnostic: ``` #pragma omp target data map(alloc: arr[0:2]) { #pragma omp target map(present, tofrom: arr[0:100]) // not fully present ; } ``` Reviewed By: grokos, vzakhari Differential Revision: https://reviews.llvm.org/D85320	2020-08-05 16:51:24 -04:00
George Rokos	40470eb27a	[libomptarget][NFC] Replace `%ld` with PRId64 for data of type int64_t. The standard way of printing `int64_t` data is via the PRId64 macro, `ld` is for `long int` and int64_t is not guaranteed to be typedef'ed as `long int` on all platforms. E.g. on Windows we get mismatch warnings. Differential Revision: https://reviews.llvm.org/D85353	2020-08-05 13:28:35 -07:00
Alexey Bataev	6780d5675b	[LIBOMPTARGET]Fix order of mapper data for targetDataEnd function. targetDataMapper function fills arrays with the mapping data in the direct order. When this function is called by targetDataBegin or tgt_target_update functions, it works as expected. But targetDataEnd function processes mapped data in reverse order. In this case, the base pointer might be deleted before the associated data is deleted. Need to reverse data, mapped by mapper, too, since it always adds data that must be deleted at the end of the buffer. Fixes the test declare_mapper_target_update.cpp. Also, reduces the memory fragmentation by preallocating the memory buffers. Differential Revision: https://reviews.llvm.org/D85216	2020-08-05 13:42:24 -04:00
Joel E. Denny	5ab43989c3	[OpenMP] Fix `omp target update` for array extension OpenMP TR8 sec. 2.15.6 "target update Construct", p. 183, L3-4 states: > If the corresponding list item is not present in the device data > environment and there is no present modifier in the clause, then no > assignment occurs to or from the original list item. L10-11 states: > If a present modifier appears in the clause and the corresponding > list item is not present in the device data environment then an > error occurs and the program terminates. (OpenMP 5.0 also has the first passage but without mention of the present modifier of course.) In both passages, I assume "is not present" includes the case of partially but not entirely present. However, without this patch, the target update directive misbehaves in this case both with and without the present modifier. For example: ``` #pragma omp target enter data map(to:arr[0:3]) #pragma omp target update to(arr[0:5]) // might fail on data transfer #pragma omp target update to(present:arr[0:5]) // might fail on data transfer ``` The problem is that `DeviceTy::getTgtPtrBegin` does not return a null pointer in that case, so `target_data_update` sees the data as fully present, and the data transfer then might fail depending on the target device. However, without the present modifier, there should never be a failure. Moreover, with the present modifier, there should always be a failure, and the diagnostic should mention the present modifier. This patch fixes `DeviceTy::getTgtPtrBegin` to return null when `target_data_update` is the caller. I'm wondering if it should do the same for more callers. Reviewed By: grokos, jdoerfert Differential Revision: https://reviews.llvm.org/D85246	2020-08-05 10:03:31 -04:00
Joel E. Denny	002d61db2b	[OpenMP] Fix `present` for exit from `omp target data` Without this patch, the following example fails but shouldn't according to OpenMP TR8: ``` #pragma omp target enter data map(alloc:i) #pragma omp target data map(present, alloc: i) { #pragma omp target exit data map(delete:i) } // fails presence check here ``` OpenMP TR8 sec. 2.22.7.1 "map Clause", p. 321, L23-26 states: > If the map clause appears on a target, target data, target enter > data or target exit data construct with a present map-type-modifier > then on entry to the region if the corresponding list item does not > appear in the device data environment an error occurs and the > program terminates. There is no corresponding statement about the exit from a region. Thus, the `present` modifier should: 1. Check for presence upon entry into any region, including a `target exit data` region. This behavior is already implemented correctly. 2. Not check for presence upon exit from any region, including a `target` or `target data` region. Without this patch, this behavior is not implemented correctly, breaking the above example. In the case of `target data`, this patch fixes the latter behavior by removing the `present` modifier from the map types Clang generates for the runtime call at the end of the region. In the case of `target`, we have not found a valid OpenMP program for which such a fix would matter. It appears that, if a program can guarantee that data is present at the beginning of a `target` region so that there's no error there, that data is also guaranteed to be present at the end. This patch adds a comment to the runtime to document this case. Reviewed By: grokos, RaviNarayanaswamy, ABataev Differential Revision: https://reviews.llvm.org/D84422	2020-08-05 10:03:31 -04:00
Shilei Tian	f2400f024d	[OpenMP] Fixed the issue that target memory deallocation might be called when they're being used This patch fixed the issue that target memory might be deallocated when they're still being used or before they're used. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84996	2020-07-31 18:54:18 -04:00
Shilei Tian	0f10165626	[OpenMP] Refactored the function `targetDataEnd` Refactored the function `targetDataEnd` in preparation for fixing the issue of ahead-of-time target memory deallocation. This patch only renamed `targetDataEnd` related variables and functions to conform with LLVM code standard. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84991	2020-07-30 21:39:26 -04:00
Shilei Tian	8218eee269	[OpenMP] Refactored the function `target` Refactored the function `target` to make preparation for fixing the issue of ahead-of-time device memory deallocation. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84816	2020-07-30 21:05:55 -04:00
Alexey Bataev	622e46156d	[OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region. Need to map the base pointer for all directives, not only target data-based ones. The base pointer is mapped for array sections, array subscript, array shaping and other array-like constructs with the base pointer. Also, codegen for use_device_ptr clause was modified to correctly handle mapping combination of array like constructs + use_device_ptr clause. The data for use_device_ptr clause is emitted as the last records in the data mapping array. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84767	2020-07-30 11:18:33 -04:00
Alexey Bataev	b69357c2f4	Revert "[OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region." This reverts commit `142d0d3ed8` to investigate undefined behavior revealed by buildbots.	2020-07-30 10:57:56 -04:00
Alexey Bataev	142d0d3ed8	[OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region. Need to map the base pointer for all directives, not only target data-based ones. The base pointer is mapped for array sections, array subscript, array shaping and other array-like constructs with the base pointer. Also, codegen for use_device_ptr clause was modified to correctly handle mapping combination of array like constructs + use_device_ptr clause. The data for use_device_ptr clause is emitted as the last records in the data mapping array. It applies only for global pointers. Differential Revision: https://reviews.llvm.org/D84767	2020-07-30 09:40:05 -04:00
Joel E. Denny	cee52dd026	[OpenMP] Implement TR8 `present` motion modifier in runtime (2/2) This patch implements OpenMP runtime support for the OpenMP TR8 `present` motion modifier for `omp target update` directives. The previous patch in this series implements Clang front end support. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D84712	2020-07-29 12:18:50 -04:00
Shilei Tian	30440924d4	[OpenMP] Replaced mutex lock/unlock in `target` with `std::lock_guard` Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84799	2020-07-28 20:31:40 -04:00
Joel E. Denny	65564e5eaf	Revert "[OpenMP] Implement TR8 `present` motion modifier in runtime (2/2)" This reverts commit `2cb926a447`. It depends on `3c3faae497`, which is being reverted.	2020-07-28 20:30:05 -04:00
Shilei Tian	3ce69d4d50	[NFC][OpenMP] Renamed all variable and function names in `target` to conform with LLVM code standard This patch only touched variables and functions in `target`. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84797	2020-07-28 20:11:09 -04:00
Joel E. Denny	2cb926a447	[OpenMP] Implement TR8 `present` motion modifier in runtime (2/2) This patch implements OpenMP runtime support for the OpenMP TR8 `present` motion modifier for `omp target update` directives. The previous patch in this series implements Clang front end support. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D84712	2020-07-28 19:15:18 -04:00
Joel E. Denny	9b4826d18b	[OpenMP] Fix libomptarget negative tests to expect abort On runtime failures, D83963 causes the runtime to abort instead of merely exiting with a non-zero value, but many tests in the libomptarget test suite still expect the former behavior. This patch updates the test suite and was discussed in post-commit comments on D83963 and D84557.	2020-07-28 09:02:16 -04:00
Joachim Protze	e2f5444c9c	[OpenMP][Tests] Enable nvptx64 testing for most libomptarget tests Also add $BUILD/lib to the LIBRARY_PATH to fix https://bugs.llvm.org/show_bug.cgi?id=46836. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D84557	2020-07-28 11:08:24 +02:00
Ye Luo	9323166601	[OpenMP] Add more pass-through functions in DeviceTy Summary: 1. Add DeviceTy::data_alloc, DeviceTy::data_delete, DeviceTy::data_alloc, DeviceTy::synchronize pass-through functions. Avoid directly accessing Device.RTL 2. Fix the type of the first argument of synchronize_ty in rth.h, device id is int32_t which is consistent with other functions. Reviewers: tianshilei1992, jdoerfert Reviewed By: tianshilei1992 Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D84487	2020-07-27 16:08:30 -04:00
Johannes Doerfert	9c87466c39	[OpenMP] Use `abort` not `error` for fatal runtime exceptions See PR46515 for the rationale but generally, we want to really abort not gracefully shut down. Reviewed By: grokos, ABataev Differential Revision: https://reviews.llvm.org/D83963	2020-07-24 15:15:38 -05:00
Shilei Tian	c0185dc7df	Revert "[OpenMP] Wait for kernel prior to memory deallocation" This reverts commit `9b2832c089`.	2020-07-22 23:03:36 -04:00
Shilei Tian	9b2832c089	[OpenMP] Wait for kernel prior to memory deallocation Summary: In the function `target`, memory deallocation and `target_data_end` are called immediately after returning from the kernel launch. This might cause a race condition in which the corresponding memory is still being used by the kernel, and a potential issue that when the kernel starts to execute, its required data have already been deallocated, especially when multiple kernels are running concurrently. Since we block the thread issuing the target offloading at the end of the target region anyway, we just move the synchronization slightly earlier to ensure correctness. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D84381	2020-07-22 22:55:34 -04:00
Joel E. Denny	708752b2f6	[OpenMP] Implement TR8 `present` map type modifier in runtime (2/2) This implements OpenMP runtime support for the OpenMP TR8 `present` map type modifier. The previous patch in this series implements Clang front end support. See that patch summary for behaviors that are not yet supported. Reviewed By: grokos, jdoerfert Differential Revision: https://reviews.llvm.org/D83062	2020-07-22 14:04:58 -04:00
Joel E. Denny	fc247c8f3c	Revert "[OpenMP] Implement TR8 `present` map type modifier in runtime (2/2)" This reverts commit `45b8f7ec35`. It attempts to use debug macros `DPxMOD` and `DPxPTR` in release builds. Will fix and reapply later.	2020-07-22 11:22:08 -04:00
Joel E. Denny	45b8f7ec35	[OpenMP] Implement TR8 `present` map type modifier in runtime (2/2) This implements OpenMP runtime support for the OpenMP TR8 `present` map type modifier. The previous patch in this series implements Clang front end support. See that patch summary for behaviors that are not yet supported. Reviewed By: grokos, jdoerfert Differential Revision: https://reviews.llvm.org/D83062	2020-07-22 10:15:32 -04:00
Joachim Protze	ae31d7838c	[OpenMP][NFC] pass on env variables to libomptarget tests	2020-07-22 12:14:45 +02:00
George Rokos	140ab574a1	[OpenMP][Offload] Declare mapper runtime implementation Libomptarget patch adding runtime support for "declare mapper". Patch co-developed by Lingda Li and George Rokos. Differential revision: https://reviews.llvm.org/D68100	2020-07-15 18:11:43 -07:00
Johannes Doerfert	5937434677	[OpenMP] Silence unused symbol warning with proper ifdefs	2020-07-11 11:57:42 -05:00
Johannes Doerfert	c98699582a	[OpenMP][NFC] Remove unused (always fixed) arguments There are various runtime calls in the device runtime with unused, or always fixed, arguments. This is bad for all sorts of reasons. Clean up two before as we match them in OpenMPOpt now. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83268	2020-07-11 00:51:51 -05:00
Johannes Doerfert	cd0ea03e6f	[OpenMP][NFC] Remove unused and untested code from the device runtime Summary: We carried a lot of unused and untested code in the device runtime. Among other reasons, we are planning major rewrites for which reduced size is going to help a lot. The number of code lines reduced by 14%! Before: ------------------------------------------------------------------------------- Language files blank comment code ------------------------------------------------------------------------------- CUDA 13 489 841 2454 C/C++ Header 14 322 493 1377 C 12 117 124 559 CMake 4 64 64 262 C++ 1 6 6 39 ------------------------------------------------------------------------------- SUM: 44 998 1528 4691 ------------------------------------------------------------------------------- After: ------------------------------------------------------------------------------- Language files blank comment code ------------------------------------------------------------------------------- CUDA 13 366 733 1879 C/C++ Header 14 317 484 1293 C 12 117 124 559 CMake 4 64 64 262 C++ 1 6 6 39 ------------------------------------------------------------------------------- SUM: 44 870 1411 4032 ------------------------------------------------------------------------------- Reviewers: hfinkel, jhuber6, fghanim, JonChesterfield, grokos, AndreyChurbanov, ye-luo, tianshilei1992, ggeorgakoudis, Hahnfeld, ABataev, hbae, ronlieb, gregrodgers Subscribers: jvesely, yaxunl, bollu, guansong, jfb, sstefan1, aaron.ballman, openmp-commits, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D83349	2020-07-10 19:09:41 -05:00
Ye Luo	c5348aecd7	[OpenMP] Use primary context in CUDA plugin Summary: Retaining per device primary context is preferred to creating a context owned by the plugin. From CUDA documentation 1. "Note that the use of multiple CUcontexts per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used." from https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html 2. Right under cuCtxCreate. In most cases it is recommended to use cuDevicePrimaryCtxRetain. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf 3. The primary context is unique per device and shared with the CUDA runtime API. These functions allow integration with other libraries using CUDA. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX Two issues are addressed by this patch: 1. Not using the primary context caused interoperability issues with libraries like cublas, cusolver. CUBLAS_STATUS_EXECUTION_FAILED and cudaErrorInvalidResourceHandle 2. On OLCF summit, "Error returned from cuCtxCreate" and "CUDA error is: invalid device ordinal" Regarding the flags of the primary context. If it is inactive, we set CU_CTX_SCHED_BLOCKING_SYNC. If it is already active, we respect the current flags. Reviewers: grokos, ABataev, jdoerfert, protze.joachim, AndreyChurbanov, Hahnfeld Reviewed By: jdoerfert Subscribers: openmp-commits, yaxunl, guansong, sstefan1, tianshilei1992 Tags: #openmp Differential Revision: https://reviews.llvm.org/D82718	2020-07-07 10:14:51 -04:00
Saiyedul Islam	38d6640ba5	[libomptarget] Implement atomic inc and fence functions for AMDGCN using clang builtins This function uses __builtin_amdgcn_atomic_inc32(): uint32_t atomicInc(uint32_t *address, uint32_t max); These functions use __builtin_amdgcn_fence(): __kmpc_impl_threadfence() __kmpc_impl_threadfence_block() __kmpc_impl_threadfence_system() They will take place of current mechanism of directly calling IR functions. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83132	2020-07-07 06:36:25 +00:00
Fangrui Song	6ba4380ed6	[libomptarget][test] Fix text relocations by adding -fPIC	2020-07-05 12:51:28 -07:00
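
The packing scheme described in commit 0775c1dfbc can be sketched roughly as follows. This is a minimal host-side illustration only, not the libomptarget implementation: `FirstPrivateArg`, `alignUp`, and `packArgs` are hypothetical names, and the 8-byte alignment is taken from the commit message. For three 12-byte arguments it produces one 48-byte buffer, matching the `(12+4)*3` figure in the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical names for illustration; not part of the libomptarget API.
constexpr std::size_t Alignment = 8; // alignment stated in the commit message

// Round Size up to the next multiple of Alignment.
constexpr std::size_t alignUp(std::size_t Size) {
  return (Size + Alignment - 1) / Alignment * Alignment;
}

struct FirstPrivateArg {
  const void *HostPtr; // host address of the argument
  std::size_t Size;    // size of the argument in bytes
};

// Copy every argument into one contiguous buffer at an aligned offset.
// Offsets[I] receives the position of argument I inside the buffer, which
// is what a runtime would use to patch the placeholder kernel arguments.
std::vector<char> packArgs(const std::vector<FirstPrivateArg> &Args,
                           std::vector<std::size_t> &Offsets) {
  std::size_t Total = 0;
  Offsets.clear();
  for (const FirstPrivateArg &A : Args) {
    assert(A.HostPtr != nullptr && "argument must have a host address");
    Offsets.push_back(Total);
    Total += alignUp(A.Size);
  }
  std::vector<char> Buffer(Total, 0);
  for (std::size_t I = 0; I < Args.size(); ++I)
    std::memcpy(Buffer.data() + Offsets[I], Args[I].HostPtr, Args[I].Size);
  return Buffer;
}
```

In a real runtime the returned buffer would then be transferred to the device in a single call; that step is omitted here because it depends on the plugin interface.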


327 Commits
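
The exact-equality reuse policy described in commit 0289696751 can be illustrated with a simplified pool. This is a sketch under stated assumptions, not the memory manager from the patch: `ExactFitPool` is a hypothetical name, `std::malloc`/`std::free` stand in for device allocation, the bucket structure and the out-of-memory fallback are omitted, and it is assumed that only sizes at or below the threshold are managed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <vector>

// Hypothetical sketch; std::malloc/std::free stand in for device calls.
class ExactFitPool {
  // Idle buffers keyed by their exact size.
  std::map<std::size_t, std::vector<void *>> FreeLists;
  std::size_t Threshold; // assumption: sizes above this bypass the pool

public:
  explicit ExactFitPool(std::size_t Threshold) : Threshold(Threshold) {}
  ExactFitPool(const ExactFitPool &) = delete;
  ExactFitPool &operator=(const ExactFitPool &) = delete;

  // Buffers are only really freed when the pool is destroyed.
  ~ExactFitPool() {
    for (auto &Entry : FreeLists)
      for (void *Ptr : Entry.second)
        std::free(Ptr);
  }

  // Reuse an idle buffer of exactly this size if one exists.
  void *allocate(std::size_t Size) {
    if (Size <= Threshold) {
      auto It = FreeLists.find(Size);
      if (It != FreeLists.end() && !It->second.empty()) {
        void *Ptr = It->second.back();
        It->second.pop_back();
        return Ptr;
      }
    }
    return std::malloc(Size);
  }

  // Keep managed buffers for later reuse instead of freeing them.
  void release(void *Ptr, std::size_t Size) {
    assert(Ptr != nullptr && "cannot release a null buffer");
    if (Size <= Threshold)
      FreeLists[Size].push_back(Ptr);
    else
      std::free(Ptr);
  }
};
```

With exact matching, a 1200MB request never consumes a 1960MB idle buffer, which is the waste the commit message argues best-fit would cause.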