The OpenMP runtime's cmake scripts do not correctly locate the
libdevice that the Debian/Ubuntu package nvidia-cuda-toolkit currently
includes, at least on my Ubuntu 18.04.1 installation. This patch
fixes that for me.
This problem was discussed at length in D55269. D40453 added a
similar adjustment in clang, but reviewers of D55269 concluded that,
for the OpenMP runtime, the right place to address this problem is in
cmake's CUDA support. However, it was also suggested we could add a
workaround to OpenMP's cmake scripts now. This patch contains such a
workaround, which I've tried to design so that it will have no harmful
effect if cmake improves in the future.
nvidia-cuda-toolkit also needs improvements because its intended
monolithic CUDA tree shim, /usr/lib/cuda, has many empty directories,
such as bin. I reported that at:
<https://bugs.launchpad.net/ubuntu/+source/nvidia-cuda-toolkit/+bug/1808999>
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D55588
llvm-svn: 350377
Add the omp_get_device_num() function for OpenMP 5.0, which returns the number of the
device the current thread is running on. Currently, we are leaving it to the
compiler to handle this properly if it is called inside a target region.
Also did some cleanup and updating of the duplicate device API functions (present in
both libomp and libomptarget) to turn them into weak functions that check for the
corresponding symbol from libomptarget and call the libomptarget version if it is
present. If any additional device API functions are also implemented in
libomptarget in the future, we should add the dlsym calls to the host functions.
Also, if the omp_target_* functions are ever implemented for the host (this has
been requested), they should attempt to call the libomptarget versions as well.
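A rough sketch of the weak-function-plus-dlsym fallback pattern described above; the
function chosen and the lookup details are illustrative, not the exact code from the patch:

  #include <dlfcn.h>   // dlsym, RTLD_NEXT (on glibc this needs _GNU_SOURCE)

  // Weak host-side definition: defer to libomptarget's version if it is loaded.
  extern "C" int __attribute__((weak)) omp_get_num_devices(void) {
    typedef int (*entry_fn_t)(void);
    // Look for the symbol in libraries after this one in search order
    // (e.g. libomptarget, if it is present in the process).
    entry_fn_t tgt_fn = (entry_fn_t)dlsym(RTLD_NEXT, "omp_get_num_devices");
    if (tgt_fn)
      return tgt_fn();   // libomptarget knows about the actual devices
    return 0;            // no offloading runtime present: host only
  }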
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D55578
llvm-svn: 350352
Summary:
One of the LLVM optimizations, critical edge splitting, also clones tail
instructions. This is a dangerous operation for __syncthreads()
calls, and the transformation leads to undefined behavior or
incorrect results. The patch fixes this problem by replacing the __syncthreads()
function with an inline assembly instruction whose cost is modeled as too high
to duplicate and which therefore cannot be copied.
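A minimal sketch of the approach, assuming a plain `bar.sync` barrier is the
instruction emitted (the exact inline assembly in the patch may differ):

  // Inline PTX barrier: opaque to the optimizer, so it cannot be duplicated
  // the way a __syncthreads() intrinsic call can be.
  __device__ static void safe_syncthreads() {
    asm volatile("bar.sync 0;" : : : "memory");
  }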
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, openmp-commits, caomhin
Differential Revision: https://reviews.llvm.org/D56274
llvm-svn: 350333
Summary:
Avoid using an atomic loop to wait for the completion of the
data-sharing interface initialization; instead, use __shfl_sync for the
communication within the warp to signal the other threads in the warp that
the initialization has completed.
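A hedged sketch of the warp-level signaling, under the assumption that lane 0 is the
initializing thread and that the result is broadcast as a 64-bit value; names and
structure are illustrative, not the patch's code:

  #include <cstdint>

  __device__ void *init_then_broadcast(unsigned active_mask,
                                       void *(*initialize)()) {
    uintptr_t frame = 0;
    if ((threadIdx.x % warpSize) == 0)
      frame = (uintptr_t)initialize();     // only the warp master initializes
    // Broadcast the value from lane 0 to every lane in active_mask; no atomic
    // spin loop is needed for the other lanes to learn the result.
    frame = __shfl_sync(active_mask, (unsigned long long)frame, 0);
    return (void *)frame;
  }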
Reviewers: gtbercea, kkwli0, grokos
Subscribers: guansong, jfb, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D56100
llvm-svn: 350129
Summary:
At high optimization levels, asserts lead to unexpected results
because of auto-inserted unreachable instructions. This outlining
prevents some of these dangerous optimizations and improves
stability.
Reviewers: gtbercea, kkwli0, grokos
Subscribers: guansong, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D56101
llvm-svn: 350128
Summary:
Use the original shuffle implementation for __kmpc_shuffle_int64, since the
default implementation ends up using the same code.
Reviewers: gtbercea
Subscribers: guansong, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D55514
llvm-svn: 348772
Summary:
Shuffle on 64-bit data is allowed only for CUDA >= 9.0. Also fixed the
constant for the mask, which needs one extra L suffix at the end.
Reviewers: gtbercea, kkwli0
Subscribers: guansong, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D55440
llvm-svn: 348758
Summary:
Introduced a special noinline log function that saves some
registers in optimized builds that have logging enabled. It also
increases the stability of optimized builds with an inlined runtime.
Reviewers: gtbercea, kkwli0
Reviewed By: gtbercea
Subscribers: caomhin, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D55436
llvm-svn: 348606
Summary:
According to the standard, after a memory flush the changes in
memory must be visible to all threads in all teams. This patch fixes
that.
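A small sketch of the distinction involved, assuming the fix widens the fence scope in
the device runtime's flush handling:

  // __threadfence_block() only orders memory accesses for threads in the same
  // block (one team); __threadfence() orders them device-wide, which is what
  // "visible to all the threads in all teams" requires.
  __device__ void flush_device_wide() {
    __threadfence();
  }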
Reviewers: gtbercea, kkwli0
Subscribers: guansong, jfb, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D55370
llvm-svn: 348491
Summary:
Reworked the runtime to make it compatible with the requirements of the
original runtime library. Also simplified some code to reduce the number of
function calls.
Reviewers: gtbercea, kkwli0
Subscribers: guansong, jfb, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D55130
llvm-svn: 348003
Summary: To enable the compiler to optimize out parts of the function that are not needed when the runtime can be omitted, a new version of the SPMD kernel deinit function is needed. This function takes the runtime-required flag as an argument.
Reviewers: ABataev, kkwli0, caomhin
Reviewed By: ABataev
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D54969
llvm-svn: 347714
Summary:
Added functions __kmpc_nvptx_teams_reduce_nowait_simple and
__kmpc_nvptx_teams_end_reduce_nowait_simple to implement basic support
for reductions across the teams.
Reviewers: gtbercea, kkwli0
Subscribers: guansong, jfb, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D54967
llvm-svn: 347710
Summary: Refactor the checks for SPMD mode and for whether the runtime is initialized. This uses constant flags, which allows unused sections of code that depend on these flags to be optimized out.
Reviewers: ABataev, caomhin
Reviewed By: ABataev
Subscribers: guansong, jfb, openmp-commits
Differential Revision: https://reviews.llvm.org/D54960
llvm-svn: 347698
Summary:
The base pointer for the lambda mapping must point to the placement of the
lambda capture, and the pointer must point to the captured variable itself.
This patch fixes the problem.
Reviewers: gtbercea
Subscribers: guansong, openmp-commits, kkwli0, caomhin
Differential Revision: https://reviews.llvm.org/D54260
llvm-svn: 346407
Summary:
The previously used combination `PTR_AND_OBJ | PRIVATE` could also be used
for mapping some data in Fortran, so it was changed to `PTR_AND_OBJ |
LITERAL`.
Reviewers: gtbercea
Subscribers: guansong, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D54035
llvm-svn: 345981
Summary:
The current globalization scheme works correctly only for SPMD + lightweight
runtime mode and does not work with the full runtime. The patch improves support
for the globalization scheme and reduces global memory consumption in
lightweight runtime mode.
The patch adds runtime functions for working with statically allocated
global memory, which improves performance and memory consumption.
This global memory must be allocated by the compiler.
Reviewers: grokos, kkwli0, gtbercea, caomhin
Subscribers: guansong, jfb, openmp-commits
Differential Revision: https://reviews.llvm.org/D53943
llvm-svn: 345976
Summary: In the case of coalesced global records, we need to push the exact data size passed in. This patch fixes this by outlining the common functionality of the previous push function and by adding a separate entry point for coalesced pushes. The pop function remains unchanged.
Reviewers: ABataev, grokos, caomhin
Reviewed By: ABataev, grokos
Subscribers: jholewinski, cfe-commits, Hahnfeld, guansong, jfb, openmp-commits
Differential Revision: https://reviews.llvm.org/D53141
llvm-svn: 345867
Summary:
Added support for correct mapping of variables captured by reference in
lambdas. That kind of mapping may appear only in target-executable
regions and must follow the original lambda or another lambda capture
for the same lambda.
The expected data: base address - the address of the lambda, begin
pointer - pointer to the address of the lambda capture, size - size of
the captured variable.
When the OMP_TGT_MAPTYPE_PTR_AND_OBJ mapping type is seen in a
target-executable region, the target address of the last processed item
is taken as the address of the original lambda, `tgt_lambda_ptr`. Then
the pointer to the capture on the device is calculated as `tgt_lambda_ptr
+ (host_begin_pointer - host_begin_base)`, and the target address
of the original variable (whose host address is
`*(void**)begin_pointer`) is written to that pointer.
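A condensed sketch of that address computation; the names are illustrative, and the
actual write to device memory goes through the plugin's data-transfer interface:

  #include <cstddef>

  void patch_lambda_capture(char *tgt_lambda_ptr,  // device copy of the lambda
                            char *hst_begin_base,  // host address of the lambda
                            char *hst_begin_ptr,   // host address of the capture field
                            void *tgt_var_addr) {  // device address of captured variable
    // The capture keeps the same offset inside the lambda object on the device.
    ptrdiff_t offset = hst_begin_ptr - hst_begin_base;
    void **tgt_capture_ptr = (void **)(tgt_lambda_ptr + offset);
    // Conceptually: store the device address of the captured variable into the
    // device-side capture field (in reality done via a host-to-device copy).
    *tgt_capture_ptr = tgt_var_addr;
  }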
Reviewers: kkwli0, gtbercea, grokos
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D51107
llvm-svn: 345608
If the user requested LIBOMPTARGET_NVPTX_DEBUG, include asserts in
the bitcode library. Anything else will have very unpleasant
effects because asserts will appear when falling back to the static
library libomptarget-nvptx.a.
Differential Revision: https://reviews.llvm.org/D52701
llvm-svn: 343477
Pass in the correct value of isRuntimeUninitialized() which solves
parallel reductions as reported on the mailing list.
For reference: r333285 did the same for loop scheduling.
Differential Revision: https://reviews.llvm.org/D52725
llvm-svn: 343476
NVPTX requires addresses of pointer locations to be 8-byte aligned,
or there will be an exception at runtime.
This could happen without this patch, as shown in the added test:
getId() requires 4 bytes of stack and putValueInParallel() uses 16
bytes to store the addresses of the captured variables.
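A minimal sketch of the kind of rounding needed; the helper name is made up:

  #include <cstdint>

  // Round a stack/frame pointer up to the next 8-byte boundary so that 8-byte
  // pointer stores on NVPTX do not fault.
  __device__ static char *align_up_8(char *p) {
    uintptr_t v = (uintptr_t)p;
    return (char *)((v + 7) & ~(uintptr_t)7);
  }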
Differential Revision: https://reviews.llvm.org/D52655
llvm-svn: 343402
According to OpenMP 4.5, p250:12-14:
If the requested nest level is outside the range of 0 and the
nest level of the current thread, as returned by the omp_get_level
routine, the routine returns -1.
The SPMD code path will need a similar fix.
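A sketch of the range check the quoted wording requires, with the actual lookup elided:

  #include <omp.h>

  int ancestor_thread_num(int requested_level) {
    int current_level = omp_get_level();
    if (requested_level < 0 || requested_level > current_level)
      return -1;                // outside [0, omp_get_level()] per OpenMP 4.5
    // ... otherwise return the thread number at that nesting level ...
    return 0;                   // placeholder for the real lookup
  }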
Differential Revision: https://reviews.llvm.org/D51787
llvm-svn: 343401
Clang trunk will serialize nested parallel regions. Check that this
is correctly reflected in various API methods.
Differential Revision: https://reviews.llvm.org/D51786
llvm-svn: 343382
Dynamic adjustment of the number of threads is not supported, and according to OpenMP 4.5, p238:7-9:
For implementations that do not support dynamic adjustment
of the number of threads this routine has no effect: the
value of dyn-var remains false.
Add a test that cancellation and nested parallelism aren't
supported either.
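Part of the added test is roughly of this shape (a sketch of the dyn-var check only,
not the committed test verbatim):

  #include <assert.h>
  #include <omp.h>

  int main() {
    int dyn = 1;
  #pragma omp target map(tofrom: dyn)
    {
      omp_set_dynamic(1);       // has no effect on the device
      dyn = omp_get_dynamic();  // dyn-var must still be false
    }
    assert(dyn == 0);
    return 0;
  }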
Differential Revision: https://reviews.llvm.org/D51785
llvm-svn: 343381
If there is no num_threads() clause, we must consider the
nthreads-var ICV. Its value is set by omp_set_num_threads()
and can be queried using omp_get_max_threads().
The rewritten code now closely resembles the algorithm given
in the OpenMP standard.
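A condensed sketch of that decision (names invented; the runtime's own code is
structured differently):

  int threads_for_parallel(int num_threads_clause, /* 0 when no clause is given */
                           int nthreads_icv,       /* set via omp_set_num_threads() */
                           int threads_available) {
    int requested = num_threads_clause > 0 ? num_threads_clause : nthreads_icv;
    return requested < threads_available ? requested : threads_available;
  }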
Differential Revision: https://reviews.llvm.org/D51783
llvm-svn: 343380
Fix an infinite loop on removing a non-mapped pointer-with-object.
Added a test to check that libomptarget does not enter an infinite loop when
trying to unmap pointer-with-object data that was not previously
mapped.
llvm-svn: 343344
This patch also introduces testing for libomptarget-nvptx
which has been missing until now. I propose to add tests for
all bugs that are fixed in the future.
The target check-libomptarget-nvptx is not run by default because
- we can't determine if there is a GPU plugged into the system.
- it will require the latest Clang compiler. Keeping compatibility
with older releases would prevent testing newer code generation
developed in trunk.
Differential Revision: https://reviews.llvm.org/D51687
llvm-svn: 343324
Summary: NFC - just fixing a bug: the empty slot test was before the re-setting of the Stack pointer.
Reviewers: ABataev, caomhin, Hahnfeld
Reviewed By: ABataev
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D52122
llvm-svn: 343006
Summary:
There is currently no supported situation where the warp master is not the first thread in the warp.
This also prevents device execution from hanging on Volta GPUs when ballot_sync is called by a number of threads that is less than the size of a warp.
Reviewers: ABataev, caomhin, grokos
Reviewed By: grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D50188
llvm-svn: 342972
Summary:
We need the support for per-team shared variables to support codegen for
lastprivates/reductions. Patch adds this support by using shared memory
if the total size of the reductions/lastprivates is <= 128 bytes,
then pre-allocated buffer in global memory if size is <= 4K bytes,or
uses malloc/free, otherwise.
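A hedged sketch of the tiered selection, using the thresholds named above; buffer
names and the real runtime's bookkeeping differ:

  #include <cstddef>

  __device__ char team_global_buffer[4096];        // pre-allocated global memory

  __device__ void *team_shared_storage(size_t size) {
    __shared__ char team_shared_buffer[128];       // per-team shared memory
    if (size <= sizeof(team_shared_buffer))
      return team_shared_buffer;                   // <= 128 bytes: shared memory
    if (size <= sizeof(team_global_buffer))
      return team_global_buffer;                   // <= 4K bytes: pre-allocated buffer
    return malloc(size);                           // otherwise: device malloc/free
  }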
Reviewers: gtbercea, kkwli0, grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D51875
llvm-svn: 342737
Summary:
The increment of the iterator was missed when it was required just to
continue execution.
Reviewers: kkwli0, gtbercea, grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D51937
llvm-svn: 341964
This is a follow-up to r341371: The new test for PR38704 doesn't
work with Clang 6.0. It uses an UNSUPPORTED: clang-6, but that
hasn't worked because the compiler features weren't known to lit.
llvm-svn: 341448
cuDeviceGetProperties has apparently been deprecated since CUDA 5.0.
Nvidia started using deprecation annotations only in CUDA 9.2, so nobody noticed
or cared before.
The new function returns the same values, tested with a P100.
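For reference, the non-deprecated query looks roughly like this (the exact attributes
the plugin needs may differ; error handling omitted):

  #include <cuda.h>

  static int max_threads_per_block(CUdevice device) {
    int value = 0;
    cuDeviceGetAttribute(&value, CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK, device);
    return value;
  }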
Differential Revision: https://reviews.llvm.org/D51624
llvm-svn: 341372
* cg and HasCancel in WorkDescr were never read and can be removed.
* This eliminates the last use of priv in ThreadPrivateContext.
* CounterGroup is unused afterwards.
* Remove duplicate external declares in omptarget-nvptx.cu that are
already in the header omptarget-nvptx.h.
Differential Revision: https://reviews.llvm.org/D51622
llvm-svn: 341370
If the runtime is uninitialized the master thread must Enqueue the
state object, and ALL threads must return immediately.
Found post-commit of https://reviews.llvm.org/D51222.
llvm-svn: 341328
Summary:
Implemented simple and lightweight runtime support for SPMD mode-based
constructs. It adds support for L2 sequential parallelism without full
runtime support. The patch also fixes some use cases for the
uninitialized|lightweight runtime.
Reviewers: grokos, kkwli0, Hahnfeld, gtbercea
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D51222
llvm-svn: 340944
Summary:
Removed the function that used a lock and varargs.
Used the same mechanism as for debug messages.
Reviewers: ABataev, gtbercea, grokos, Hahnfeld
Reviewed By: gtbercea, Hahnfeld
Subscribers: mikerice, ABataev, RaviNarayanaswamy, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D51226
llvm-svn: 340767
Summary:
Right now, only OMP_TARGET_OFFLOAD=DISABLED was implemented. Added support for the other values, MANDATORY and DEFAULT.
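A sketch of how the three values can be interpreted (enum and function names are
illustrative, not libomptarget's own):

  #include <cstdlib>
  #include <cstring>

  enum OffloadPolicy { OffloadDefault, OffloadDisabled, OffloadMandatory };

  static OffloadPolicy offload_policy_from_env() {
    const char *env = std::getenv("OMP_TARGET_OFFLOAD");
    if (!env)
      return OffloadDefault;      // offload when a usable device exists
    if (std::strcmp(env, "DISABLED") == 0)
      return OffloadDisabled;     // always execute target regions on the host
    if (std::strcmp(env, "MANDATORY") == 0)
      return OffloadMandatory;    // fail if offloading to a device is impossible
    return OffloadDefault;
  }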
Reviewers: gtbercea, ABataev, grokos, caomhin, Hahnfeld
Reviewed By: Hahnfeld
Subscribers: protze.joachim, gtbercea, AlexEichenberger, RaviNarayanaswamy, Hahnfeld, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D50522
llvm-svn: 340542
Summary:
1. Fixed an internal problem in the `__kmpc_barrier` function: the SPMD-mode
synchronization function should be called only at the L1 parallel level.
2. Removed some extra synchronization code and used `__kmpc_barrier` instead.
3. Some code cleanup.
Reviewers: gtbercea, grokos
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D49564
llvm-svn: 337691
This patch removes the translation code since this functionality is now implemented in the compiler.
target_data_begin and target_data_end are also patched to handle some special cases that used to be
handled by the obsolete translation function, namely ensuring proper alignment of struct members when
we have partially mapped structs. Mapping a struct from a higher address (i.e. not from its beginning)
can distort the alignment of some of its member fields. Padding restores the original
(proper) alignment.
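A sketch of the padding computation described above (variable names are illustrative;
the alignment value used by libomptarget may differ):

  #include <cstdint>

  // Extra bytes to allocate in front of a partially mapped struct so that the
  // first mapped member keeps its host alignment on the device.
  static int64_t alignment_padding(void *hst_ptr_begin, int64_t alignment) {
    return (int64_t)((uintptr_t)hst_ptr_begin % (uintptr_t)alignment);
  }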
Differential revision: https://reviews.llvm.org/D44186
llvm-svn: 337455
Revision r336569 (D49036) fixed libomptarget support for multiple NVIDIA images
in case a target region resides inside one or more libraries as well as in the
compiled application. However, the issue was still present for ELF images.
This fix adds support for multiple ELF images as well.
Patch by Jannis Klinkenberg
Reviewers: protze.joachim, ABataev, grokos
Reviewed By: protze.joachim, ABataev, grokos
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D49418
llvm-svn: 337355
Summary:
The if() condition should use the variable name instead of a variable reference.
If the variable is somehow unset, the dereference messes up the condition
expression and causes a CMake error.
Reviewers: jlpeyton, AndreyChurbanov, Hahnfeld
Reviewed By: Hahnfeld
Subscribers: mgorny, llvm-commits, openmp-commits
Differential Revision: https://reviews.llvm.org/D47221
llvm-svn: 337133
Summary: This patch fixes the data sharing infrastructure to work for the SPMD and non-SPMD cases.
Reviewers: ABataev, grokos, carlo.bertolli, caomhin
Reviewed By: ABataev, grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D49204
llvm-svn: 337013
Summary:
The patch fixes the following problems:
1. Removed unused functions from the omptarget_nvptx_ThreadPrivateContext
class and simplified its data members.
2. Fixed the calculation of loop boundaries for dynamic loops with static
scheduling.
3. Introduced saving/restoring of the dynamic loop boundaries to support
several nested parallel dynamic loops.
Reviewers: grokos
Subscribers: guansong, kkwli0, openmp-commits
Differential Revision: https://reviews.llvm.org/D49241
llvm-svn: 336915
Summary:
Currently the CUDA plugin supports loading only a single image, though an
executable may contain several images if it has target regions inside
dynamically loaded libraries. The patch allows loading
multiple images.
Reviewers: grokos
Subscribers: guansong, openmp-commits, kkwli0
Differential Revision: https://reviews.llvm.org/D49036
llvm-svn: 336569
Summary:
The patch fixes several problems in the implementation of the NVPTX RTL:
1. Detection of the last iteration for loops with static scheduling and no chunks.
2. Reductions for serialized parallel constructs.
3. Handling of barriers.
Reviewers: grokos
Reviewed By: grokos
Subscribers: Hahnfeld, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D48480
llvm-svn: 335469
The generic entry points for static loop scheduling previously
hardcoded that the runtime was initialized. This can be wrong if
the compiler analyzes that the runtime is not needed and calls
the init functions accordingly.
This didn't affect clang-ykt because they have entry points for
different combinations of SPMD x Runtime not needed. I didn't do
measurements yet but with inlining we might get away with always
calling the generic interface and letting compiler and runtime
figure out the rest.
In any case, a correct runtime is always better than having
functions that may only be called if previous calls passed in
a specific set of arguments!
Differential Revision: https://reviews.llvm.org/D47131
llvm-svn: 333285
Introduce OPENMP_INSTALL_LIBDIR and use in all install() commands.
This also fixes installation of libomptarget-nvptx that previously
didn't honor {OPENMP,LLVM}_LIBDIR_SUFFIX.
Differential Revision: https://reviews.llvm.org/D47130
llvm-svn: 333284
The existing implementation of dynamic scheduling
breaks the contract introduced by the original OpenMP
runtime and is thus incorrect. The patch fixes it and
introduces a correct dynamic scheduling model.
Thanks to Alexey Bataev for submitting this patch.
Differential Revision: https://reviews.llvm.org/D47333
llvm-svn: 333225
We already know where the CUDA SDK is, so there is no point in
letting Clang search for it again and possibly finding no installation or
a different one.
--cuda-path has been supported since the beginning of CUDA support in
Clang, so making this required doesn't impose additional restrictions.
Differential Revision: https://reviews.llvm.org/D46930
llvm-svn: 332495
Move all logic related to selecting the bitcode compiler and linker
into a new file and dynamically test required compiler flags. This
also adds -fcuda-rdc for Clang trunk as previously attempted in D44992
which fixes the build.
As a result this change also enables building the library by default
if all prerequisites are met.
Differential Revision: https://reviews.llvm.org/D46901
llvm-svn: 332494
Summary: Add function to the NVPTX libomptarget library that will return true if the current target region is being executed in SPMD mode.
Reviewers: ABataev, grokos, carlo.bertolli, caomhin
Reviewed By: grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D46840
llvm-svn: 332360
Summary:
Enable the device side debug messages at compile time, use env var to control at runtime.
To achieve this, an environment data block is passed to the device lib when it is loaded.
By default, the message is off, to enable it, a user need to set LIBOMPDEVICE_DEBUG=1.
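A sketch of the environment data block idea; the structure, field, and symbol names
here are illustrative, not the library's own:

  #include <cstdint>

  struct DeviceEnvironmentTy {
    int32_t debug_level;   // 0 = messages off; set from LIBOMPDEVICE_DEBUG
  };

  // Device-side global that the plugin fills in right after loading the image,
  // e.g. by copying a host-side struct whose debug_level was read with
  // getenv("LIBOMPDEVICE_DEBUG").
  __device__ DeviceEnvironmentTy device_environment;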
Reviewers: grokos
Reviewed By: grokos
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D46210
llvm-svn: 331550
Summary: Minor printf format corrections. NVCC ignores these; Clang warns about them if debug is enabled.
Reviewers: grokos
Reviewed By: grokos
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D45528
llvm-svn: 330944
Summary: The LIBOMPTARGET_NVPTX_DEBUG flag is inconsistent between using nvcc to generate the .a file and clang to generate the .bc file. Sync the two settings so we can get debug messages from the .bc file path as well.
Reviewers: grokos
Subscribers: Hahnfeld, openmp-commits, mgorny
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D45530
llvm-svn: 330477
Summary:
This one-line change removes the warning message
"warning: integer conversion resulted in a change of sign".
Reviewers: grokos
Reviewed By: grokos
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D45415
llvm-svn: 329713
Summary: The global stack initialization function may be called multiple times. The initialization of the shared memory slots should only happen when the function is called for the first time for a given warp master thread.
Reviewers: grokos, carlo.bertolli, ABataev, caomhin
Reviewed By: grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D44754
llvm-svn: 328148
Summary: The check for the master warp must take into consideration the actual number of warps: the master warp is the last active warp, not necessarily warp WARPSIZE - 1.
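A small sketch of the corrected check (helper name invented):

  __device__ static bool is_master_warp(unsigned num_threads_in_block) {
    unsigned warp_id = threadIdx.x / warpSize;
    // The last *active* warp, which equals WARPSIZE - 1 only when the block is full.
    unsigned last_active_warp = (num_threads_in_block - 1) / warpSize;
    return warp_id == last_active_warp;
  }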
Reviewers: grokos, carlo.bertolli, ABataev, caomhin
Reviewed By: grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D44537
llvm-svn: 328146
Summary:
This patch allows workers to have a global memory stack managed by the runtime. This patch is needed for completeness and consistency with the globalization policy: if a worker-side variable escapes the current context, it needs to be globalized.
Until now, only the master thread was allowed to have such a stack. These global values can now potentially be shared amongst workers if the semantics of the OpenMP program require it.
Reviewers: ABataev, grokos, carlo.bertolli, caomhin
Reviewed By: grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D44487
llvm-svn: 328144
Summary:
Allow the runtime to use the existing shared memory statically allocated slots.
When a variable is globalized, the underlying memory can be either shared or global memory (both have block-wide visibility). In this case, we allow the storage to use a limited amount of shared memory that has already been statically allocated. Only if shared memory proves insufficient do we then invoke malloc() to create a new global memory slot.
Reviewers: ABataev, carlo.bertolli, grokos, caomhin
Reviewed By: grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D44486
llvm-svn: 327639
Summary: To save on calls to malloc, this patch enables the re-use of pre-allocated global memory slots.
Reviewers: ABataev, grokos, carlo.bertolli, caomhin
Reviewed By: grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D44470
llvm-svn: 327637
Summary:
This patch adds support for the sharing of variables from the master thread of a team to the worker threads of the team.
The runtime uses a stack structure implemented as a doubly-linked list of slots with each slot having the exact same size as the size requested. This implementation leverages existing data structures. The runtime functions are added as separate functions to avoid interfering with the current interface.
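A hedged sketch of the slot layout described above; the field names are illustrative
and the real structure carries additional bookkeeping:

  #include <cstddef>

  struct DataSharingSlot {
    DataSharingSlot *prev;   // slot pushed before this one
    DataSharingSlot *next;   // slot pushed after this one, if any
    size_t size;             // exactly the size that was requested
    char data[];             // storage shared with the workers (flexible array
                             // member, a common compiler extension in C++)
  };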
Limitations to be addressed in future patches:
- The current patch only employs global memory. In a future patch we will enable the usage of shared memory as an optimization.
- Allow the allocation of several requested sizes in the same slot.
Reviewers: ABataev, grokos, caomhin, carlo.bertolli
Reviewed By: grokos
Subscribers: Hahnfeld, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D44260
llvm-svn: 327440
Summary: To make the two parts of the union have the same size, the size of vect needs to be increased by 16 bits.
Reviewers: grokos, carlo.bertolli, caomhin, ABataev
Reviewed By: grokos, ABataev
Subscribers: fedor.sergeev, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D44254
llvm-svn: 327040
Summary:
This patch reverts the changes to libomptarget that were coupled with the changes to Clang code gen for data sharing using shared memory. A similar patch exists for Clang: D43625
Shared memory is meant to be used as an optimization on top of a more general scheme. So far we didn't have a global memory implementation ready so shared memory was a solution which applied to the current level of OpenMP complexity supported by trunk on GPU devices (due to the missing NVPTX backend patch this functionality has never been exercised). Now that we have a global memory solution this patch is "in the way" and needs to be removed (for now). This patch (or an equivalent version of it) will be put out for review once the global memory scheme is in place.
Reviewers: ABataev, grokos, carlo.bertolli, caomhin
Reviewed By: grokos
Subscribers: Hahnfeld, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D43626
llvm-svn: 326950
Summary:
Different NVIDIA GPUs support different compute capabilities. To enable the inlining of runtime functions and the best performance on different generations of NVIDIA GPUs, a bc library for each compute capability needs to be compiled. The same compiler build will then be usable in conjunction with multiple generations of NVIDIA GPUs.
To differentiate between versions of the same bc lib, the output file name will contain the compute capability ID.
Depends on D14254
Reviewers: Hahnfeld, hfinkel, carlo.bertolli, caomhin, ABataev, grokos
Reviewed By: Hahnfeld, grokos
Subscribers: guansong, mgorny, openmp-commits
Differential Revision: https://reviews.llvm.org/D41724
llvm-svn: 324904
Summary: This patch extends the libomptarget functionality in patch D14254 with support for the data sharing scheme for supporting implicitly shared variables. The runtime therefore maintains a list of references to shared variables.
Reviewers: carlo.bertolli, ABataev, Hahnfeld, grokos, caomhin, hfinkel
Reviewed By: Hahnfeld, grokos
Subscribers: guansong, llvm-commits, openmp-commits
Differential Revision: https://reviews.llvm.org/D41485
llvm-svn: 324495
https://reviews.llvm.org/D42757
The method ThreadsInTeam is used to determine the number of threads to be used in a parallel region under SPMD mode (see line 127 of supporti.h in libomptarget/deviceRTLs/nvptx/src/). This patch fixes the corresponding debug print upon initialization of the kernel in SPMD mode.
llvm-svn: 323978
The CUDA Driver API is what we really need to link the CUDA plugin against,
not the CUDA runtime API in CUDA_LIBRARIES! While the latter
comes with the CUDA SDK, the Driver API is installed with
the kernel driver and there is at most one per system. As
fallback we can use the stubs library distributed with the
CUDA SDK for linking.
Differential Revision: https://reviews.llvm.org/D42643
llvm-svn: 323787
Use Driver API equivalents for the last calls to the Runtime API. Remove a
stray assert in the error case, found during review; we should
only return OFFLOAD_FAIL.
Differential Revision: https://reviews.llvm.org/D42686
llvm-svn: 323786
This patch implements the device runtime library whose interface is used in the code generation for OpenMP offloading devices.
Currently there is a single device RTL, written in CUDA, meant for CUDA-enabled GPUs.
The interface is a variation of the kmpc interface that includes some extra calls to do thread and storage management that only make sense for a GPU target.
Differential revision: https://reviews.llvm.org/D14254
llvm-svn: 323649
We now have several options that apply for both libraries and they
shouldn't be documented in multiple files. When already merging
the two Build_With_CMake.txt documents, convert them to
reStructuredText which is used for all of LLVM's documentation.
Differential Revision: https://reviews.llvm.org/D40920
llvm-svn: 321481
This last of four patches adds a new file for the interface
functions that Clang uses during code generation. The only
changes besides simply moving the current code are renaming the
function CheckDeviceAndCtors() and using the correct type for
64-bit device IDs.
Differential Revision: https://reviews.llvm.org/D40801
llvm-svn: 319972
This third patch moves the implementation of the user-facing
OpenMP API functions into its own file. For now, the code is
only moved, no cleanups applied yet.
Differential Revision: https://reviews.llvm.org/D40800
llvm-svn: 319971
This is the second patch to split the current monolithic
implementation into separate files. Note that this change
doesn't cleanup the code yet.
Differential Revision: https://reviews.llvm.org/D40799
llvm-svn: 319970
This is the first of four patches to split the target agnostic
library into multiple (smaller) files. It only moves the code
to separate implementation files and does no cleanup (yet) except
removing unneeded headers.
Differential Revision: https://reviews.llvm.org/D40798
llvm-svn: 319969
Future patches will add (private) header files in src/ that should
not be visible to plugins, so move the "public" ones to a new
include/ directory. This is still internal in a sense that the
contained files won't be installed for the user.
Similarly, the target agnostic offloading library should be built
directly in src/. The parent directory is responsible for finding
dependencies and including all subdirectories.
Differential Revision: https://reviews.llvm.org/D40797
llvm-svn: 319968
Perform a nested CMake invocation to avoid writing our own parser
for compiler versions when we are not testing the in-tree compiler.
Use the extracted information to mark a test as unsupported that
hangs with Clang prior to version 4.0.1 and restrict tests for
libomptarget to Clang version 6.0.0 and later.
Differential Revision: https://reviews.llvm.org/D40083
llvm-svn: 319448
The code for the two OpenMP runtime libraries was very similar.
Move to common CMake file that is included and provides a simple
interface for adding testsuites. Also add a common check-openmp
target that runs all testsuites that have been registered.
Note that this renames all test options to the common OPENMP
namespace, for example OPENMP_TEST_C_COMPILER instead of
LIBOMP_TEST_COMPILER and so on.
Differential Revision: https://reviews.llvm.org/D40082
llvm-svn: 319343
These are needed by both libraries, so we can do that in a
common namespace and unify configuration parameters.
Also make sure that the user isn't requesting libomptarget
if the library cannot be built on the system. Issue an error
in that case.
Differential Revision: https://reviews.llvm.org/D40081
llvm-svn: 319342
As a first step, this allows us to generalize the detection of
standalone builds and make it fully compatible when building in
llvm/runtimes/, which automatically sets OPENMP_STANDALONE_BUILD.
Differential Revision: https://reviews.llvm.org/D40080
llvm-svn: 319341