llvm-project

Commit Graph

Author	SHA1	Message	Date
JonChesterfield	c2ce9ea4e3	[libomptarget][nfc] Change enum values to match those in cuda/rtl Summary: [libomptarget][nfc] Change enum values to match those in cuda/rtl support.h and cuda/rtl.cpp (and downsteam hsa/rtl.cpp) have enums for execution mode. These are actually independent - the numbers that used within support, or within the plugin, are never passed across the boundary. Nevertheless, trying to work out why the values are different between the two has generated a reasonable amount of confusion. This patch changes support to match the values in plugin, on the basis that the plugin also has some comments which I'd have to update if I changed that one instead. Credit to Ron for working through this in our own fork. See rocm-developer-tools/aomp/issues/7 for that earlier diagnostic write up. Also happy with generic = 0, spmd = 1 - provided it's the same in both places. Reviewers: jdoerfert, grokos, ABataev, ronlieb Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D74503	2020-02-12 23:27:08 +00:00
Johannes Doerfert	a5153dbc36	[OpenMP][Offloading] Added support for multiple streams so that multiple kernels can be executed concurrently Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D74145	2020-02-11 22:07:14 -06:00
Johannes Doerfert	3ff4e2eee8	[OpenMP] Switch default C++ standard to C++ 14 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D74258	2020-02-11 17:11:54 -06:00
Jonas Devlieghere	4fe839ef3a	[CMake] Rename EXCLUDE_FROM_ALL and make it an argument to add_lit_testsuite EXCLUDE_FROM_ALL means something else for add_lit_testsuite as it does for something like add_executable. Distinguish between the two by renaming the variable and making it an argument to add_lit_testsuite. Differential revision: https://reviews.llvm.org/D74168	2020-02-06 15:33:18 -08:00
Jon Chesterfield	6a82f0f0b9	[libomptarget] Implement wavefront functions for amdgcn Summary: [libomptarget] Implement wavefront functions for amdgcn Reviewers: jdoerfert, ABataev, grokos, arsenm Reviewed By: arsenm Subscribers: saiislam, wdng, arsenm, jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D73077	2020-02-04 21:55:29 +00:00
Jon Chesterfield	ab9762a9f5	Revert "[nfc][libomptarget] Remove SHARED annotation from local variables" This reverts commit `0e9374e374`. Revert D73239. It fails some local testing, cause presently unknown	2020-01-27 20:05:17 +00:00
Jon Chesterfield	0e9374e374	[nfc][libomptarget] Remove SHARED annotation from local variables Summary: [nfc][libomptarget] Remove SHARED annotation from local variables A few local variables in reduction.cu were marked SHARED. This patch leaves all per-kernel global state localised in omp_data.cu. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D73239	2020-01-23 00:00:23 +00:00
Alexey Bataev	9148b8b734	[OpenMP][Offloading] Fix the issue that omp_get_num_devices returns wrong number of devices, by Shiley Tian. Summary: This patch is to fix issue in the following simple case: #include <omp.h> #include <stdio.h> int main(int argc, char *argv[]) { int num = omp_get_num_devices(); printf("%d\n", num); return 0; } Currently it returns 0 even devices exist. Since this file doesn't contain any target region, the host entry is empty so further actions like initialization will not be proceeded, leading to wrong device number returned by runtime function call. Reviewers: jdoerfert, ABataev, protze.joachim Reviewed By: ABataev Subscribers: protze.joachim Tags: #openmp Differential Revision: https://reviews.llvm.org/D72576	2020-01-21 13:25:18 -05:00
Jon Chesterfield	03c2a59cd6	[libomptarget] Implement smid for amdgcn Summary: [libomptarget] Implement smid for amdgcn Implementation is in a new file as it uses an intrinsic with complicated encoding that warranted substantial comments. Reviewers: jdoerfert, grokos, ABataev, ronlieb Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72956	2020-01-20 14:52:17 +00:00
George Rokos	e244145ab0	[LIBOMPTARGET] Do not increment/decrement the refcount for "declare target" objects The reference counter for global objects marked with declare target is INF. This patch prevents the runtime from incrementing /decrementing INF refcounts. Without it, the map(delete: global_object) directive actually deallocates the global on the device. With this patch, such a directive becomes a no-op. Differential Revision: https://reviews.llvm.org/D72525	2020-01-14 16:30:38 -08:00
Jon Chesterfield	2a43688a0a	[nfc][libomptarget] Refactor nvptx/target_impl.cu Summary: [nfc][libomptarget] Refactor nxptx/target_impl.cu Use __kmpc_impl_atomic_add instead of atomicAdd to match the rest of the file. Alternatively, target_impl.cu could use the cuda functions directly. Using a mixture in this file was an oversight, happy to resolve in either direction. Removed some comments that look outdated. Call __kmpc_impl_unset_lock directly to avoid a redundant diagnostic and remove an implict dependency on interface.h. Reviewers: ABataev, grokos, jdoerfert Reviewed By: jdoerfert Subscribers: jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72719	2020-01-14 19:27:45 +00:00
Jon Chesterfield	2d287bec3c	[nfc][libomptarget] Refactor amdgcn target_impl Summary: [nfc][libomptarget] Refactor amdgcn target_impl Removes references to internal libraries from the header Standardises on C++ mangling for all the target_impl functions Update comment block clang-format Move some functions into a new target_impl.hip source file This lays the groundwork for implementing the remaining unresolved symbols in the target_impl.hip source. Reviewers: jdoerfert, grokos, ABataev, ronlieb Reviewed By: jdoerfert Subscribers: jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72712	2020-01-14 19:27:07 +00:00
Alexey Bataev	b19c0810e5	[LIBOMPTARGET]Ignore empty target descriptors. Summary: If the dynamically loaded module has been compiled with -fopenmp-targets and has no target regions, it has empty target descriptor. It leads to a crash at the runtime if another module has at least one target region and at least one entry in its descriptor. The runtime library is unable to load the empty binary descriptor and terminates the execution. Caused by a clang-offload-wrapper. Reviewers: grokos, jdoerfert Subscribers: caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72472	2020-01-10 09:45:27 -05:00
Kazuaki Ishizaki	4c6a098ad5	[OpenMP] NFC: Fix trivial typos in comments Reviewers: jdoerfert, Jim Reviewed By: Jim Subscribers: Jim, mgorny, guansong, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72285	2020-01-07 14:05:03 +08:00
Jon Chesterfield	bc48af8c57	[libomptarget][nfc] Change unintentional target_impl prefix to kmpc_impl	2019-12-30 20:50:23 +00:00
Jon Chesterfield	63e2aa5658	[libomptarget][nfc] Provide target_impl malloc/free Summary: [libomptarget][nfc] Provide target_impl malloc/free Sufficient to build support.cu for amdgcn Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71685	2019-12-19 16:54:28 +00:00
JonChesterfield	b40822fc14	[libomptarget][nvptx] Fix build, second symbol reordering	2019-12-19 02:02:44 +00:00
Jon Chesterfield	89a2bef27a	[libomptarget][nvptx] Fix build, symbol ordering in target_impl.h	2019-12-19 01:50:06 +00:00
JonChesterfield	9aefe5f65e	[libomptarget][amdgcn] Correct return type of extern __clock64 to unsigned	2019-12-19 00:11:21 +00:00
Jon Chesterfield	2caeaf2f45	[libomptarget][nfc] Introduce atomic wrapper function Summary: [libomptarget][nfc] Introduce atomic wrapper function Wraps atomic functions in a template prefixed __kmpc_atomic that dispatches to cuda or hip atomic functions. Intended to be easily extended to dispatch to OpenCL or C++ atomics for a third target. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: Anastasia, jvesely, mgrang, dexonsmith, llvm-commits, mgorny, jfb, openmp-commits Tags: #openmp, #llvm Differential Revision: https://reviews.llvm.org/D71404	2019-12-18 20:06:17 +00:00
JonChesterfield	8adae6027c	[libomptarget][nfc] Extract function from data_sharing, move to common Summary: [libomptarget][nfc] Extract function from data_sharing, move to common Finding the first active thread in the warp is different on nvptx and amdgcn, mostly due to warp size and the desire for efficiency. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71643	2019-12-18 19:39:35 +00:00
Alexey Bataev	15d47deedd	[LIBOPENMP][NVPTX]Fix the build error in the runtime.	2019-12-17 14:46:04 -05:00
JonChesterfield	0c83f8ccc7	[libomptarget][nfc] Move three files under common, build them for amdgcn Summary: [libomptarget][nfc] Move three files under common, build them for amdgcn Change to reduction.cu to remove two dead includes, otherwise no code change. Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71601	2019-12-17 18:02:49 +00:00
JonChesterfield	3d3e4076cd	[libomptarget][nfc] Move omp locks under target_impl Summary: [libomptarget][nfc] Move omp locks under target_impl These are likely to be target specific, even down to the lock_t which is correspondingly moved out of interface.h. The alternative is to include interface.h in target_impl which substantiatially increases the scope of those symbols. The current nvptx implementation deadlocks on amdgcn. The preferred implementation for that arch is still under discussion - this change leaves declarations in target_impl. The functions could be inline for nvptx. I'd prefer to keep the internals hidden in the target_impl translation unit, but will add the (possibly renamed) macros to target_impl.h if preferred. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71574	2019-12-17 12:18:57 +00:00
Jon Chesterfield	ce12a523b0	[libomptarget][nfc] Move timer functions behind target_impl Summary: [libomptarget][nfc] Move timer functions behind target_impl Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71584	2019-12-17 02:22:29 +00:00
Jon Chesterfield	53bcd1e141	[libomptarget][nfc] Wrap cuda min() in target_impl Summary: [libomptarget][nfc] Wrap cuda min() in target_impl nvptx forwards to cuda min, amdgcn implements directly. Sufficient to build parallel.cu for amdgcn, added to CMakeLists. All call sites are homogenous except one that passes a uint32_t and an int32_t. This could be smoothed over by taking two type parameters and some care over the return type, but overall I think the inline <uint32_t> calling attention to what was an implicit sign conversion is cleaner. Reviewers: ABataev, jdoerfert Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71580	2019-12-17 01:30:04 +00:00
JonChesterfield	69fcc6ecc1	Revert "Revert "[libomptarget] Move resource id functions into target specific code, implement for amdgcn"" Summary: This reverts commit `dd8a7fcdd7`. Alexey reports undefined symbols for the new inline functions defined in target_impl.h This does not reproduce for me for nvptx, or amdgcn, under release or debug builds. I believe the patch is fine, based on: - the semantics of an inline function in C++ (the cuda INLINE functions end up as linkonce_odr in IR), which are only legal to drop if they have no uses - the code generated from a debug build of clang 9 does not show these undef symbols - the tests pass - the code is trivial To progress from here I either need: - A tie break - someone to play the role of CI in determining whether the patch works - Alexey to provide sufficient information about his build for me to reproduce the failure - Alexey to debug why the symbols are disappearing for him and report back Reviewers: ABataev, jdoerfert, grokos Subscribers: jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71502	2019-12-16 16:16:14 +00:00
Alexey Bataev	dd8a7fcdd7	Revert "[libomptarget] Move resource id functions into target specific code, implement for amdgcn" This reverts commit `dbb3fec8ad` since it breaks the NVPTX tests.	2019-12-13 16:36:06 -05:00
Jon Chesterfield	40d72134fd	[libomptarget] Build most of common/src for amdgcn Summary: [libomptarget] Build most of common/src for amdgcn Excluding parallel.cu, which uses an integer min() from cuda, Excluding support.cu, which calls malloc that is not yet available for amdgcn Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: gregrodgers, ronlieb, jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71446	2019-12-13 17:48:19 +00:00
Jon Chesterfield	56adcebfda	[libomptarget][nfc] Add nop syncwarp function for amdgcn	2019-12-13 14:27:52 +00:00
Jon Chesterfield	479868646a	[libomptarget][nfc] Add declarations of atomic functions for amdgcn Summary: [libomptarget][nfc] Add declarations of atomic functions for amdgcn This enables building more source for amdgcn. The functions are usually available in a hip runtime header, but are duplicated here to decouple the implementation Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71412	2019-12-12 22:56:14 +00:00
Jon Chesterfield	dbb3fec8ad	[libomptarget] Move resource id functions into target specific code, implement for amdgcn Summary: [libomptarget] Move resource id functions into target specific code, implement for amdgcn Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71382	2019-12-12 22:49:02 +00:00
Jon Chesterfield	b399252028	[libomptarget][nfc] Add missing header for amdgcn/target_impl	2019-12-12 09:36:57 +00:00
JonChesterfield	0dd62c5c2e	[libomptarget][nfc] Move cuda threadfence functions behind kmpc_impl Summary: [libomptarget][nfc] Move cuda threadfence functions behind kmpc_impl Part of building code under common/ without requiring a cuda compiler Reviewers: ABataev, jdoerfert, grokos Reviewed By: ABataev Subscribers: jvesely, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71102	2019-12-06 15:41:18 +00:00
Jon Chesterfield	cd90f49d70	[libomptarget][nfc] Move three more files to common Summary: [libomptarget][nfc] Move three more files to common Reviewers: ABataev, jdoerfert, grokos Reviewed By: ABataev Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71103	2019-12-06 15:29:50 +00:00
Jon Chesterfield	4af84d2686	[libomptarget][nfc] Introduce SHARED, ALIGN macros Summary: [libomptarget][nfc] Introduce SHARED, ALIGN macros Move remaining cuda attributes behind such macros Reviewers: ABataev, jdoerfert, grokos Reviewed By: ABataev Subscribers: openmp-commits, jvesely Tags: #openmp Differential Revision: https://reviews.llvm.org/D71076	2019-12-05 21:57:58 +00:00
Jon Chesterfield	d0b9ed5c49	[libomptarget][nfc] Move omptarget-nvptx under common Summary: [libomptarget][nfc] Move omptarget-nvptx under common Almost all files depend on require omptarget-nvptx, which no longer contains any obviously architecture dependent code. Moving it under common unblocks task/loop for amdgcn, and allows moving other code. At some point there should probably be a widespread symbol renaming to replace the nvptx string. I'd prefer to get things working first. Building this (and task.cu, loop.cu) without a cuda library requires some more refactoring, e.g. wrap threadfence(), use DEVICE macro more consistently. Patches for that are orthogonal and will be posted shortly. Reviewers: jdoerfert, ABataev, grokos Reviewed By: ABataev Subscribers: mgorny, fedor.sergeev, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71073	2019-12-05 20:34:15 +00:00
JonChesterfield	3ada8d2a87	[libomptarget] Build a minimal deviceRTL for amdgcn Summary: [libomptarget] Build a minimal deviceRTL for amdgcn Repeat of D70414, with an include path fixed. Diff for sanity checking. The CMakeLists.txt file is functionally identical to the one used in the aomp fork. Whitespace changes were made based on nvptx/CMakeLists.txt, plus the copyright notice updated to match (Greg was the original author so would like his sign off on that here). This change will build a small subset of the deviceRTL if an appropriate toolchain is available, e.g. a local install of rocm. Support.h is moved from nvptx as a dependency of debug.h. Reviewers: ABataev, jdoerfert Reviewed By: ABataev Subscribers: jvesely, mgorny, jfb, openmp-commits, jdoerfert Tags: #openmp Differential Revision: https://reviews.llvm.org/D70971	2019-12-04 16:43:37 +00:00
Alexey Bataev	02b9c5d963	Revert "[libomptarget] Build a minimal deviceRTL for amdgcn" This reverts commit `877ffa716f` because it breaks the build.	2019-12-03 12:35:08 -05:00
Jon Chesterfield	877ffa716f	[libomptarget] Build a minimal deviceRTL for amdgcn Summary: [libomptarget] Build a minimal deviceRTL for amdgcn The CMakeLists.txt file is functionally identical to the one used in the aomp fork. Whitespace changes were made based on nvptx/CMakeLists.txt, plus the copyright notice updated to match (Greg was the original author so would like his sign off on that here). This change will build a small subset of the deviceRTL if an appropriate toolchain is available, e.g. a local install of rocm. Support.h is moved from nvptx as a dependency of debug.h. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Reviewed By: jdoerfert Subscribers: jfb, Hahnfeld, jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D70414	2019-12-03 15:18:41 +00:00
Bryan Chan	4d3198e243	[OpenMP] build offload plugins before testing them Summary: "make check-all" or "make check-libomptarget" would attempt to run offloading tests before the offload plugins are built. This patch corrects that by adding dependencies to the libomptarget CMake rules. Reviewers: jdoerfert Subscribers: mgorny, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D70803	2019-11-28 17:43:56 -05:00
JonChesterfield	a84b48d01e	[nfc][libomptarget] Remove casts of string literals to char*	2019-11-19 19:41:59 +00:00
JonChesterfield	4681e2e434	[nfc][libomptarget] Write amdgcn macros in terms of compiler intrinsics	2019-11-19 17:23:46 +00:00
Jon Chesterfield	5a4a05d776	[libomptarget][nfc] Move some source into common from nvptx Summary: [libomptarget][nfc] Move some source into common from nvptx Moves some source that compiles cleanly under amdgcn into a common subdirectory Includes some non-trivial files and some headers. Keeps the cuda file extension. The build systems for different architectures seem unlikely to have much in common. The idea is therefore to set include paths such that files under common/src compile as if they were under arch/src as the mechanism for sharing. In particular, files under common/src need to be able to include target_impl.h. The corresponding -Icommon is left out in favour of explicit includes on the basis that the it makes it clearer which files under common are used by a given architecture. Reviewers: jdoerfert, ABataev, grokos Reviewed By: ABataev Subscribers: jfb, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D70328	2019-11-18 18:17:36 +00:00
JonChesterfield	32dfbd131d	[libomptarget][nfc] Use cuda variable wrappers from support.h Summary: [libomptarget][nfc] Use cuda variable wrappers from support.h Reimplementation of D69693, after the revert of D69885 Use the wrappers in support.h for cuda builtin variables at all call sites. Localises use of cuda and removes WARPSIZE==32 assumption in debug.h. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D70186	2019-11-14 12:45:09 +00:00
JonChesterfield	fd9fa9995c	[libomptarget] Move supporti.h to support.cu Summary: [libomptarget] Move supporti.h to support.cu Reimplementation of D69652, without the unity build and refactors. Will need a clean build of libomptarget as the cmakelists changed. Reviewers: ABataev, jdoerfert Reviewed By: jdoerfert Subscribers: mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D70131	2019-11-13 11:36:46 +00:00
Jon Chesterfield	7cea0cea77	[libomptarget] Revert all improvements to support Summary: [libomptarget] Revert all improvements to support The change to unity build for nvcc has broken the build for some developers. This patch reverts to a known-working state. There has been some confusion over exactly how the build broke. I think we have reached a common understanding that the disappearing symbols are from the bitcode library built by clang. The static archive built by nvcc may show the same problem. Some of the confusion arose from building the deviceRTL twice and using one or the other library based on various environmental factors. I'm pretty sure the problem is clang expanding `__forceinline__` into both `__inline__` and `attribute(("always_inline"))`. The `__inline__` attribute resolves to linkonce_odr which is not safe for exporting symbols from translation units. "always_inline" is the desired semantic for small functions defined in one translation unit that are intended to be inlined at link time. "inline" is not. This therefore reintroduces the dependency hazard of supporti.h and some code duplication, and blocks progress separating deviceRTL into reusable components. See also D69857, D69859 for attempts at a fix instead of a revert. Reviewers: ABataev, jdoerfert, grokos, ikitayama, tianshilei1992 Reviewed By: ABataev Subscribers: mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69885	2019-11-06 15:44:10 +00:00
Ron Lieberman	dc34b1c94d	Test commit: adds a . to comment. NFC	2019-11-04 16:51:03 -06:00
JonChesterfield	94c59ea8dd	[libomptarget] Implement target_impl for amdgcn Summary: [libomptarget] Implement target_impl for amdgcn Smallest atomic addition for a new target. Implements enough of the amdgcn specific code that some of the source files under nvptx/src could be compiled, without modification, to run on amdgcn. This foreshadows a work in progress patch to move said source out of nvptx/src. Patch based on fork at https://github.com/ROCm-Developer-Tools/llvm-project Reviewers: ABataev, jdoerfert, grokos, ronlieb Subscribers: jvesely, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69718	2019-11-01 15:46:35 +00:00
Alexey Bataev	e57f8ad914	[LIBOMPTARGET]Call GetLaneId function, do not use its address in debug log functions.	2019-11-01 09:43:47 -04:00
JonChesterfield	9b06ac98d0	[nfc][omptarget] Use builtin var abstraction. Second pass at D69476 Summary: [nfc][omptarget] Use builtin var abstraction. Second pass at D69476 Use the wrappers in support.h for cuda builtin variables at all call sites. Localises use of cuda and removes WARPSIZE==32 assumption in debug.h. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69693	2019-11-01 02:21:44 +00:00
JonChesterfield	764c8420e4	[nfc][libomptarget] Reorganise support header Summary: [nfc][libomptarget] Reorganise support header All functions defined in support implementation are now declared in support.h Reordered functions in support implementation to match the sequence in support.h Added include guards to support.h Added #include interface to support.h to provide kmp_Ident declaration Move supporti.h to support.cu and s/INLINE/EXTERN/g Add remaining includes to support.cu A minor side effect is to change the name mangling of the support functions to extern "C". If this matters another macro along the lines of INLINE/EXTERN can be added - perhaps DEVICE as that's the obvious implementation. Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69652	2019-10-31 17:15:02 +00:00
Jon Chesterfield	e9f9dfab82	[libomptarget] Change nvcc compilation to use a unity build Summary: [libomptarget] Change nvcc compilation to use a unity build This allows nvcc to inline functions between what would otherwise be distinct translation units, which in turn removes any runtime cost from implementing functions in source files (as opposed to inline in headers). This will then allow the circular dependencies in deviceRTL to be readily broken and individual components more easily shared between architectures. Reviewers: ABataev, jdoerfert, grokos, RaviNarayanaswamy, hfinkel, ronlieb, gregrodgers Reviewed By: jdoerfert Subscribers: mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69489	2019-10-31 01:58:51 +00:00
Jon Chesterfield	8548e2f543	[nfc][libomptarget] Move named_sync() into target_impl Summary: [nfc][libomptarget] Move named_sync() into target_impl Reviewers: ABataev, jdoerfert, grokos Reviewed By: ABataev Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69487	2019-10-30 16:25:05 +00:00
Jon Chesterfield	74bb5ee674	[nfc][libomptarget] Move smid() into target_impl Summary: [nfc][libomptarget] Move smid() into target_impl Reviewers: ABataev, jdoerfert, grokos Reviewed By: ABataev Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69485	2019-10-30 13:39:15 +00:00
Jon Chesterfield	62a161cc00	[libomptarget] Always call malloc, free via SafeMalloc, SafeFree wrapper Summary: [libomptarget] Always call malloc, free via SafeMalloc, SafeFree wrapper NFC for release, adds some verbosity to debug printing. Motivation is to provide one place where local modifications can be made to the behaviour of all heap allocation or deallocation while debugging. Reviewers: jdoerfert, ABataev, grokos Reviewed By: ABataev Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69492	2019-10-30 13:35:34 +00:00
Alexey Bataev	d7941a6ab9	[LIBOMPTARGET]Fix build, NFC. Need to include nvptx_interface.h in target_impl.h, otherwise the build is failed because of missing __kmpc_impl_lanemask_t type.	2019-10-28 10:43:00 -04:00
Jon Chesterfield	174967f153	[nfc][libomptarget] Decrease coupling between files Summary: [nfc][libomptarget] Decrease coupling between files debug.h used the symbol omptarget_device_environment so implicitly required an include of omptarget-nvptx.h to compile. Similarly interface.h uses size_t. Moving this declaration to a new header means cancel, critical can now build without omptarget-nvptx.h. After this change, debug.h, cancel.cu, critical.cu could move under a common source directory. Reviewers: ABataev, jdoerfert, grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69473	2019-10-27 14:27:54 +00:00
Jon Chesterfield	ad4c42666d	[nfc][libomptarget] Inline option into target_impl Summary: [nfc][libomptarget] Inline option into target_impl Subset of D69423. The macros that were in option.h are all target dependent. Inlining the header simplifies the dependency graph when looking to move code into a common subdir. Reviewers: ABataev, jdoerfert, grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69472	2019-10-27 14:26:55 +00:00
Jon Chesterfield	f7c3c640af	[NFC][libomptarget]Remove TRUE,FALSE macros from option.h Summary: [NFC][libomptarget]Remove TRUE,FALSE macros from option.h Subset of D69423. Patch series ends with removing option.h. Reviewers: ABataev, jdoerfert, grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69463	2019-10-27 01:31:12 +01:00
Jon Chesterfield	197b7b24c3	[NFC][libomptarget] move remaining device specific code out of omptarget-nvptx.h Summary: [NFC][libomptarget] move remaining device specific code out of omptarget-nvptx.h Strictly there is one remaining difference wrt amdgcn - parallelLevel is volatile qualified on amdgcn and not on nvptx. Determining whether this is correct - and how to represent the different semantics of 'volatile' under various conditions - is beyond the scope of this code motion patch. Reviewers: ABataev, jdoerfert, grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69424	2019-10-25 18:58:31 +01:00
Jon Chesterfield	d69d1aa131	[libomptarget][nfc] Make interface.h target independent Summary: [libomptarget][nfc] Make interface.h target independent Move interface.h under a top level include directory. Remove #includes to avoid the interface depending on the implementation. Reviewers: ABataev, jdoerfert, grokos, ronlieb, RaviNarayanaswamy Reviewed By: jdoerfert Subscribers: mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D68615 llvm-svn: 374919	2019-10-15 17:15:26 +00:00
Jon Chesterfield	58fd6b5b9c	[libomptarget][nfc] Update remaining uint32 to use lanemask_t Summary: [libomptarget][nfc] Update remaining uint32 to use lanemask_t Update a few functions in the API to use lanemask_t instead of i32. NFC for nvptx. Also update the ActiveThreads type in DataSharingStateTy. This removes a lot of #ifdef from the downsteam amdgcn implementation. Reviewers: ABataev, jdoerfert, grokos, ronlieb, RaviNarayanaswamy Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D68513 llvm-svn: 373806	2019-10-04 22:30:28 +00:00
Jon Chesterfield	4f75a73796	Use named constant to indicate all lanes, to handle 32 and 64 wide architectures Summary: Use named constant to indicate all lanes, to handle 32 and 64 wide architectures Reviewers: ABataev, jdoerfert, grokos, ronlieb Reviewed By: grokos Subscribers: ronlieb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D68369 llvm-svn: 373793	2019-10-04 21:39:22 +00:00
Sergey Dmitriev	4b343fd84c	[Clang][OpenMP Offload] Create start/end symbols for the offloading entry table with a help of a linker Linker automatically provides __start_<section name> and __stop_<section name> symbols to satisfy unresolved references if <section name> is representable as a C identifier (see https://sourceware.org/binutils/docs/ld/Input-Section-Example.html for details). These symbols indicate the start address and end address of the output section respectively. Therefore, renaming OpenMP offload entries section name from ".omp.offloading_entries" to "omp_offloading_entries" to use this feature. This is the first part of the patch for eliminating OpenMP linker script (please see https://reviews.llvm.org/D64943). Differential Revision: https://reviews.llvm.org/D68070 llvm-svn: 373118	2019-09-27 20:00:51 +00:00
Alexey Bataev	4812941776	[OPENMP][NVPTX]Fix parallel level counter in non-SPMD mode. Summary: In non-SPMD mode we may end up with the divergent threads when trying to increment/decrement parallel level counter. It may lead to incorrect calculations of the parallel level and wrong results when threads are divergent. We need to reconverge the threads before trying to modify the parallel level counter. Reviewers: grokos, jdoerfert Subscribers: guansong, openmp-commits, caomhin, kkwli0 Tags: #openmp Differential Revision: https://reviews.llvm.org/D66802 llvm-svn: 370803	2019-09-03 18:11:50 +00:00
Jon Chesterfield	bbdd282371	[libomptarget] Refactor activemask macro to inline function Summary: [libomptarget] Refactor activemask macro to inline function See also abandoned D66846, split into this diff and others. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Reviewed By: jdoerfert, ABataev Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66851 llvm-svn: 370781	2019-09-03 16:31:30 +00:00
Jon Chesterfield	3294421926	Use target_impl functions to replace more inline asm Summary: Use target_impl functions to replace more inline asm Follow on from D65836. Removes remaining asm shuffles and lanemask accessors Also changes the types of target_impl bitwise functions to unsigned. Reviewers: jdoerfert, ABataev, grokos, Hahnfeld, gregrodgers, ronlieb, hfinkel Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66809 llvm-svn: 370216	2019-08-28 15:04:06 +00:00
Jon Chesterfield	80f9a38a76	[libomptarget] Refactor syncthreads macro to inline function Summary: [libomptarget] Refactor syncthreads macro to inline function See also abandoned D66846, split into this diff and others. Rev 2 of D66855 Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66861 llvm-svn: 370210	2019-08-28 14:22:35 +00:00
Jon Chesterfield	be3d487313	[libomptarget] Refactor syncwarp macro to inline function Summary: [libomptarget] Refactor syncwarp macro to inline function See also abandoned D66846, split into this diff and others. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66857 llvm-svn: 370149	2019-08-28 02:02:53 +00:00
Jon Chesterfield	e73e3013a6	Fix build break due to close brace lost in merge llvm-svn: 370148	2019-08-28 01:56:26 +00:00
Jon Chesterfield	327aa81123	[libomptarget] Refactor shfl_down_sync macro to inline function Summary: [libomptarget] Refactor shfl_down_sync macro to inline function See also abandoned D66846, split into this diff and others. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66853 llvm-svn: 370146	2019-08-28 01:47:41 +00:00
Jon Chesterfield	b9b712df82	[libomptarget] Refactor shfl_sync macro to inline function Summary: [libomptarget] Refactor shfl_sync macro to inline function See also abandoned D66846, split into this diff and others. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66852 llvm-svn: 370144	2019-08-28 01:31:04 +00:00
Alexey Bataev	da8b5cc9f1	[OPENMP][NVPTX]Add __kmpc_syncwarp(int32_t) function. Summary: Added function void __kmpc_syncwarp(int32_t) to expose it to the compiler. It is required to fix the problem with the critical regions in Cuda9.0+. We cannot use barrier in the critical region, but still need to reconverge the threads in the warp after. This function allows to do this. Reviewers: grokos, jdoerfert Subscribers: guansong, openmp-commits, kkwli0, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D66672 llvm-svn: 369933	2019-08-26 17:32:45 +00:00
Alexey Bataev	0366168f3a	[OPENMP][NVPTX]Use __syncwarp() to reconverge the threads. Summary: In Cuda 9.0 it is not guaranteed that threads in the warps are convergent. We need to use __syncwarp() function to reconverge the threads and to guarantee the memory ordering among threads in the warps. This is the first patch to fix the problem with the test libomptarget/deviceRTLs/nvptx/src/sync.cu on Cuda9+. This patch just replaces calls to __shfl_sync() function with the call of __syncwarp() function where we need to reconverge the threads when we try to modify the value of the parallel level counter. Reviewers: grokos Subscribers: guansong, jfb, jdoerfert, caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D65013 llvm-svn: 369796	2019-08-23 18:34:48 +00:00
Jon Chesterfield	ed3324f6b6	Factor architecture dependent code out of loop.cu Summary: [libomptarget] Factor architecture dependent code out of loop.cu Related to the patch series starting D64217. Added subscribers to said series as reviewers. This effort is smaller in scope. This patch factors out just enough architecture dependent code from loop.cu to allow the same source to be used with amdgcn, given a different target_impl.h. Testing is that the same bitcode (modulo variable names) is generated for libomptarget before and after the refactor, for nvptx and the out of tree amdgcn. Reviewers: jdoerfert, ABataev, bollu, jfb, tra, grokos, Hahnfeld, guansong, xtian, gregrodgers, ronlieb, hfinkel, gtbercea, guraypp, arpith-jacob Reviewed By: jdoerfert, ABataev Subscribers: dexonsmith, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D65836 llvm-svn: 368751	2019-08-13 21:41:47 +00:00
Gheorghe-Teodor Bercea	6c7b882e52	[OpenMP][libomptarget] Add support for close map modifier Summary: This patch adds support for the close map modifier. The close map modifier will overwrite the unified shared memory requirement and create a device copy of the data. Reviewers: ABataev, Hahnfeld, caomhin, grokos, jdoerfert, AlexEichenberger Reviewed By: Hahnfeld, AlexEichenberger Subscribers: guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D65340 llvm-svn: 368488	2019-08-09 21:32:57 +00:00
Jonas Hahnfeld	7a0f2dc5a4	[libomptarget] Remove duplicate RTLRequiresFlags per device We have one global RTLs.RequiresFlags, I don't see a need to make a copy per device that the runtime manages. This was problematic anyway because the copy happened during the first __tgt_register_lib(). This made it impossible to call __tgt_register_requires() from normal user funtions for testing. Hence, this change also fixes unified_shared_memory/shared_update.c for older versions of Clang that don't call __tgt_register_requires() before __tgt_register_lib(). Differential Revision: https://reviews.llvm.org/D66019 llvm-svn: 368465	2019-08-09 19:20:39 +00:00
Gheorghe-Teodor Bercea	a1d20506e7	[OpenMP][libomptarget] Add support for unified memory for regular maps Summary: This patch adds support for using unified memory in the case of regular maps that happen when a target region is offloaded to the device. For cases where only a single version of the data is required then the host address can be used. When variables need to be privatized in any way or globalized, then the copy to the device is still required for correctness. Reviewers: ABataev, jdoerfert, Hahnfeld, AlexEichenberger, caomhin, grokos Reviewed By: Hahnfeld Subscribers: mgorny, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D65001 llvm-svn: 368192	2019-08-07 17:29:45 +00:00
Jon Chesterfield	ae0178bee7	Use forceinline. Necessary for nvcc to inline small functions within the bitcode library Summary: [libomptarget] Use forceinline. Necessary for nvcc to inline small functions within the bitcode library Suggested in D65836 Reviewers: ABataev, jdoerfert, grokos, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D65876 llvm-svn: 368177	2019-08-07 15:24:12 +00:00
Alexey Bataev	c10180ed8e	[OPENMP][OFFLOADING]Fix the test, NFC. llvm-svn: 368068	2019-08-06 18:13:39 +00:00
Michael Kruse	78769ec403	[libomptarget] Harmonize emitting CUDA errors and general debug messages. Ensures that CUDA fail reasons (such as "No CUDA-capable device detected") are printed together with libomptarget's debug message (e.g. "Error when setting CUDA context"). Previously, the former was printed only in CMAKE_BUILD_TYPE=Debug builds while the latter was enabled by LIBOMPTARGET_ENABLE_DEBUG. With this change, also only call cuGetErrorString when the error will be printed. Suggested-by: Ye Luo <xw111luoye@gmail.com> Differential Revision: https://reviews.llvm.org/D65687 llvm-svn: 367910	2019-08-05 19:12:10 +00:00
Michael Kruse	2c7a8eaf3d	[OpenMP 5.0] libomptarget interface for declare mapper functions. This patch implements the libomptarget runtime interface for OpenMP 5.0 declare mapper functions. The declare mapper functions generated by Clang will call them to complete the mapping of members. kmpc_mapper_num_components gets the current number of components for a user-defined mapper; kmpc_push_mapper_component pushes back one component for a user-defined mapper. The design slides can be found at https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx Patch by Lingda Li <lildmh@gmail.com> Differential Revision: https://reviews.llvm.org/D60972 llvm-svn: 367772	2019-08-04 04:18:28 +00:00
Alexey Bataev	ca424d100c	[OPENMP][NVPTX]Perform memory flush if number of threads to sync is 1 or less. Summary: According to the OpenMP standard, barrier operation must perform implicit flush operation. Currently, if there is only one thread in the team, barrier does not flush the memory. Patch fixes this problem. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D62398 llvm-svn: 367024	2019-07-25 15:02:28 +00:00
Jonas Hahnfeld	6e40ae8f3d	[libomptarget] Handle offload policy in push_tripcount If the first target region in a program calls the push_tripcount function, libomptarget didn't handle the offload policy correctly. This could lead to unexpected error messages as seen in http://lists.llvm.org/pipermail/openmp-dev/2019-June/002561.html To solve this, add a check calling IsOffloadDisabled() as all other entry points already do. If this method returns false, libomptarget is effectively disabled. Differential Revision: https://reviews.llvm.org/D64626 llvm-svn: 366810	2019-07-23 14:20:48 +00:00
Alexey Bataev	da43861b4a	[OpenMP][libomptarget] Suppress C++ 11 related warnings when building libomptarget-nvptx bitcode library, by Doru Bercea. Summary: Pass -std=c++11 flag to compiler to suppress C++ 11 related warnings when building NVPTX bitcode library. Reviewers: ABataev, caomhin, Hahnfeld Reviewed By: ABataev, Hahnfeld Subscribers: jdoerfert, Hahnfeld, jholewinski, mgorny, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D55772 llvm-svn: 366438	2019-07-18 13:54:01 +00:00
Ron Lieberman	59532488b1	[OPENMP] Resolve lost LoopTripCnt for subsequent loops in same thread. Remove loopTripCnt from threaded device stack after consuming it. Added a libomptarget DP message to aid in future debugging and to validate the added testcase, which only runs in Debug build. Differential Revision: https://reviews.llvm.org/D64808 llvm-svn: 366349	2019-07-17 17:07:52 +00:00
Alexey Bataev	85b9651edd	[OPENMP][NVPTX]Fixed checks for cuda versions. Summary: We used CUDART_VERSION macro to check for the installed cuda version but this macro is defined in cuda_runtime_api.h, which is not used by project. Better to use CUDA_VERSION macro, which is defined in cuda.h. Also, added the check if this macro is defined. If macro is undefined, there is something wrong with the cuda configuration and we should not continue the compilation. This also fixes problems with runtime building in cuda 10+. Reviewers: grokos Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D64648 llvm-svn: 366224	2019-07-16 16:07:10 +00:00
Alexey Bataev	42816107f7	[OPENMP]Fix threadid in __kmpc_omp_taskwait call for dependent target calls. Summary: We used to call __kmpc_omp_taskwait function with global threadid set to 0. It may crash the application at the runtime if the thread executing target region is not a master thread. Reviewers: grokos, kkwli0 Subscribers: guansong, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D64571 llvm-svn: 366220	2019-07-16 15:51:32 +00:00
Jonas Hahnfeld	aca476b296	[libomptarget] Fix typos and grammar in error messages, NFC. llvm-svn: 365890	2019-07-12 10:21:55 +00:00
Jonas Hahnfeld	2dfc5179f6	[libomptarget-nvptx] Remove dead functions These entry points are never called by Clang trunk nor clang-ykt. If XL doesn't use them either, they can finally go away. Differential Revision: https://reviews.llvm.org/D52700 llvm-svn: 365817	2019-07-11 20:12:51 +00:00
Alexey Bataev	4ad9286a57	[OPENMP]Rename loopTripCnt member data to LoopTripCnt, NFC. Rename variable to follow LLVM coding standard. llvm-svn: 365368	2019-07-08 18:45:48 +00:00
Alexey Bataev	060921dee7	[OPENMP]Make __kmpc_push_tripcount thread safe. Summary: __kmpc_push_tripcount function is not thread safe and may lead to data race when the target regions are executed in parallel threads. The patch makes loopTripCnt counter thread aware and stores the tripcount value per thread in the map. Access to map is guarded by mutex to prevent data race in the map itself. Test is for NVPTX target because it does not work correctly on the host. Seems to me, there is a problem in libomp with target regions in the parallel threads. Reviewers: grokos Subscribers: guansong, jfb, jdoerfert, openmp-commits, kkwli0, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D64080 llvm-svn: 365332	2019-07-08 15:30:23 +00:00
Alexey Bataev	bb55ece269	[OPENMP][NVPTX]Relax flush directive. Summary: According to the OpenMP standard, flush makes a thread’s temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied. According to the Cuda toolkit documentation (https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html#memory-fence-functions), __threadfence() functions provides required functionality. __threadfence_system() also provides required functionality, but it also includes some extra functionality, like synchronization of page-locked host memory, synchronization for the host, etc. It is not required per the standard and we can use more relaxed version of memory fence operation. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D62397 llvm-svn: 364572	2019-06-27 18:33:09 +00:00
Gheorghe-Teodor Bercea	aace6d285d	[OpenMP][libomptarget] Add support for declare target to clause under unified memory Summary: This patch adds support for handling variables under the: ``` #pragma omp declare target to() ``` clause when the ``` #pragma omp requires unified_shared_memory ``` is used. The address of the host variable is copied into the device pointer just like for the declare target link case. Reviewers: ABataev, caomhin, grokos, AlexEichenberger Reviewed By: grokos Subscribers: jcownie, guansong, jdoerfert, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D63106 llvm-svn: 363825	2019-06-19 15:48:10 +00:00
Alexey Bataev	8a2bd361eb	[OPENMP][CUDA]Use __syncthreads when compiled by nvcc and clang >= 9.0. Summary: The problems with __syncthreads() were fixed in clang >= 9.0 and the original __syncthreads() can be used instead of the ptx instruction. Reviewers: grokos Subscribers: guansong, jdoerfert, openmp-commits, kkwli0, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D63515 llvm-svn: 363807	2019-06-19 14:20:34 +00:00
Gheorghe-Teodor Bercea	c5fe030c16	[OpenMP][libomptarget] Enable usage of unified memory for declare target link variables Summary: This patch enables the usage of a host variable on the device for declare target link variables when unified memory is available. Reviewers: ABataev, caomhin, grokos Reviewed By: grokos Subscribers: Hahnfeld, guansong, jdoerfert, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D60884 llvm-svn: 362505	2019-06-04 15:05:53 +00:00
Alexey Bataev	e1947b84c1	Revert "[OPENMP][NVPTX]Fix barriers and parallel level counters, NFC." This reverts commit r361421 to split the patch into 3 parts. llvm-svn: 361638	2019-05-24 14:06:47 +00:00
Alexey Bataev	9d9e406684	[OPENMP][NVPTX]Fix barriers and parallel level counters, NFC. Summary: Parallel level counter should be volatile to prevent some dangerous optimiations by the ptxas. Otherwise, ptxas optimizations lead to undefined behaviour in some cases. Also, use __threadfence() for #pragma omp flush and if the barrier should not be used (we have only one thread in the team), still perform flush operation since the standard requires implicit flush when executing barriers. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D62199 llvm-svn: 361421	2019-05-22 19:50:32 +00:00
Gheorghe-Teodor Bercea	9e9c918259	[OpenMP][libomptarget] Enable requires flags for target libraries. Summary: Target link variables are currently implemented by creating a copy of the variables on the device side and unified memory never gets exploited. When the prgram uses the: ``` #pragma omp requires unified_shared_memory ``` directive in conjunction with a declare target link, the linked variable is no longer allocated on the device and the host version is used instead. This behavior is overridden by performing an explicit mapping. A Clang side patch is required. Reviewers: ABataev, AlexEichenberger, grokos, Hahnfeld Reviewed By: AlexEichenberger, grokos, Hahnfeld Subscribers: Hahnfeld, jfb, guansong, jdoerfert, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D60223 llvm-svn: 361294	2019-05-21 19:35:02 +00:00
Alexey Bataev	f9e00db818	[OPENMP][NVPTX]Simplify handling of thread limit, NFC. Summary: Patch improves performance of the full runtime mode by moving threads limit counter to the shared memory. It also allows to save global memory. Reviewers: grokos, kkwli0, gtbercea Subscribers: guansong, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D61801 llvm-svn: 360584	2019-05-13 14:21:46 +00:00
Alexey Bataev	f62c266de7	[OPENMP][NVPTX]Improve number of threads counter, NFC. Summary: Patch improves performance of the full runtime mode by moving number-of-threads counter to the shared memory. It also allows to save global memory. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D61785 llvm-svn: 360457	2019-05-10 18:56:05 +00:00
Alexey Bataev	a857e31011	[OPENMP][NVPTX]Improve thread limit counter, NFC. Summary: Patch improves performance of the full runtime mode by moving thread-limit counter to the shared memory. It also allows to save global memory. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D61526 llvm-svn: 359922	2019-05-03 20:00:38 +00:00
Alexey Bataev	e031e17919	[OPENMP][NVPTX]Improved several standard OpenMP functions, NFC. Summary: Used parallelLevel[] counter to simplify and improve implementation of the existing standard OpenMP functions. Functions are tested already in several tests, the patch is NFC. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D61459 llvm-svn: 359892	2019-05-03 14:47:20 +00:00
Alexey Bataev	8ccb8f8647	[OPENMP][NVPTX]Improve code by using parallel level counter. Summary: Previously for the different purposes we need to get the active/common parallel level and with full runtime we iterated over all the records to calculate this level. Instead, we can used the warp-based parallel level counters used in no-runtime mode. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D61395 llvm-svn: 359822	2019-05-02 20:05:01 +00:00
Alexey Bataev	4ad6dbc5fd	[OPENMP][NVPTX]Improve omp_get_max_threads() function. Summary: Function omp_get_max_threads() can always return 1 if current execution mode is SPMD. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D61379 llvm-svn: 359792	2019-05-02 14:52:52 +00:00
Alexey Bataev	8e6bf88cf7	[OPENMP][NVPTX]Improved omp_get_thread_limit() function. Summary: Function omp_get_thread_limit() in SPMD mode can return the maximum available number of threads as a result. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D61378 llvm-svn: 359790	2019-05-02 14:46:32 +00:00
Alexey Bataev	c03fe73176	[OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode. Summary: The parallelLevel counter must be on per-thread basis to fully support L2+ parallelism, otherwise we may end up with undefined behavior. Introduce the parallelLevel on per-warp basis using shared memory. It allows to avoid the problems with the synchronization and allows fully support L2+ parallelism in SPMD mode with no runtime. Reviewers: gtbercea, grokos Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D60918 llvm-svn: 359341	2019-04-26 19:30:34 +00:00
Alexey Bataev	5de5d74c8d	[OPENMP][NVPTX] Fix the test, NFC. Fix the test to run it really in SPMD mode without runtime. Previously it was run in SPMD + full runtime mode and does not allow to cehck the functionality correctly. llvm-svn: 358902	2019-04-22 17:25:31 +00:00
Alexey Bataev	13532ea623	[OPENMP][NVPTX]Fix dynamic scheduling in L2+ SPMD parallel regions. Summary: If the kernel is executed in SPMD mode and the L2+ parallel for region with the dynamic scheduling is executed, dynamic scheduling functions are called. They expect full runtime support, but SPMD kernels may be executed without the full runtime. It leads to the runtime crash of the compiled program. Patch fixes this problem + fixes handling of the parallelism level in SPMD mode, which is required as part of this patch. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D60578 llvm-svn: 358442	2019-04-15 20:15:20 +00:00
Michael Kruse	d97d5ebcfa	[libomptarget] Introduce LIBOMPTARGET_ENABLE_DEBUG cmake option. At the moment, support for runtime debug output using the OMPTARGET_DEBUG=1 environment variable is only available with CMAKE_BUILD_TYPE=Debug builds. The patch allows setting it independently using the LIBOMPTARGET_ENABLE_DEBUG option, which is enabled by default depending on CMAKE_BUILD_TYPE. That is, unless this option is set explicitly, nothing changes. This is the same mechanism used by LLVM for LLVM_ENABLE_ASSERTIONS. This patch also removes adding -g -O0 in debug builds, it should be handled by cmake's CMAKE_{C\|CXX}_FLAGS_DEBUG configuration option. Idea by Hal Finkel Differential Revision: https://reviews.llvm.org/D55952 llvm-svn: 356998	2019-03-26 15:19:15 +00:00
Gheorghe-Teodor Bercea	06e08f0b0a	[OpenMP][libomptarget] New reduction scheme for team reductions Summary: This patch adds a more sophisticated team reduction scheme to the OpenMP libomptarget-nvptx runtime. The scheme uses a fixed size global memory buffer whose length can be adjusted via compiler flag: ``` -fopenmp-cuda-teams-reduction-recs-num=1024 ``` The global buffer is a structure of arrays (with default size of 1024 each and controlled by the above flag), one array for each reduction variable. Values in the buffer are processed by the last team to finish executing the body of the target region. In addition to adding support for the new flag, the compiler also emits special functions used for the reduction of the intermediate reduction values. These changes will be added in a separate compiler patch following this one. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, jfb, jdoerfert, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D58409 llvm-svn: 354471	2019-02-20 14:55:55 +00:00
Chandler Carruth	57b08b0944	Update more file headers across all of the LLVM projects in the monorepo to reflect the new license. These used slightly different spellings that defeated my regular expressions. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351648	2019-01-19 10:56:40 +00:00
Gheorghe-Teodor Bercea	1653633a1c	[OpenMP][libomptarget] Use shared memory variable for tracking parallel level Summary: Replace existing infrastructure for tracking parallel level using global memory with a per-team shared memory variable. This minimizes the impact of the overhead of tracking the parallel level for non-nested cases. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D55773 llvm-svn: 350747	2019-01-09 18:30:14 +00:00
Alexey Bataev	26e6c86b79	[OPENMP][NVPTX]Fix dynamic scheduling. Summary: Previous implementation may cause the runtime crash when the number of teams is > 1024. Patch fixes this problem + reduces number of the atomic operations by 32 times. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56332 llvm-svn: 350524	2019-01-07 14:25:25 +00:00
Alexey Bataev	6b3153ada0	[OPENMP][NVPTX]General formatting/code improvement, NFC. Summary: Formatting. Reviewers: gtbercea, grokos, kkwli0 Subscribers: guansong, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56290 llvm-svn: 350431	2019-01-04 20:16:54 +00:00
Alexey Bataev	dcf2edcdf5	[OPENMP][NVPTX]Improve performance + reduce number of used registers. Summary: Reduced number of the used register + improved performance propagating the information about current execution/data sharing mode directly from the compiler, where it is possible. In some cases, it requires new/reworked interfaces of the runtime external functions. Old functions are marked as deprecated. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56278 llvm-svn: 350405	2019-01-04 17:09:12 +00:00
Joel E. Denny	f17f7a5d4d	[OpenMP] Fix nvidia-cuda-toolkit detection on Debian/Ubuntu The OpenMP runtime's cmake scripts do not correctly locate the libdevice that the Debian/Ubuntu package nvidia-cuda-toolkit currently includes, at least on my Ubuntu 18.04.1 installation. This patch fixes that for me. This problem was discussed at length in D55269. D40453 added a similar adjustment in clang, but reviewers of D55269 concluded that, for the OpenMP runtime, the right place to address this problem is in cmake's CUDA support. However, it was also suggested we could add a workaround to OpenMP's cmake scripts now. This patch contains such a workaround, which I've tried to design so that it will have no harmful effect if cmake improves in the future. nvidia-cuda-toolkit also needs improvements because its intended monolithic CUDA tree shim, /usr/lib/cuda, has many empty directories, such as bin. I reported that at: <https://bugs.launchpad.net/ubuntu/+source/nvidia-cuda-toolkit/+bug/1808999> Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D55588 llvm-svn: 350377	2019-01-04 02:07:13 +00:00
Jonathan Peyton	76f3980a20	[OpenMP] Add omp_get_device_num() and update several other device API functions Add omp_get_device_num() function for 5.0 which returns the number of the device the current thread is running on. Currently, we are leaving it to the compiler to handle this properly if it is called inside target. Also, did some cleanup and updating of duplicate device API functions (in both libomp and libomptarget) to make them into weak functions that check for the symbol from libomptarget, and will call the version in libomptarget if it is present. If any additional device API functions are implemented also in libomptarget in the future, we should add the dlsym calls to the host functions. Also, if the omp_target_* functions are to be implemented for the host (this has been requested), they should attempt to call the libomptarget versions as well. Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D55578 llvm-svn: 350352	2019-01-03 21:14:19 +00:00
Alexey Bataev	3c74be8049	[OPENMP][NVPTX]Fix incompatibility of __syncthreads with LLVM, NFC. Summary: One of the LLVM optimizations, split critical edges, also clones tail instructions. This is a dangerous operation for __syncthreads() functions and this transformation leads to undefined behavior or incorrect results. Patch fixes this problem by replacing __syncthreads() function with the assembler instruction, which cost is too high and wich cannot be copied. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56274 llvm-svn: 350333	2019-01-03 17:43:46 +00:00
Vyacheslav Zakharin	e889ac7e6b	[libomptarget] Added install component for libomptarget Differential Revision: https://reviews.llvm.org/D56108 llvm-svn: 350254	2019-01-02 19:39:49 +00:00
Alexey Bataev	d1cd005ec5	[OPENMP][NVPTX]Added/fixed debugging messages, NFC. Summary: Added or fixed new/old debugging messages for the better diagnostics. Reviewers: gtbercea, kkwli0, grokos Reviewed By: grokos Subscribers: caomhin, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D56102 llvm-svn: 350137	2018-12-28 21:36:09 +00:00
Alexey Bataev	28eccf5ba0	[OPENMP][NVPTX]Fixed initialization of the data-sharing interface. Summary: Avoid using of the atomic loop to wait for the completion of the data-sharing interface initialization, use __shfl_sync instead for the communication within the warp to signal other threads in the warp about completion of the initialization. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D56100 llvm-svn: 350129	2018-12-28 17:31:06 +00:00
Alexey Bataev	1708858dbd	[OPENMP][NVPTX]Outline assert into noinline function, NFC. Summary: At high optimization level asserts lead to some unexpected results because of auto-inserted unreachable instructions. This outlining prevents some of such dangerous optimizations and leads to better stability. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D56101 llvm-svn: 350128	2018-12-28 17:29:47 +00:00
Alexey Bataev	9056f1116d	[OPENMP][NVPTX]Revert __kmpc_shuffle_int64 to its original form. Summary: Use the original shuffle implementation for __kmpc_shuffle_int64 since default implementation uses the same implementation. Reviewers: gtbercea Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55514 llvm-svn: 348772	2018-12-10 16:50:36 +00:00
Alexey Bataev	cc6cf64c38	[OPENMP][NVPTX]Enable fast shuffles on 64bit values only if CUDA >= 9. Summary: Shuffle on 64bit data is allowed only for CUDA >= 9.0. Also, fixed the constant for the mask, need one extra L in the end. Reviewers: gtbercea, kkwli0 Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55440 llvm-svn: 348758	2018-12-10 14:29:05 +00:00
Alexey Bataev	8acafff404	[OPENMP][NVPTX]Save registers for optimized builds with enabled logging. Summary: Introduced special noinline function log that allows to save some registers for optimized builds but with enabled logging. Also, it increases the stability of the optimized builds with inlined runtime. Reviewers: gtbercea, kkwli0 Reviewed By: gtbercea Subscribers: caomhin, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D55436 llvm-svn: 348606	2018-12-07 16:08:29 +00:00
Alexey Bataev	653e8ba79a	[OPENMP][NVPTX]Correct type casting for printf args + simplified shfl64 function. Summary: Explicitly casted printf's args to the required types + simplified shfl64 function. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55379 llvm-svn: 348521	2018-12-06 19:45:48 +00:00
Alexey Bataev	5442f3e549	[OPENMP][NVPTX]Fix __kmpc_flush to flush the memory per system, not per block. Summary: According to the standard, after memory flushing the changes in the memory must be visible to all the threads in all teams. Patch fixes this. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55370 llvm-svn: 348491	2018-12-06 15:27:58 +00:00
Gheorghe-Teodor Bercea	10b2e60b7e	[OpenMP][libomptarget] Flush intermediate values during team reduction Summary: Ensure intermediate values of a team reduction are flushed to memory. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, jfb, openmp-commits Differential Revision: https://reviews.llvm.org/D55219 llvm-svn: 348148	2018-12-03 15:21:49 +00:00
Alexey Bataev	0f221f53d8	[OPENMP][NVPTX]Make runtime compatible with the original runtime. Summary: Reworked runtime to make it compatible with the requirements of the original runtime library. Also, simplified some code to reduce number of function calls. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55130 llvm-svn: 348003	2018-11-30 16:52:38 +00:00
Gheorghe-Teodor Bercea	31c1589ab0	[OpenMP][libomptarget] Add new version of SPMD deinit kernel function with argument Summary: To enable the compiler to optimize parts of the function that are not needed when runtime can be omitted, a new version of the SPMD deinit kernel function is needed. This function takes the runtime required flag as an argument. Reviewers: ABataev, kkwli0, caomhin Reviewed By: ABataev Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D54969 llvm-svn: 347714	2018-11-27 21:23:40 +00:00
Alexey Bataev	d4de439cf4	[OPENMP][NVPTX]Basic support for reductions across the teams. Summary: Added functions __kmpc_nvptx_teams_reduce_nowait_simple and __kmpc_nvptx_teams_end_reduce_nowait_simple to implement basic support for reductions across the teams. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D54967 llvm-svn: 347710	2018-11-27 21:06:09 +00:00
Gheorghe-Teodor Bercea	ad8632a9ba	[OpenMP][libomptarget] Refactor SPMD and runtime requirement checking Summary: Refactor the checking for SPMD mode and whether the runtime is initialized or not. This uses constant flags which enables the runtime to optimize out unused sections of code that depend on these flags. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, jfb, openmp-commits Differential Revision: https://reviews.llvm.org/D54960 llvm-svn: 347698	2018-11-27 19:45:10 +00:00
Alexey Bataev	8ab0924ab4	[OPENMP][NVPTX]Improved lock/critical constructs. Summary: Improved support for critical constructs + omp_..._lock... constructs. Reviewers: gtbercea, kkwli0, caomhin Subscribers: guansong, jfb, openmp-commits Differential Revision: https://reviews.llvm.org/D54766 llvm-svn: 347342	2018-11-20 20:19:36 +00:00
Alexey Bataev	15ab891e68	[OPENMP]Make lambda mapping follow reqs for PTR_AND_OBJ mapping. Summary: The base pointer for the lambda mapping must point to the lambda capture placement and pointer must point to the captured variable itself. Patch fixes this problem. Reviewers: gtbercea Subscribers: guansong, openmp-commits, kkwli0, caomhin Differential Revision: https://reviews.llvm.org/D54260 llvm-svn: 346407	2018-11-08 15:47:30 +00:00
Alexey Bataev	9476ca7db9	[OPENMP][OFFLOADING]Change the lambda capturing flags. Summary: The previously used combination `PTR_AND_OBJ \| PRIVATE` could be used for mapping of some data in Fortran. Changed it to `PTR_AND_OBJ \| LITERAL`. Reviewers: gtbercea Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D54035 llvm-svn: 345981	2018-11-02 15:24:47 +00:00
Alexey Bataev	463e9f3224	[OPENMP][NVPTX]Fixed/improved support for globalization in team contexts. Summary: Current globalization scheme works correctly only for SPMD+lightweight runtime mode and does not work for full runtime. Patch improves support for the globalization scheme + reduces global memory consumption in lightweight runtime mode. Patch adds runtime functions to work with the statically allocated global memory. It allows to improve performance and memory consumption. This global memory must be allocated by the compiler. Reviewers: grokos, kkwli0, gtbercea, caomhin Subscribers: guansong, jfb, openmp-commits Differential Revision: https://reviews.llvm.org/D53943 llvm-svn: 345976	2018-11-02 14:43:23 +00:00
Gheorghe-Teodor Bercea	b10bacf122	[OpenMP][libomptarget] Add runtime function for pushing coalesced global records Summary: In the case of coalesced global records, we need to push the exact data size passed in. This patch fixes this by outlining the common functionality of the previous push function and by adding a separate entry point for coalesced pushes. The pop function remains unchanged. Reviewers: ABataev, grokos, caomhin Reviewed By: ABataev, grokos Subscribers: jholewinski, cfe-commits, Hahnfeld, guansong, jfb, openmp-commits Differential Revision: https://reviews.llvm.org/D53141 llvm-svn: 345867	2018-11-01 18:08:12 +00:00
Alexey Bataev	e5369885dd	[LIBOMPTARGET] Add support for mapping of lambda captures. Summary: Added support for correct mapping of variables captured by reference in lambdas. That kind of mapping may appear only in target-executable regions and must follow the original lambda or another lambda capture for the same lambda. The expected data: base address - the address of the lambda, begin pointer - pointer to the address of the lambda capture, size - size of the captured variable. When OMP_TGT_MAPTYPE_PTR_AND_OBJ mapping type is seen in target-executable region, the target address of the last processed item is taken as the address of the original lambda `tgt_lambda_ptr`. Then, the pointer to capture on the device is calculated like `tgt_lambda_ptr + (host_begin_pointer - host_begin_base)` and the target-based address of the original variable (which host address is `(void*)begin_pointer`) is written to that pointer. Reviewers: kkwli0, gtbercea, grokos Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D51107 llvm-svn: 345608	2018-10-30 15:42:12 +00:00
Jonas Hahnfeld	a762bfc03a	[libomptarget-nvptx] Enable asserts in bclib If the user requested LIBOMPTARGET_NVPTX_DEBUG, include asserts in the bitcode library. Everything else will have very unpleasent effects because asserts will appear when falling back to the static library libomptarget-nvptx.a. Differential Revision: https://reviews.llvm.org/D52701 llvm-svn: 343477	2018-10-01 14:16:55 +00:00
Jonas Hahnfeld	a1100e6b9a	[libomptarget-nvptx] reduction: Determine if runtime uninitialized Pass in the correct value of isRuntimeUninitialized() which solves parallel reductions as reported on the mailing list. For reference: r333285 did the same for loop scheduling. Differential Revision: https://reviews.llvm.org/D52725 llvm-svn: 343476	2018-10-01 14:14:26 +00:00
Jonas Hahnfeld	1bf767fb8e	[libomptarget-nvptx] Align data sharing stack NVPTX requires addresses of pointer locations to be 8-byte aligned or there will be an exception during runtime. This could happen without this patch as shown in the added test: getId() requires 4 byte of stack and putValueInParallel() uses 16 bytes to store the addresses of the captured variables. Differential Revision: https://reviews.llvm.org/D52655 llvm-svn: 343402	2018-09-30 09:23:21 +00:00
Jonas Hahnfeld	067235f227	[libomptarget-nvptx] Fix ancestor_thread_num and team_size (non-SPMD) According to OpenMP 4.5, p250:12-14: If the requested nest level is outside the range of 0 and the nest level of the current thread, as returned by the omp_get_level routine, the routine returns -1. The SPMD code path will need a similar fix. Differential Revision: https://reviews.llvm.org/D51787 llvm-svn: 343401	2018-09-30 09:23:14 +00:00
Jonas Hahnfeld	fb1b80191e	[libomptarget-nvptx] Add tests for nested parallelism Clang trunk will serialize nested parallel regions. Check that this is correctly reflected in various API methods. Differential Revision: https://reviews.llvm.org/D51786 llvm-svn: 343382	2018-09-29 16:02:32 +00:00
Jonas Hahnfeld	c89a14f5d2	[libomptarget-nvptx] Ignore calls to dynamic API There is no support and according to the OpenMP 4.5, p238:7-9: For implementations that do not support dynamic adjustment of the number of threads this routine has no effect: the value of dyn-var remains false. Add a test that cancellation and nested parallelism aren't supported either. Differential Revision: https://reviews.llvm.org/D51785 llvm-svn: 343381	2018-09-29 16:02:25 +00:00
Jonas Hahnfeld	a743c04412	[libomptarget-nvptx] Fix number of threads in parallel If there is no num_threads() clause we must consider the nthreads-var ICV. Its value is set by omp_set_num_threads() and can be queried using omp_get_max_num_threads(). The rewritten code now closely resembles the algorithm given in the OpenMP standard. Differential Revision: https://reviews.llvm.org/D51783 llvm-svn: 343380	2018-09-29 16:02:17 +00:00
Alexey Bataev	418af6f6cf	[OPENMP] Add the test to check that the libomptarget does not cause infinite loop on removing non-mapped pointer-with-object. Added test to check that libomptarget does not cause infinite loop when trying to unmap the pointer-with-object data that was not previously mapped. llvm-svn: 343344	2018-09-28 17:13:11 +00:00
Jonas Hahnfeld	122dbb5dce	[libomptarget-nvptx] Add testing infrastructure This patch also introduces testing for libomptarget-nvptx which has been missing until now. I propose to add tests for all bugs that are fixed in the future. The target check-libomptarget-nvptx is not run by default because - we can't determine if there is a GPU plugged into the system. - it will require the latest Clang compiler. Keeping compatibility with older releases would prevent testing newer code generation developed in trunk. Differential Revision: https://reviews.llvm.org/D51687 llvm-svn: 343324	2018-09-28 15:05:43 +00:00
Gheorghe-Teodor Bercea	f7256a593f	[OpenMP][libomptarget] Set the frame pointer then test empty slot condition Summary: NFC - just fixing a bug: the empty slot test was before the re-setting of the Stack pointer. Reviewers: ABataev, caomhin, Hahnfeld Reviewed By: ABataev Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D52122 llvm-svn: 343006	2018-09-25 18:48:14 +00:00
Gheorghe-Teodor Bercea	9bc3bfffb4	[OpenMP][libomptarget] Simplify warp master selection for data sharing Summary: There is currently no supported situation where the warp master is not the first thread in the warp. This also avoids the device execution from hanging on Volta GPUs when ballot_sync is called by a number of threads that is less that the size of a warp. Reviewers: ABataev, caomhin, grokos Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D50188 llvm-svn: 342972	2018-09-25 13:23:32 +00:00
Alexey Bataev	022bf16b41	[OPENMP][NVPTX] Add support for lastprivates/reductions handling in SPMD constructs with lightweight runtime. Summary: We need the support for per-team shared variables to support codegen for lastprivates/reductions. Patch adds this support by using shared memory if the total size of the reductions/lastprivates is <= 128 bytes, then pre-allocated buffer in global memory if size is <= 4K bytes,or uses malloc/free, otherwise. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51875 llvm-svn: 342737	2018-09-21 14:11:41 +00:00
Alexey Bataev	06b6e0f406	[OPENMP]Increment iterator when the loop is continued. Summary: Missed operation of the incrementing iterator when required just to continue execution. Reviewers: kkwli0, gtbercea, grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51937 llvm-svn: 341964	2018-09-11 17:16:26 +00:00
Jonas Hahnfeld	dc79c7187c	[libomptarget-nvptx] Remove last mentions of __kmpc_print_* Their implementation was removed during review, delete their prototype declarations. llvm-svn: 341748	2018-09-08 12:10:19 +00:00
Jonas Hahnfeld	21e3ee0afe	[libomptarget] Remove two unneeded includes, NFCI. Follow-up to r340542 and r340767. llvm-svn: 341563	2018-09-06 17:00:57 +00:00
Jonas Hahnfeld	f27dcf01d2	[libomptaret][test] Announce compiler features This is a follow-up to r341371: The new test for PR38704 doesn't work with Clang 6.0. It uses an UNSUPPORTED: clang-6, but that hasn't worked because the compiler features weren't known to lit. llvm-svn: 341448	2018-09-05 07:26:00 +00:00
Sergey Dmitriev	b4dc69ff80	[libomptarget] Remove `Devices` from `RTLInfoTy` This patch removes unused field `Devices` from `RTLInfoTy`. Differential Revision: https://reviews.llvm.org/D51653 llvm-svn: 341399	2018-09-04 20:23:09 +00:00
Jonas Hahnfeld	bb51d39871	[libomptarget][CUDA] Use cuDeviceGetAttribute, NFCI. cuDeviceGetProperties has apparently been deprecated since CUDA 5.0. Nvidia started using annotations only in CUDA 9.2, so nobody noticed nor cared before. The new function returns the same values, tested with a P100. Differential Revision: https://reviews.llvm.org/D51624 llvm-svn: 341372	2018-09-04 15:13:28 +00:00
Jonas Hahnfeld	f7f86971e6	[libomptarget] PR38704: Fix erase of ShadowPtrMap erase() invalidates the iterator and returns a new one pointing to the following element. The code now follows the example at https://en.cppreference.com/w/cpp/container/map/erase. (The added testcase crashes without this patch.) Reported by David Binderman (https://llvm.org/PR38704)! Differential Revision: https://reviews.llvm.org/D51623 llvm-svn: 341371	2018-09-04 15:13:23 +00:00
Jonas Hahnfeld	82d20201d0	[libomptarget][NVPTX] Drop dead code and data structures, NFCI. * cg and HasCancel in WorkDescr were never read and can be removed. * This eliminates the last use of priv in ThreadPrivateContext. * CounterGroup is unused afterwards. * Remove duplicate external declares in omptarget-nvptx.cu that are already in the header omptarget-nvptx.h. Differential Revision: https://reviews.llvm.org/D51622 llvm-svn: 341370	2018-09-04 15:13:17 +00:00
Jonas Hahnfeld	96c13488ab	[libomptarget][NVPTX] Fix __kmpc_spmd_kernel_deinit If the runtime is uninitialized the master thread must Enqueue the state object, and ALL threads must return immediately. Found post-commit of https://reviews.llvm.org/D51222. llvm-svn: 341328	2018-09-03 17:24:23 +00:00
Alexey Bataev	39a4724095	[OPENMP][NVPTX] Replace assert() by ASSERT0() macro, NFC. Required to fix the buildbots. llvm-svn: 340956	2018-08-29 19:22:06 +00:00
Alexey Bataev	b7a5d38cf5	[OPENMP][NVPTX] Lightweight runtime support for SPMD mode. Summary: Implemented simple and lightweight runtime support for SPMD mode-based constructs. It adds support for L2 sequential parallelism wihtout full runtime support. Also, patch fixes some use cases for uninitialized\|lightweight runtime. Reviewers: grokos, kkwli0, Hahnfeld, gtbercea Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51222 llvm-svn: 340944	2018-08-29 17:35:09 +00:00
Alexandre Eichenberger	e9b7d8dcd6	[OpenMP][libomptarget] rework of fatal error reporting Summary: Removed the function that used a lock and varargs Used the same mechanism as for debug messages Reviewers: ABataev, gtbercea, grokos, Hahnfeld Reviewed By: gtbercea, Hahnfeld Subscribers: mikerice, ABataev, RaviNarayanaswamy, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51226 llvm-svn: 340767	2018-08-27 18:20:15 +00:00
Alexandre Eichenberger	1b4a666ba5	[OpenMP][libomptarget] Bringing up to spec with respect to OMP_TARGET_OFFLOAD env var Summary: Right now, only the OMP_TARGET_OFFLOAD=DISABLED was implemented. Added support for the other MANDATORY and DEFAULT values. Reviewers: gtbercea, ABataev, grokos, caomhin, Hahnfeld Reviewed By: Hahnfeld Subscribers: protze.joachim, gtbercea, AlexEichenberger, RaviNarayanaswamy, Hahnfeld, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D50522 llvm-svn: 340542	2018-08-23 16:22:42 +00:00
Alexey Bataev	37d4156b11	[OPNEMP, NVPTX] Fixed sychronization construct + code cleanup. Summary: 1. Fixed internal problem in `__kmpc_barrier` function: SPMD mode synchronization function should be called only in L1 parallel level. 2. Removed some extra code for synchronization inside of the code, used `__kmpc_barrier` instead. 3. Some code cleanup. Reviewers: gtbercea, grokos Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D49564 llvm-svn: 337691	2018-07-23 13:52:12 +00:00
George Rokos	a0da24683b	[OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members This patch removes the translation code since this functionality is now implemented in the compiler. target_data_begin and target_data_end are also patched to handle some special cases that used to be handled by the obsolete translation function, namely ensure proper alignment of struct members when we have partially mapped structs. Mapping a struct from a higher address (i.e. not from its beginning) can result in distortion of the alignment for some of its member fields. Padding restores the original (proper) alignment. Differential revision: https://reviews.llvm.org/D44186 llvm-svn: 337455	2018-07-19 13:41:03 +00:00
Joachim Protze	bb869f42b7	[libomptarget] Also support several images for elf In revision r336569 (D49036) libomptarget support for multiple nvidia images has been fixed in case a target region resides inside one or multiple libraries and in the compiled application. But the issues is still present for elf images. This fix will also support multiple images for elf. Patch by Jannis Klinkenberg Reviewers: protze.joachim, ABataev, grokos Reviewed By: protze.joachim, ABataev, grokos Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D49418 llvm-svn: 337355	2018-07-18 07:23:46 +00:00
Azharuddin Mohammed	6712b8675b	[cmake] Fix libomptarget/test/CMakeLists.txt Summary: Should be variable name instead of variable reference. If the variable is somehow unset, it messes up the if condition expression and causes a CMake error. Reviewers: jlpeyton, AndreyChurbanov, Hahnfeld Reviewed By: Hahnfeld Subscribers: mgorny, llvm-commits, openmp-commits Differential Revision: https://reviews.llvm.org/D47221 llvm-svn: 337133	2018-07-15 17:29:43 +00:00
Gheorghe-Teodor Bercea	9e94326185	[OpenMP][libomptarget] Fix data sharing and globalization infrastructure to work in SPMD mode Summary: This patch fixes the data sharing infrastructure to work for the SPMD and non-SPMD cases. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: ABataev, grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D49204 llvm-svn: 337013	2018-07-13 16:14:22 +00:00
Alexey Bataev	c2c0138a04	[OPENMP, NVPTX] Fix loop boundaries calculation for dynamic loops. Summary: Patch fixes the next problems. 1. Removes unused functions from omptarget_nvptx_ThreadPrivateContext class + simplified data members. 2. Fixed calculation of loop boundaries for dynamic loops with static scheduling. 3. Introduced saving/restoring of the dynamic loop boundaries to support several nested parallel dynamic loops. Reviewers: grokos Subscribers: guansong, kkwli0, openmp-commits Differential Revision: https://reviews.llvm.org/D49241 llvm-svn: 336915	2018-07-12 15:18:28 +00:00
Alexey Bataev	2622e9e5b3	[OPENMP, NVPTX] Support several images in the executable. Summary: Currently Cuda plugin supports loading of the single image, though we may have the executable with the several images, if it has target regions inside of the dynamically loaded library. Patch allows to load multiple images. Reviewers: grokos Subscribers: guansong, openmp-commits, kkwli0 Differential Revision: https://reviews.llvm.org/D49036 llvm-svn: 336569	2018-07-09 17:46:55 +00:00
Alexey Bataev	3994bafbc7	[OPENMP, NVPTX] Sync threads before start ordered loops. Summary: Threads must be synchronized before starting ordered construct. Reviewers: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D48732 llvm-svn: 335987	2018-06-29 16:16:00 +00:00
Alexey Bataev	0ac29350b5	[OPENMP, NVPTX] Fixes for NVPTX RTL Summary: Patch fixes several problems in the implementation of NVPTX RTL. 1. Detection of the last iteration for loops with static scheduling, no chunks. 2. Fixes reductions for the serialized parallel constructs. 3. Fixes handling of the barriers. Reviewers: grokos Reviewed By: grokos Subscribers: Hahnfeld, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D48480 llvm-svn: 335469	2018-06-25 13:43:35 +00:00
Guansong Zhang	f9e56e5982	[OpenMP] [CUDA] Expose teamid to the debug path Summary: Small bug fix for debug build. A previous fix causing trouble for debug build. Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D48286 llvm-svn: 335046	2018-06-19 14:05:38 +00:00
Jonas Hahnfeld	17aabf83e9	[libomptarget-nvptx] loop: Determine if runtime uninitialized The generic entry points for static loop scheduling previously hardcoded that the runtime was initialized. This can be wrong if the compiler analyzes that the runtime is not needed and calls the init functions accordingly. This didn't affect clang-ykt because they have entry points for different combinations of SPMD x Runtime not needed. I didn't do measurements yet but with inlining we might get away with always calling the generic interface and letting compiler and runtime figure out the rest. In any case, a correct runtime is always better than having functions that may only be called if previous calls passed in a specific set of arguments! Differential Revision: https://reviews.llvm.org/D47131 llvm-svn: 333285	2018-05-25 15:56:48 +00:00
Jonas Hahnfeld	65e0b8784c	[CMake] Unify install path for libraries Introduce OPENMP_INSTALL_LIBDIR and use in all install() commands. This also fixes installation of libomptarget-nvptx that previously didn't honor {OPENMP,LLVM}_LIBDIR_SUFFIX. Differential Revision: https://reviews.llvm.org/D47130 llvm-svn: 333284	2018-05-25 15:56:41 +00:00
George Rokos	6da6f433a0	[CUDA]Fix dynamic\|guided scheduling. The existing implementation of the dynamic scheduling breaks the contract introduced by the original openmp runtime and, thus, is incorrect. Patch fixes it and introduces correct dynamic scheduling model. Thanks to Alexey Bataev for submitting this patch. Differential Revision: https://reviews.llvm.org/D47333 llvm-svn: 333225	2018-05-24 21:12:41 +00:00
Jonas Hahnfeld	9228f9718c	[libomptarget-nvptx-bc] Pass found CUDA installations We already know where the CUDA SDK is, so there is no point in letting Clang search for it again and possibly finding no or a different installation. --cuda-path is supported since the beginning of CUDA support in Clang, so making this required doesn't impose additional restrictions. Differential Revision: https://reviews.llvm.org/D46930 llvm-svn: 332495	2018-05-16 17:20:27 +00:00
Jonas Hahnfeld	37bbe1a698	[libomptarget-nvptx] Test bitcode compiler flags and enable by default Move all logic related to selecting the bitcode compiler and linker into a new file and dynamically test required compiler flags. This also adds -fcuda-rdc for Clang trunk as previously attempted in D44992 which fixes the build. As a result this change also enables building the library by default if all prerequisites are met. Differential Revision: https://reviews.llvm.org/D46901 llvm-svn: 332494	2018-05-16 17:20:21 +00:00
Gheorghe-Teodor Bercea	787a350021	[OpenMP][libomptarget] Add function for checking SPMD mode Summary: Add function to the NVPTX libomptarget library that will return true if the current target region is being executed in SPMD mode. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D46840 llvm-svn: 332360	2018-05-15 15:16:43 +00:00
Guansong Zhang	e1c7a46d5b	[OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side Summary: Enable the device side debug messages at compile time, use env var to control at runtime. To achieve this, an environment data block is passed to the device lib when it is loaded. By default, the message is off, to enable it, a user need to set LIBOMPDEVICE_DEBUG=1. Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D46210 llvm-svn: 331550	2018-05-04 19:29:28 +00:00
Guansong Zhang	ad6c26516b	[OpenMP] Remove compilation warning when using clang to compile bc files. Summary: Minor printf format correction. NVCC ignore those. Clang will give warning on these if debug is enabled. Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D45528 llvm-svn: 330944	2018-04-26 14:06:53 +00:00
Guansong Zhang	334c379e32	[OpenMP] Make bc file compilation sensitive to LIBOMPTARGET_NVPTX_DEBUG flag Summary: The LIBOMPTARGET_NVPTX_DEBUG flag is inconsistent between using nvcc to generate .a file and clang to generate .bc file. Sync the two setting so we can get debug messages from the bc file path as well. Reviewers: grokos Subscribers: Hahnfeld, openmp-commits, mgorny Tags: #openmp Differential Revision: https://reviews.llvm.org/D45530 llvm-svn: 330477	2018-04-20 20:41:00 +00:00
Guansong Zhang	f679431f91	[OpenMP] Remove extra warning when we build Summary: This one line change is to remove this warning message "warning: integer conversion resulted in a change of sign" Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D45415 llvm-svn: 329713	2018-04-10 15:28:31 +00:00
Guansong Zhang	f0029a7738	Revert "[OpenMP] enable bc file compilation using the latest clang" This reverts commit 6849e31c36d712d97433bca9af39b7a09c8c1207. llvm-svn: 329576	2018-04-09 14:45:41 +00:00
Guansong Zhang	e47fbc9da8	[OpenMP] enable bc file compilation using the latest clang Summary: adding cuda-rdc flag to allow extern global data Reviewers: grokos Reviewed By: grokos Subscribers: gregrodgers, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D44992 llvm-svn: 329072	2018-04-03 15:01:34 +00:00
Gheorghe-Teodor Bercea	4bc36a06e2	[OpenMP][libomptarget] Initialize global memory stack only once. Summary: The global stack initialization function may be called multiple times. The initialization of the shared memory slots should only happen when the function is called for the first time for a given warp master thread. Reviewers: grokos, carlo.bertolli, ABataev, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44754 llvm-svn: 328148	2018-03-21 21:02:55 +00:00
Gheorghe-Teodor Bercea	b4332ca3da	[OpenMP][libomptarget] Fix master warp check Summary: The check for the master warp must take into consideration the actual number of warps: the master warp is equal to the last active warp not necessarily WARPSIZE - 1. Reviewers: grokos, carlo.bertolli, ABataev, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44537 llvm-svn: 328146	2018-03-21 20:51:16 +00:00
Gheorghe-Teodor Bercea	c8d395a168	[OpenMP][libomptarget] Enable globalization for workers Summary: This patch allows worker to have a global memory stack managed by the runtime. This patch is needed for completeness and consistency with the globalization policy: if a worker-side variable escapes the current context it then needs to be globalized. Until now, only the master thread was allowed to have such a stack. These global values can now potentially be shared amongst workers if the semantics of the OpenMP program require it. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44487 llvm-svn: 328144	2018-03-21 20:34:19 +00:00
George Rokos	6b9bb5e1c2	Bugfix, extern declarations for libomp functions are `extern "C"` declarations llvm-svn: 327763	2018-03-17 02:07:42 +00:00
George Rokos	2878c3957b	Moved extern declarations to private header file, they are only used from within libomptarget, they don't need to be in omptarget.h. llvm-svn: 327740	2018-03-16 20:40:09 +00:00
Gheorghe-Teodor Bercea	876c1ed2e5	[OpenMP][libomptarget] Enable usage of shared memory slots Summary: Allow the runtime to use the existing shared memory statically allocated slots. When a variable is globalized, the underlying memory can be either shared or global memory (both have block-wide visibility). In this case, we allow that the storage to use a limited amount of shared memory that has been statically allocated already. Only if shared memory doesn't prove to be enough do we then invoke malloc() to create a new global memory slot. Reviewers: ABataev, carlo.bertolli, grokos, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44486 llvm-svn: 327639	2018-03-15 16:05:34 +00:00
Gheorghe-Teodor Bercea	f3de222b0d	[OpenMP][libomptarget] Enable multiple frames per global memory slot Summary: To save on calls to malloc, this patch enables the re-use of pre-allocated global memory slots. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44470 llvm-svn: 327637	2018-03-15 15:56:04 +00:00
George Rokos	59be4b434f	[libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread. llvm-svn: 327556	2018-03-14 19:11:36 +00:00
Gheorghe-Teodor Bercea	49b62649cf	[OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing. Summary: This patch adds support for the sharing of variables from the master thread of a team to the worker threads of the team. The runtime uses a stack structure implemented as a doubly-linked list of slots with each slot having the exact same size as the size requested. This implementation leverages existing data structures. The runtime functions are added as separate functions to avoid interfering with the current interface. Limitations to be addressed in future patches: - This current patch only employs global memory. In a future patch we will enable to usage for shared memory as an optimization. - Allow the allocation of several requested sizes in the same slot. Reviewers: ABataev, grokos, caomhin, carlo.bertolli Reviewed By: grokos Subscribers: Hahnfeld, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44260 llvm-svn: 327440	2018-03-13 19:44:53 +00:00
Gheorghe-Teodor Bercea	d5e5992f9a	[OpenMP][libomptarget] Fix union. Summary: To make the two parts of the union have the same size, the size of vect needs to be increased by 16 bits. Reviewers: grokos, carlo.bertolli, caomhin, ABataev Reviewed By: grokos, ABataev Subscribers: fedor.sergeev, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44254 llvm-svn: 327040	2018-03-08 18:44:02 +00:00
Gheorghe-Teodor Bercea	7a5fa21ae2	[OpenMP] Remove implicit data sharing using device shared memory from libomptarget Summary: This patch reverts the changes to libomptarget that were coupled with the changes to Clang code gen for data sharing using shared memory. A similar patch exists for Clang: D43625 Shared memory is meant to be used as an optimization on top of a more general scheme. So far we didn't have a global memory implementation ready so shared memory was a solution which applied to the current level of OpenMP complexity supported by trunk on GPU devices (due to the missing NVPTX backend patch this functionality has never been exercised). Now that we have a global memory solution this patch is "in the way" and needs to be removed (for now). This patch (or an equivalent version of it) will be put out for review once the global memory scheme is in place. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: grokos Subscribers: Hahnfeld, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D43626 llvm-svn: 326950	2018-03-07 22:10:10 +00:00
Gheorghe-Teodor Bercea	d5ae4e6501	[OpenMP][libomptarget] Enable the compilation of multiple bc libraries for runtime inlining Summary: Different NVIDIA GPUs support different compute capabilities. To enable the inlining of runtime functions and the best performance on different generations of NVIDIA GPUs, a bc library for each compute capability needs to be compiled. The same compiler build will then be usable in conjunction with multiple generations of NVIDIA GPUs. To differentiate between versions of the same bc lib, the output file name will contain the compute capability ID. Depends on D14254 Reviewers: Hahnfeld, hfinkel, carlo.bertolli, caomhin, ABataev, grokos Reviewed By: Hahnfeld, grokos Subscribers: guansong, mgorny, openmp-commits Differential Revision: https://reviews.llvm.org/D41724 llvm-svn: 324904	2018-02-12 16:45:20 +00:00
Jonas Hahnfeld	3cfaf3dd0d	[libomptarget] Fix detection of CUDA stubs library CUDA_LIBRARIES contains additional linker arguments since CMake 3.3 which breakes the current way of finding the stubs library. llvm-svn: 324879	2018-02-12 11:01:56 +00:00

... 2 3 4 5 6 ...

395 Commits