llvm-project

Commit Graph

Author	SHA1	Message	Date
Vyacheslav Zakharin	f2f88f3e7a	An attempt to abandon omptarget out-of-tree builds. I want to start using LLVM component libraries in libomptarget to stop duplicating implementations already available in LLVM (e.g. LLVMObject, LLVMSupport, etc.). Without relying on LLVM in all libomptarget builds one has to provide fallback implementation for each used LLVM feature. This is an attempt to stop supporting out-of-llvm-tree builds of libomptarget. I understand that I may need to revert this, if this affects downstream projects in a bad way. Differential Revision: https://reviews.llvm.org/D101509	2021-05-07 12:43:50 -07:00
Joseph Huber	a15f8589f4	[libomptarget] Add support for target memory allocators to cuda RTL Summary: The allocator interface added in D97883 allows the RTL to allocate shared and host-pinned memory from the cuda plugin. This patch adds support for these to the runtime. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D102000	2021-05-07 10:27:02 -04:00
Jon Chesterfield	44ee974e2f	[libomptarget][nfc] Refactor amdgpu partial barrier to simplify adding a second one D101976 would require a second barrier instance. This NFC to amdgpu makes it simpler to add one (an extra global, one more line in init). Also renames the current barrier to L0. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102016	2021-05-06 23:52:19 +01:00
Jon Chesterfield	7e9351b9de	[libomptarget][amdgpu][nfc] Remove dead code from amdgpu plugin Drops an enum that was identical to a HSA one, localises some functions where they were only called from one TU. Covers everything internalize + adce can identify as dead, except for msgpack::dump which is useful when debugging. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102014	2021-05-06 23:16:32 +01:00
Pushpinder Singh	ae845d6426	[AMDGPU][OpenMP] Enable Libomptarget runtime tests This enables the runtime tests on amdgpu targets. 10 tests have been marked as XFAIL on amdgcn currently mostly due to missing printf. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D99656	2021-05-03 05:56:42 +00:00
Michael Kruse	7308862ff5	[OpenMP][CMake] Use in-project clang as CUDA->IR compiler. If available, use the clang that is already built in the same project as CUDA compiler unless another executable is explicitly defined. This also ensures the generated deviceRTL IR will be consistent with the version of Clang. This patch is required to reliably test OpenMP offloading in a buildbot without either a two-stage build (e.g. with LLVM_ENABLE_RUNTIMES) or a separately installed clang on the worker that will eventually become outdated. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101265	2021-04-30 12:45:52 -05:00
Michael Kruse	3244a8b536	[OpenMP][CMake] Pass --cuda-path to regression tests. The OpenMP runtime can be compiled using a CUDA installed at a non-default location with the -DCUDA_TOOLKIT_ROOT_DIR setting. However, check-openmp will fail afterwards because Clang needs to know where to find the CUDA headers. Fix by passing --cuda-path to Clang using the value of CUDA_TOOLKIT_ROOT_DIR which has been determined by CMake. Also set LD_LIBRARY_PATH such that it can find the cuda runtime when executing. This will ensure that the regression tests do not depend on the current environment, but use the environment they were configured for. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101266	2021-04-27 16:27:40 -05:00
Joachim Protze	24f836e8fd	[OpenMP][libomptarget] Separate lit tests for different offloading targets (2/2) This patch fuses the RUN lines for most libomptarget tests. The previous patch D101315 created separate test targets for each supported offloading triple. This patch updates the RUN lines in libomptarget tests to use a generic run line independent of the offloading target selected for the lit instance. In cases where no RUN line was defined for a specific offloading target, the corresponding target is declared as XFAIL. If it turns out that a test actually supports the target, the XFAIL line can be removed. Differential Revision: https://reviews.llvm.org/D101326	2021-04-27 15:54:32 +02:00
Joachim Protze	b845217b1d	[OpenMP][libomptarget] Separate lit tests for different offloading targets (1/2) This patch creates a separate test directory for each offloading target to be tested. This allows testing multiple architectures in one configuration while still seeing all failing tests separately. The lit test names include the target triple, so that it will be easier to spot the failing target. This patch also allows marking expected failing tests based on the target triple, as the currently used triple is added to the lit "features": ``` // XFAIL: nvptx64-nvidia-cuda ``` Differential Revision: https://reviews.llvm.org/D101315	2021-04-27 12:30:01 +02:00
Jon Chesterfield	58f125493d	[libomptarget] Enable AMDGPU devicertl The amdgpu devicertl is written in freestanding openmp and compiles to a bitcode library (per listed gfx arch) with no unresolved symbols. It requires a recent clang, preferably the one from the same monorepo checkout. This is D98658, with printf explicitly stubbed out, after patching clang to no longer require an llvm with the amdgpu target enabled. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101213	2021-04-24 02:24:44 +01:00
Johannes Doerfert	17330a3cb1	[OpenMP] Avoid reading uninitialized parallel level values In a last minute change request for `a2dbfb6b72` we introduced a read of the uninitialized parallel level value in SPMD-mode. We go back to initializing the array early and checking for an adjusted level. Found by the miniqmc unit tests: https://cdash.qmcpack.org/CDash/viewTest.php?onlyfailed&buildid=203434 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D101123	2021-04-23 11:21:58 -05:00
Joseph Huber	59b6849012	[OpenMP] Replace global InfoLevel with a reference to an internal one. Summary: This patch improves the implementation of D100774 by replacing the global variable introduced with a function that returns a reference to an internal one. This removes the need to define the variable in every plugin that uses it. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D101102	2021-04-23 09:43:46 -04:00
Joseph Huber	2b6f20082e	[OpenMP] Add function for setting LIBOMPTARGET_INFO at runtime Summary: This patch adds a new runtime function __tgt_set_info_flag that allows the user to set the information level at runtime without using the environment variable. Using this will require an extern function, but will eventually be added into an auxiliary library for OpenMP support functions. This patch required moving the current InfoLevel to a global variable which must be instantiated by each plugin. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D100774	2021-04-22 12:48:11 -04:00
Alexey Bataev	ca70512099	[OPENMP]Mark test as unsupported to avoid possible unexpected passes, NFC.	2021-04-22 08:06:25 -07:00
Giorgis Georgakoudis	a2dbfb6b72	[OpenMP] Simplify offloading parallel call codegen This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions. Reviewed By: jdoerfert, Meinersbur Differential Revision: https://reviews.llvm.org/D95976	2021-04-21 18:46:07 -07:00
Alexey Bataev	079884225a	[OPENMP]Fix PR49698: OpenMP declare mapper causes segmentation fault. The implicitly generated mappings for allocation/deallocation in mappers runtime should be mapped as implicit, also no need to clear member_of flag to avoid ref counter increment. Also, the ref counter should not be incremented for the very first element that comes from the mapper function. Differential Revision: https://reviews.llvm.org/D100673	2021-04-21 10:38:31 -07:00
Hansang Bae	9b98497b44	[OpenMP] Add omp_target_is_accessible() to header files -- Added omp_target_is_accessible to the header files -- Added missing const qualifier to device memory routines Differential Revision: https://reviews.llvm.org/D100420	2021-04-16 07:54:15 -05:00
Joseph Huber	83d4b2e2e0	[OpenMP] Add info for device table changes Summary: This patch adds a feature to print information whenever the host-device pointer mapping table is changed by inserting or removing an entry. This introduces a new bit field for LIBOMPTARGET_INFO at position 0x8. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D100600	2021-04-15 18:39:48 -04:00
Hansang Bae	3da61ddae7	[OpenMP] Define omp_is_initial_device() variants in omp.h omp_is_initial_device() is marked as a built-in function in the current compiler, and user code guarded by this call may be optimized away, resulting in undesired behavior in some cases. This patch provides a possible fix for such cases by defining the routine as a variant function and removing it from builtin list. Differential Revision: https://reviews.llvm.org/D99447	2021-04-06 16:58:01 -05:00
Joseph Huber	0af4e74aef	[OpenMP][NFC] Fix typo in libomptarget error message Summary: There was a typo suggesting the user to use `LIBOMPTARGET_DEBUG` instead of `LIBOMPTARGET_INFO`	2021-04-01 12:45:28 -04:00
Joseph Huber	29338459fb	[OpenMP] Trim error messages in CUDA plugin Summary: Remove some of the error messages printed when the CUDA plugin fails. The current error messages can be confusing because they are the first error messages printed after the async stream finds an error. This means that the printed values aren't related to what caused the issue, but are simply the last asynchronous operation that succeeded on the device. Remove these as they can be misleading. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D99510	2021-03-29 12:20:19 -04:00
Alexey Bataev	0411b23319	[OPENMP]Map data field with l-value reference types. Added initial support for the mapping of the data members with l-value reference types. Differential Revision: https://reviews.llvm.org/D98812	2021-03-29 07:07:09 -07:00
Joseph Huber	16064e71e9	[OpenMP] Reset async stream properly upon failure Summary: If the call to `synchronize` fails, it will currently block the stream indefinitely if execution is continued from this point. Additionally, if the program exits it will trigger an assertion on the non-null value of the async queue and prevent the runtime from printing debugging information. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D99443	2021-03-26 19:05:06 -04:00
Jon Chesterfield	626a31de15	[libomptarget] Add register usage info to kernel metadata Add register usage information to the runtime metadata so that it can be used during kernel launch (that change will be in a different commit). Add this information to the kernel trace. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D98829	2021-03-18 17:00:42 +00:00
Jon Chesterfield	dbf8f2b089	Revert "[libomptarget] Build amdgcn devicertl by default" This reverts commit `e23f3502d9`. It broke the build of openmp for clang built without amdgcn support. D98746, under review, would allow this to reland.	2021-03-17 11:34:44 +00:00
Johannes Doerfert	0a954a528b	[OpenMP][FIX] Repair accidental replacement of _shfl_sync with _shfl This was broken accidentally in D95752. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D98677	2021-03-15 22:46:00 -05:00
Jon Chesterfield	e23f3502d9	[libomptarget] Build amdgcn devicertl by default The cmake for this looks for an llvm install and does the right thing when building as part of enable_runtimes. It will probably do the right thing in other settings - at least, it won't try to build this with gcc. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98658	2021-03-15 23:17:50 +00:00
Jon Chesterfield	bb38d7ff05	[libomptarget][nfc][amdgcn] Use precise triple for devicertl build	2021-03-15 20:24:13 +00:00
Jon Chesterfield	d0bc85f04a	[libomptarget][nfc] Drop unused DEVICE macro Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98655	2021-03-15 20:12:50 +00:00
Jon Chesterfield	7da76aaaf4	[libomptarget] Build amdgpu plugin by default This will build the amdgpu plugin if cmake is able to find the hsa runtime library, which will be the case if rocm is installed or if the hsa library has been installed somewhere cmake looks. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D98654	2021-03-15 20:12:01 +00:00
Jon Chesterfield	bcb3f0f867	[libomptarget] Fix devicertl build The target specific functions in target_interface are extern C, but the implementations for nvptx were mostly C++ mangling. That worked out as a quirk of DEVICE macro expanding to nothing, except for shuffle.h which only forward declared the functions with C++ linkage. Also implements GetWarpSize, as used by shuffle, and includes target_interface in nvptx target_impl.cu to help catch future divergence between interface and implementation. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98651	2021-03-15 19:50:22 +00:00
Jon Chesterfield	f675b3df48	[libomptarget] Drop assert.h, use freestanding for amdgcn devicertl Promotes the runtime assert to a link time error for the unimplemented fallback functions. Enables amdgcn to build with only clang provided headers, which makes it less likely to break other builds when enabled. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98649	2021-03-15 18:50:09 +00:00
Jon Chesterfield	156842937f	[libomptarget][amdgcn] Drop use of inttypes.h, moving closer to freestanding The glibc headers are a periodic source of problems compiling the devicertl. This patch resolves the following error run into while building llvm on a slightly different linux system. ``` In file included from .../lib/clang/13.0.0/include/inttypes.h:21: In file included from /usr/include/inttypes.h:25: /usr/include/features.h:461:12: fatal error: 'sys/cdefs.h' file not found # include <sys/cdefs.h> ^~~~~~~~~~~~~ ``` As a second patch, removing assert.h from shuffle will let amdgcn build as -ffreestanding, at which point only the headers that clang itself provides are used and interactions with the host glibc are eliminated. Doing the same for nvptx is complicated by printf handling but also seems worthwhile. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D98565	2021-03-15 16:54:58 +00:00
George Rokos	2468fdd9af	[libomptarget] Add allocator support for target memory This patch adds the infrastructure for allocator support for target memory. Three allocators are introduced for device, host and shared memory. The corresponding API functions have the llvm_ prefix temporarily, until they become part of the OpenMP standard. Differential Revision: https://reviews.llvm.org/D97883	2021-03-13 03:47:07 -08:00
Johannes Doerfert	5449fbb5d4	[OpenMP][NFC] Use `AsyncInfo` as the variable name for a `__tgt_async_info` Reviewed By: grokos, tianshilei1992 Differential Revision: https://reviews.llvm.org/D96444	2021-03-11 23:31:34 -06:00
Johannes Doerfert	66ba494b49	[OpenMP][DeviceRTL] Extract shuffle idiom and port it to declare variant The shuffle idiom is implemented differently in our supported targets. To reduce the "target_impl" file we now move the shuffle idiom into its own self-contained header that provides the implementation for AMDGPU and NVPTX. A fallback can be added later on. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95752	2021-03-11 23:31:30 -06:00
Joseph Huber	807466ef28	[OpenMP] Restore backwards compatibility for libomptarget Summary: The changes introduced in D87946 changed the API for libomptarget functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x but was not given a backward-compatible interface. This change will require people using Clang 13.x or 12.x to recompile their offloading programs. Reviewed By: jdoerfert cchen Differential Revision: https://reviews.llvm.org/D98358	2021-03-11 09:52:11 -05:00
Shilei Tian	c41ae246ac	[OpenMP][Clang][NVPTX] Only build one bitcode library for each SM In D97003, CUDA 9.2 is the minimum requirement for OpenMP offloading on NVPTX target. We don't need to have macros in source code to select the right functions based on CUDA version. We don't need to compile multiple bitcode libraries of different CUDA versions for each SM. We don't need to worry about future compatibility with newer CUDA versions. `-target-feature +ptx61` is used in this patch, which corresponds to the highest PTX version that CUDA 9.2 can support. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97198	2021-03-08 12:03:04 -05:00
Joel E. Denny	d0eb25a643	[OpenMP] Encapsulate more in checkDeviceAndCtors This patch just encapsulates some repeated code. To do so, it relocates some functions from interface.cpp to omptarget.cpp. It also adjusts them to the LLVM coding style. This patch is almost NFC except some `DP` messages are a bit different. For example, messages like "Entering target region" are now emitted even if offload is disabled, but a subsequent "Offload is disabled" is then emitted. Reviewed By: jdoerfert, grokos Differential Revision: https://reviews.llvm.org/D97908	2021-03-04 12:03:42 -05:00
Joel E. Denny	bfe5452b93	[OpenMP] Fix lone target exit data Without this patch, an `omp target exit data` before the runtime is initialized produces a runtime error. This patch fixes that by changing `__tgt_target_data_end_mapper` to call `CheckDeviceAndCtors` like many other runtime routines. Discussed at <https://lists.llvm.org/pipermail/openmp-dev/2021-March/003920.html>. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D97907	2021-03-04 12:03:42 -05:00
Joel E. Denny	10c18c69f2	[OpenMP] Fix support for device as host Without this patch, when the offload device is set to `omp_get_initial_device()`, the runtime fails with an error diagnostic when entering target regions or target data regions. However, OpenMP 5.1, sec. 2.14.5 "target Construct", "Restrictions", p. 203, L3-5 states: "The device clause expression must evaluate to a non-negative integer value that is less than or equal to the value of omp_get_num_devices()." Sec. 3.7.7 "omp_get_initial_device", p. 412, L2-3 states: "The value of the device number is the value returned by the omp_get_num_devices routine." Similarly, OpenMP 5.0, sec. 2.12.5 "target Construct", "Restrictions", p. 174 L30-32 states: "The device clause expression must evaluate to a non-negative integer value less than the value of omp_get_num_devices() or to the value of omp_get_initial_device()." This patch fixes this behavior by changing the runtime to behave as if offloading is disabled whenever it finds the offload device (either from a `device` clause or the default device) is set to the host device. In the case of mandatory offloading when `omp_get_num_devices() == 0`, it incorporates the behavior proposed for OpenMP 5.2 in OpenMP spec github issue 2669. Reviewed By: grokos, RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D97616	2021-03-04 12:03:42 -05:00
Alexey Bataev	0caf736d7e	[OPENMP50]Mapping of the subcomponents with the 'default' mappers. If the mapped structure has data members, which have 'default' mappers, need to map these members individually using their 'default' mappers. Differential Revision: https://reviews.llvm.org/D92195	2021-03-02 07:11:06 -08:00
Vyacheslav Zakharin	6baeeb9efa	[libomptarget] Fixed MSVC build fail caused by __attribute__((used)). Differential Revision: https://reviews.llvm.org/D97348	2021-02-24 09:59:39 -08:00
Shilei Tian	e5da63d5a9	[OpenMP] Fixed a crash when offloading to x86_64 with target nowait PR#49334 reports a crash when offloading to x86_64 with `target nowait`, which is caused by referencing a nullptr. The root cause of the issue is, when pushing a hidden helper task in `__kmp_push_task`, it also maps the gtid to its shadow gtid, which is wrong. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97329	2021-02-24 12:37:30 -05:00
Manoel Roemmer	542d9c2154	[libomptarget] Load images in order of registration This makes sure that images are loaded in the order in which they are registered with libomptarget. If a target can load multiple images and these images depend on each other (for example if one image contains the program's target regions and one image contains library code), then the order in which images are loaded can be important for symbol resolution (for example, in the VE plugin). In this case: because the same code exists in the host binaries, the order in which the host linker loads them (which is also the order in which images are registered with libomptarget) is the order in which the images have to be loaded onto the device. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95530	2021-02-24 18:15:41 +01:00
Shilei Tian	f6c2984a09	[OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported feature in release version of LLVM `ptx71` is not supported in release version of LLVM yet. As a result, the support of CUDA 11.2 and CUDA 11.1 caused a compilation error as mentioned in D97004. Since the support in D97004 is just a workaround for release, and we'll not use it in the near future, using `ptx70` for CUDA 11 is feasible. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97195	2021-02-23 13:20:21 -05:00
Shilei Tian	309b00a42e	[OpenMP][NFC] clang-format the whole openmp project Same script as D95318. Test files are excluded. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D97088	2021-02-20 12:46:32 -05:00
Joel E. Denny	ef8b3b5ffd	[OpenMP] Fix nvptx CUDA_VERSION conversion As mentioned in PR#49250, without this patch, ptxas for CUDA 9.1 fails in the following two tests: - openmp/libomptarget/test/mapping/lambda_mapping.cpp - openmp/libomptarget/test/offloading/bug49021.cpp The error looks like: ``` ptxas /tmp/lambda_mapping-081ea9.s, line 828; error : Not a name of any known instruction: 'activemask' ``` The problem is that our cmake script converts CUDA version strings incorrectly: 9.1 becomes 9100, but it should be 9010, as shown in `getCudaVersion` in `clang/lib/Driver/ToolChains/Cuda.cpp`. Thus, `openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu` inadvertently enables `activemask` because it apparently becomes available in 9.2. This patch fixes the conversion. This patch does not fix the other two tests in PR#49250. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97012	2021-02-19 11:09:26 -05:00
Joel E. Denny	d2147b1a87	[OpenMP] Fix always,from and delete for data absent at exit Without this patch, there's a runtime error for those map types at exit from an "omp target data" or at "omp target exit data", but the spec says the list item should be ignored. This patch tests that fix in data_absent_at_exit.c, and it also improves other testing for data that is not fully present at exit. Reviewed By: grokos, RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D96999	2021-02-19 11:09:26 -05:00
Ron Lieberman	30c0d5b4c3	[OPENMP][AMDGCN] Improvements to print_kernel_trace (bit mask) Allow bit masking to select various trace features: bit 0 => Launch tracing (stderr); bit 1 => timing of runtime (stdout); bit 2 => detailed launch tracing (stderr); bit 3 => timing goes to stdout instead of stderr. Examples: LIBOMPTARGET_KERNEL_TRACE=7 does it all; LIBOMPTARGET_KERNEL_TRACE=5 Launch + details; LIBOMPTARGET_KERNEL_TRACE=2 timings + launch to stderr; LIBOMPTARGET_KERNEL_TRACE=10 timings + launch to stdout. Differential Revision: https://reviews.llvm.org/D96998	2021-02-19 06:47:22 -05:00
Shilei Tian	89827fd404	[OpenMP][NVPTX] Add the support for CUDA 11.2 and CUDA 11.1 CUDA 11.2 and CUDA 11.1 are all available now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97004	2021-02-18 21:04:39 -05:00
Jon Chesterfield	53d7fd3762	[libomptarget][amdgcn] Remove lookup of .language msgpack field	2021-02-17 23:02:16 +00:00
Alexey Bataev	60d71a286b	[OPENMP50]Allow overlapping mapping in target constructs. OpenMP 5.0 removed many restrictions on overlapping mapped items compared to OpenMP 4.5. The patch restricts the checks for overlapped data mappings to OpenMP 4.5 and earlier, and reorders mapping of the arguments so that present and alloc mappings are processed first and then all others. Differential Revision: https://reviews.llvm.org/D86119	2021-02-16 14:42:08 -08:00
Johannes Doerfert	2518cc65d2	[OpenMP][FIX] Avoid use of stack allocations in asynchronous calls As reported by Guilherme Valarini [0], we used to pass stack allocations to calls that can nowadays be asynchronous. This is arguably a problem and it will inevitably result in UB. To remedy the situation we allocate the locations as part of the AsyncInfoTy object. The lifetime of that object matches what we need for now. If the synchronization is not tied to the AsyncInfoTy object anymore we might need to have a different buffer construct in global space. This should be back-ported to LLVM 12 but needs slight modifications as it is based on refactoring patches we do not need to backport. [0] https://lists.llvm.org/pipermail/openmp-dev/2021-February/003867.html Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D96667	2021-02-16 15:38:11 -06:00
Johannes Doerfert	758b849931	[OpenMP] Unify omptarget API and usage wrt. `__tgt_async_info` This patch unifies our libomptarget API in two ways: - always pass a `__tgt_async_info` object, the Queue member decides if it is in use or not. - (almost) always synchronize in the interface layer and not in the omptarget layer. A side effect is that we now put all constructor and static initializer kernels in a stream too, if the device utilizes `__tgt_async_info`. The patch contains a TODO which can be addressed as we add support for asynchronous malloc and free in the plugin API. This is the only `synchronizeAsyncInfo` left in the omptarget layer. Side note: On a V100 system the GridMini performance for small sizes more than doubled. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96379	2021-02-16 15:38:06 -06:00
Johannes Doerfert	a2fc0d34db	[OpenMP] Move synchronization into `__tgt_async_info` The AsyncInfo should be passed everywhere and it should offer a way to ensure synchronization, given a libomptarget Device. This replaces D96431. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96438	2021-02-16 15:38:01 -06:00
Johannes Doerfert	942728763b	[OpenMP][NFC] Unify `target` API with other by passing a `__tgt_async_info` pointer Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96430	2021-02-16 15:37:56 -06:00
Johannes Doerfert	44f3022cdf	[OpenMP][NFC] Pass a DeviceTy, not the device number to `target` This unifies the API of `target` relative to `targetUpdateData` and such. Reviewed By: tianshilei1992, grokos Differential Revision: https://reviews.llvm.org/D96429	2021-02-16 15:37:51 -06:00
Johannes Doerfert	ea9395716e	[OpenMP][NFC] Clang format the libomptarget plugins Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96445	2021-02-16 15:37:46 -06:00
Johannes Doerfert	ad94fce845	[OpenMP][NFC] Eliminate sign comparison warning via explicit casts Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96812	2021-02-16 15:37:41 -06:00
Johannes Doerfert	9cd1e2228c	[OpenMP][NFC] Clang format libomptarget code (src & include) The struct and enum alignments are kept by disabling clang-format for that code region. Reviewed By: tianshilei1992, JonChesterfield, grokos Differential Revision: https://reviews.llvm.org/D96428	2021-02-16 15:37:35 -06:00
Jon Chesterfield	6f04addc8b	[libomptarget][amdgcn] Build amdgcn devicertl as openmp Change cmake to build as openmp and fix up some minor errors in the code. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D96533	2021-02-12 09:51:21 +00:00
Jon Chesterfield	56c446a878	[libomptarget][amdgcn] Tolerate deadstripped device_state variable The device_state variable may have been deadstripped. Similar to device_environment, leave detection of missing but used symbol to loader. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D96330	2021-02-09 16:29:53 +00:00
Jon Chesterfield	4756f76bce	[libomptarget][amdgcn] Tolerate deadstripped env variable Discovered by Pushpinder. If the device_environment variable is unused it can be deadstripped, in which case we should not abort due to it missing. This change is safe in that a missing symbol which is actually used can be reported by both linker and loader, and a missing unused symbol is better deadstripped than left in the image. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D96329	2021-02-09 11:58:37 +00:00
Jon Chesterfield	2fa4186d4e	[libomptarget][amdgcn] Fix language linkage post D95300, drop use of assert	2021-02-08 20:07:51 +00:00
Shilei Tian	b68a6b09e6	[OpenMP][libomptarget] Fixed an issue that device sync is skipped if the kernel doesn't have any argument Currently if there is no kernel argument, device synchronization will be skipped. This can lead to two issues: 1. If there is any device error, it will not be captured; 2. The target region might end before the kernel is done, which is not spec conformant. The test added in this patch only runs on NVPTX platform, although it will not be executed by Phab at all. It also requires `not` which is not available on most systems. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D96067	2021-02-04 20:14:24 -05:00
Shilei Tian	567b3f8841	[OpenMP][deviceRTLs] Drop `assert` in common parts of `deviceRTLs` The header `assert.h` needs to be included in order to use `assert` in the code. When building NVPTX `deviceRTLs` on a CUDA free system, it requires headers from `gcc-multilib`, which some systems don't have. This patch drops the use of `assert` in common parts of `deviceRTLs`. In light of `openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h`, a code block ``` if (!cond) __builtin_trap(); ``` is being used. The builtin will be translated to `call void @llvm.trap()`, and the corresponding PTX is `trap;`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95986	2021-02-04 12:39:43 -05:00
Shilei Tian	0f0ce3c12e	[OpenMP][NVPTX] Take functions in `deviceRTLs` as `convergent` OpenMP device compiler (similar to other SPMD compilers) assumes that functions are convergent by default to avoid invalid transformations, such as the bug (https://bugs.llvm.org/show_bug.cgi?id=49021). Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95971	2021-02-03 20:58:12 -05:00
Atmn Patel	b545667d0a	[OpenMP][Libomptarget] Remove possible harmful copy constructor call for RTLsTy From https://bugs.llvm.org/show_bug.cgi?id=48973, we know that `std::call_once(PM->RTLs.initFlag, &RTLsTy::LoadRTLs, PM->RTLs)` causes compile time problems in libstdc++v3 5.3.1. This is because there was a defect in the standard regarding the `call_once` (LWG 2442). This was fixed in libstdc++ soon thereafter, but there are likely other standard libraries where this will fail. By matching this function call with the other one, we fix this bug. Differential Revision: https://reviews.llvm.org/D95769	2021-02-01 20:13:03 -05:00
Joseph Huber	fda4853998	[OpenMP] Fix seg fault in libomptarget when using Info with multiple threads Summary: One option for the LIBOMPTARGET_INFO environment variable is to print the current status of the device's data mappings. These are a shared resource among threads so this needs to be protected when using multiple streams. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95786	2021-02-01 11:21:57 -05:00
Shilei Tian	26d38f6d20	[OpenMP][NVPTX] Refined CMake logic to choose compute capabilities This patch refines the logic to choose compute capabilities via the environment variable `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES`. It supports the following values (all case insensitive): - "all": Build `deviceRTLs` for all supported compute capabilities; - "auto": Only build for the compute capability auto detected. Note that this requires CUDA. If CUDA is not found, a CMake fatal error will be raised. - "xx,yy" or "xx;yy": Build for compute capabilities `xx` and `yy`. If `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES` is not set, it is equivalent to setting it to `all`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95687	2021-01-30 15:14:48 -05:00
Shilei Tian	1b19c42302	[OpenMP][deviceRTLs] Separate declaration of target dependent functions from `target_impl.h` This patch created a new header file `target_interface.h` for declarations of all target dependent functions. All future targets can get things work by simply implementing all functions declared in the header and macros/data same as each `target_impl.h`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95300	2021-01-28 08:14:33 -05:00
Shilei Tian	5a64794bba	[OpenMP][NVPTX] Added the missing -O1 when building NVPTX bitcode libraries In the past `-O1` was used when building NVPTX bitcode libraries. After we switched to OpenMP, `-O1` was missing by mistake, leading to a huge performance regression. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95545	2021-01-28 08:13:38 -05:00
Shilei Tian	19248d30e4	[OpenMP][deviceRTLs] Added `[[clang::loader_uninitialized]]` explicitly `[[clang::loader_uninitialized]]` is in macro `SHARED` but it doesn't work for array like `parallelLevel`, so the variable will be zero initialized. There is also a similar issue for `omptarget_nvptx_device_State` which is in global address space. Its c'tor is also generated, which was not in the past when building the `deviceRTLs` with CUDA. In this patch, we added the attribute to the two variables explicitly. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95550	2021-01-28 08:12:49 -05:00
Vyacheslav Zakharin	0fc90873b2	[libomptarget][NFC] Link plugins with threads support library due to std::call_once usage. Differential Revision: https://reviews.llvm.org/D95572	2021-01-27 19:26:18 -08:00
Atmn Patel	8a77056256	[OpenMP][Libomptarget] Fix conditional in CMake for remote plugin The remote offloading plugin's CMakeLists was trying to build if its flag was enabled even if it didn't find gRPC/protobuf. The conditional was wrong, it's fixed by this. Differential Revision: https://reviews.llvm.org/D95574	2021-01-27 21:28:25 -05:00
Shilei Tian	fb12df4a8e	[OpenMP][NVPTX] Disable building NVPTX deviceRTL by default on a non-CUDA system D95466 dropped CUDA to build NVPTX deviceRTL and enabled it by default. However, the building requires some libraries that are not available on non-CUDA system by default, which could break the compilation. This patch disabled the build by default. It can be enabled with `LIBOMPTARGET_BUILD_NVPTX_BCLIB=ON`. Reviewed By: kparzysz Differential Revision: https://reviews.llvm.org/D95556	2021-01-27 17:06:14 -05:00
Giorgis Georgakoudis	1e59c1a898	[OpenMP][Libomptarget] Fix check-libomptarget The check-libomptarget fails when building with LLVM_ENABLE_PROJECTS. This is because the test configuration misses the path to libomp.so and libLLVMSupport.so when time profiling is enabled (both libraries have the same path when building). This patch adds the path to the configuration. Reviewed By: vzakhari Differential Revision: https://reviews.llvm.org/D95376	2021-01-27 06:46:40 -08:00
Shilei Tian	e7535f8fed	[OpenMP][NVPTX] Drop dependence on CUDA to build NVPTX `deviceRTLs` With D94745, we no longer use CUDA SDK to compile `deviceRTLs`. Therefore, much of the CMake code in the project is useless. This patch cleans up unnecessary code and also drops the requirement to build NVPTX `deviceRTLs`. CUDA detection is still being used however to determine whether we need to involve the tests. Auto detection of compute capability is enabled by default and can be disabled by setting CMake variable `LIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY=OFF`. If auto detection is enabled, and CUDA is also valid, it will only build the bitcode library for the detected version; otherwise, all variants supported will be generated. One drawback of this patch is, we now generate 96 variants of bitcode library, and a total of 1485 files to be built with a clean build on a non-CUDA system. `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=""` can be used to disable building NVPTX `deviceRTLs`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95466	2021-01-26 20:21:36 -05:00
Jon Chesterfield	653655040f	[libomptarget][cuda] Handle missing _v2 symbols gracefully Follow on from D95367. Dlsym the _v2 symbols if present, otherwise use the unsuffixed version. Builds a hashtable for the check, can revise for zero heap allocations later if necessary. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95415	2021-01-27 00:22:29 +00:00
Vyacheslav Zakharin	3caa2d3354	[libomptarget][NFC] Avoid gcc 5/6 issue with lambda captures. Differential Revision: https://reviews.llvm.org/D95486	2021-01-26 16:06:58 -08:00
Vyacheslav Zakharin	5f1d4d4779	[libomptarget][NFC] Use portable printf format specifiers. Differential Revision: https://reviews.llvm.org/D95476	2021-01-26 13:56:25 -08:00
Atmn Patel	810572cc96	[OpenMP][Libomptarget] Fix cmake error on remote plugin Requiring 3.15 causes a build breakage, I'm sure none of the contents actually require 3.15 or above. Differential Revision: https://reviews.llvm.org/D95474	2021-01-26 16:00:40 -05:00
Jon Chesterfield	7baff00eee	[libomptarget][cuda] Gracefully handle missing cuda library If using dynamic cuda, and it failed to load, it is not safe to call cuGetErrorString. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95412	2021-01-26 20:43:07 +00:00
Jon Chesterfield	fdeffd6fb0	[libomptarget][cuda] Only run tests when sure there is cuda available Prior to D95155, building the cuda plugin implied cuda was installed locally. With that change, every machine can build a cuda plugin, but they won't all have cuda and/or an nvptx card installed locally. This change enables the nvptx tests when either: - libcuda is present - the user has forced use of the dlopen stub The default case when there is no cuda detected will no longer attempt to run the tests on nvptx hardware, as was the case before D95155. Reviewed By: jdoerfert, ronlieb Differential Revision: https://reviews.llvm.org/D95467	2021-01-26 20:41:06 +00:00
Atmn Patel	ec8f4a38c8	[OpenMP][Libomptarget] Introduce Remote Offloading Plugin This introduces a remote offloading plugin for libomptarget. This implementation relies on gRPC and protobuf, so this library will only build if both libraries are available on the system. The corresponding server is compiled to `openmp-offloading-server`. This is a large change, but the only way to split this up is into RTL/server but I fear that could introduce an inconsistency amongst them. Ideally, tests for this should be added to the current ones, but that is problematic for at least one reason. Given that libomptarget registers plugins on a first-come-first-serve basis, if we wanted to offload onto a local x86 through a different process, then we'd have to either re-order the plugin list in `rtl.cpp` (which is what I did locally for testing) or find a better solution for runtime plugin registration in libomptarget. Differential Revision: https://reviews.llvm.org/D95314	2021-01-26 15:33:38 -05:00
Atmn	683719bc0c	[OpenMP][Libomptarget] Introduce changes to support remote plugin In order to support remote execution, we need to be able to send the target binary description to the remote host for registration (and consequent deregistration). To support this, I added these two optional new functions to the plugin API: - `__tgt_rtl_register_lib` - `__tgt_rtl_unregister_lib` These functions will be called to properly manage the instance of libomptarget running on the remote host. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D93293	2021-01-26 14:19:27 -05:00
Jon Chesterfield	32cc5564e2	[libomptarget][devicertl][amdgpu] Fix build, variable renaming error	2021-01-26 19:05:21 +00:00
Shilei Tian	7c03f7d7d0	[OpenMP][deviceRTLs] Build the deviceRTLs with OpenMP instead of target dependent language From this patch (plus some landed patches), `deviceRTLs` is taken as a regular OpenMP program with just `declare target` regions. In this way, ideally, `deviceRTLs` can be written in OpenMP directly. No CUDA, no HIP anymore. (Well, AMD is still working on getting it to work. For now AMDGCN still uses original way to compile) However, some target specific functions are still required, but they're no longer written in target specific language. For example, CUDA parts have all been refined by replacing CUDA intrinsic and builtins with LLVM/Clang/NVVM intrinsics. Here is a list of changes in this patch. 1. For NVPTX, `DEVICE` is defined empty in order to make the common parts still work with AMDGCN. Later once AMDGCN is also available, we will completely remove `DEVICE` or probably some other macros. 2. Shared variable is implemented with OpenMP allocator, which is defined in `allocator.h`. Again, this feature is not available on AMDGCN, so two macros are redefined properly. 3. CUDA header `cuda.h` is dropped in the source code. In order to deal with code difference in various CUDA versions, we build one bitcode library for each supported CUDA version. For each CUDA version, the highest PTX version it supports will be used, just as what we currently use for CUDA compilation. 4. Correspondingly, compiler driver is also updated to support CUDA version encoded in the name of bitcode library. Now the bitcode library for NVPTX is named as `libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc`, such as `libomptarget-nvptx-cuda_80-sm_20.bc`. With this change, there are also multiple features to be expected in the near future: 1. CUDA will be completely dropped when compiling OpenMP. By then, we also build bitcode libraries for all supported SM, multiplied by all supported CUDA versions. 2. Atomic operations used in `deviceRTLs` can be replaced by `omp atomic` if OpenMP 5.1 feature is fully supported. For now, the IR generated is totally wrong. 3. Target specific parts will be wrapped into `declare variant` with `isa` selector if it can work properly. No target specific macro is needed anymore. 4. (Maybe more...) Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94745	2021-01-26 12:28:47 -05:00
George Rokos	94cf89d1c2	[libomptarget][NFC] Fixed obsolete function names in comments	2021-01-26 07:39:42 -08:00
Alexey Bataev	4a63e53373	[LIBOMPTARGET]FIX define declaration, NFC Fixed declaration of define by adding a comma symbol. Required to fix build without profiling.	2021-01-26 07:43:31 -05:00
Johannes Doerfert	8c7fdc4c61	[OpenMP] Add source location information to the libomptarget profile In much of the libomptarget interface we have an ident_t object now, if it is not null we can use it to improve the profile output. For now, we simply use the ident_t "source information string" as generated by the FE. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95282	2021-01-25 22:43:43 -06:00
Jon Chesterfield	357eea6e8b	Revert "[libomptarget][cuda] Gracefully handle missing cuda library" This reverts commit `fafd45c01f`.	2021-01-26 03:14:53 +00:00
Jon Chesterfield	fafd45c01f	[libomptarget][cuda] Gracefully handle missing cuda library If using dynamic cuda, and it failed to load, it is not safe to call cuGetErrorString. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95412	2021-01-26 02:54:00 +00:00
Shilei Tian	3333244d77	[OpenMP][deviceRTLs] Remove omp_is_initial_device `omp_is_initial_device` in device code was implemented as a builtin function in D38968 for a better performance. Therefore there is no chance that this function will be called to `deviceRTLs`. As we're moving to build `deviceRTLs` with OpenMP compiler, this function can lead to a compilation error. This patch just simply removes it. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95397	2021-01-25 18:34:23 -05:00
Shilei Tian	27cc4a8138	[OpenMP][NVPTX] Rewrite CUDA intrinsics with NVVM intrinsics This patch makes prep for dropping CUDA when compiling `deviceRTLs`. CUDA intrinsics are replaced by NVVM intrinsics which refers to code in `__clang_cuda_intrinsics.h`. We don't want to directly include it because in the near future we're going to switch to OpenMP and by then the header cannot be used anymore. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95327	2021-01-25 14:14:30 -05:00
Joseph Huber	93eef7d8e9	[OpenMP][NFC] Fix SourceInfo.h variable names Summary: Fix the names to use Pascal case to comply with the LLVM coding guidelines. `ident_t` is required for compatibility with the rest of libomp.	2021-01-25 12:43:34 -05:00
Jon Chesterfield	95f0d1edaf	[libomptarget] Compile with older cuda, revert D95274 Fixes regression reported in comments of D95274. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95367	2021-01-25 16:12:56 +00:00
Jon Chesterfield	e5e448aafa	[libomptarget][cuda] Fix build, change missed from D95274	2021-01-24 18:30:04 +00:00
Shilei Tian	cfd978d5d3	[OpenMP] Fixed test environment of `check-libomptarget-nvptx` D95161 removed the option `--libomptarget-nvptx-path`, which is used in the tests for `libomptarget-nvptx`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95293	2021-01-24 13:18:33 -05:00
Jon Chesterfield	c3074d48d3	[libomptarget][nvptx] Replace cuda atomic primitives with clang intrinsics Tested by diff of IR generated for target_impl.cu before and after. NFC. Part of removing deviceRTL build time dependency on cuda SDK. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95294	2021-01-24 10:59:15 +00:00
Jon Chesterfield	dc70c56be5	[libomptarget][amdgpu][nfc] Update comments Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95295	2021-01-23 22:53:58 +00:00
Jon Chesterfield	78b0630b72	[libomptarget][cuda] Call v2 functions explicitly rtl.cpp calls functions like cuMemFree that are replaced by a macro in cuda.h with cuMemFree_v2. This patch changes the source to use the v2 names consistently. See also D95104, D95155 for the idea. Alternatives are to use a mixture, e.g. call the macro names and explicitly dlopen the _v2 names, or to keep the current status where the symbols are replaced by macros in both files Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95274	2021-01-23 20:33:13 +00:00
Jon Chesterfield	47e95e87a3	[libomptarget] Build cuda plugin without cuda installed locally Compiles a new file, `plugins/cuda/dynamic_cuda/cuda.cpp`, to an object file that exposes the same symbols that the plugin presently uses from libcuda. The object file contains dlopen of libcuda and cached dlsym calls. Also provides a cuda.h containing the subset that is used. This lets the cmake file choose between the system cuda and a dlopen shim, with no changes to rtl.cpp. The corresponding change to amdgpu is postponed until after a refactor of the plugin to reduce the size of the hsa.h stub required Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95155	2021-01-23 00:15:04 +00:00
Jon Chesterfield	9b19ecb8f1	[libomptarget][devicertl] Drop templated atomic functions The five __kmpc_atomic templates are instantiated a total of seven times. This change replaces the template with explicitly typed functions, which have the same prototype for amdgcn and nvptx, and implements them with the same code presently in use. Rolls in the accepted but not yet landed D95085. The unsigned long long type can be replaced with uint64_t when replacing the cuda function. Until then, clang warns on casting a pointer to one to a pointer to the other. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95093	2021-01-22 14:48:22 +00:00
Joseph Huber	119a9ea13f	[OpenMP] Fix failing test due to change in offloading flags Summary: Prior to D91261 the information checked the OMP_MAP_TARGET_PARAM flag, change this as it has been removed. The INFO macro was changed to accept a flag as input to make conditionally printing information easier. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95133	2021-01-21 14:09:36 -05:00
Shilei Tian	48c54f0f62	[OpenMP][NVPTX] Added forward declaration for atomic operations Pretty similar to D95058, this patch added forward declaration for CUDA atomic functions. We already have definitions with right mangled names in internal CUDA headers so the forward declaration here can work properly. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D95085	2021-01-21 10:37:16 -05:00
Joseph Huber	e4eaf9d820	[OpenMP] Add support for mapping names in mapper API Summary: The custom mapper API did not previously support the mapping names added previously. This means they were not present if a user requested debugging information while using the mapper functions. This adds basic support for passing the mapped names to the runtime library. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D94806	2021-01-21 09:26:44 -05:00
Shilei Tian	33a5d212c6	[OpenMP][NVPTX] Added forward declaration to pave the way for building deviceRTLs with OpenMP Once we switch to build deviceRTLs with OpenMP, primitives and CUDA intrinsics cannot be used directly anymore because `__device__` is not recognized by OpenMP compiler. To avoid involving all CUDA internal headers we had in `clang`, we forward declared these functions. Eventually they will be transformed into right LLVM intrinsics. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95058	2021-01-20 15:56:02 -05:00
Jon Chesterfield	fbc1dcb946	[libomptarget][devicertl][nfc] Simplify target_atomic abstraction Atomic functions were implemented as a shim around cuda's atomics, with amdgcn implementing those symbols as a shim around gcc style intrinsics. This patch folds target_atomic.h into target_impl.h and folds amdgcn. Further work is likely to be useful here, either changing to openmp's atomic interface or instantiating the templates on the few used types in order to move them into a cuda/c++ implementation file. This change is mostly to group the remaining uses of the cuda api under nvptx' target_impl abstraction. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95062	2021-01-20 19:50:50 +00:00
Jon Chesterfield	ea616f9026	[libomptarget][devicertl][nfc] Remove some cuda intrinsics, simplify Replace __popc, __ffs with clang intrinsics. Move kmpc_impl_min to only file that uses it and replace template with explicitly typed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95060	2021-01-20 19:45:05 +00:00
Shilei Tian	fd70f70d1e	[OpenMP][NVPTX] Replaced CUDA builtin vars with LLVM intrinsics Replaced CUDA builtin vars with LLVM intrinsics such that we don't need definitions of those intrinsics. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95013	2021-01-20 12:02:06 -05:00
Jon Chesterfield	e069662deb	[libomptarget][devicertl] Wrap source in declare target pragmas Factored out of D93135 / D94745. C++ and cuda ignore unknown pragmas so this is a NFC for the current implementation language. Removes noise from patches for building deviceRTL as openmp. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95048	2021-01-20 15:50:41 +00:00
Jon Chesterfield	214387c2c6	[libomptarget][nvptx] Reduce calls to cuda header Remove use of clock_t in favour of a builtin. Drop a preprocessor branch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94731	2021-01-15 02:16:33 +00:00
Jon Chesterfield	6e7094c14b	[libomptarget][nvptx][nfc] Move target_impl functions out of header This removes most of the differences between the two target_impl.h. Also change name mangling from C to C++ for __kmpc_impl_*_lock. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D94728	2021-01-15 00:19:48 +00:00
Shilei Tian	547b032ccc	[OpenMP] Remove omptarget-nvptx from deps as it is no longer a valid target `omptarget-nvptx` is still a dependence for `check-libomptarget-nvptx` although it has been removed by D94573. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94725	2021-01-14 19:16:11 -05:00
Shilei Tian	64e9e9aeee	[OpenMP] Dropped unnecessary define when compiling deviceRTLs for NVPTX The comment said CUDA 9 header files use the `nv_weak` attribute which `clang` is not yet prepared to handle. It's three years ago and now things have changed. Based on my test, removing the definition doesn't have any problem on my machine with CUDA 11.1 installed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94700	2021-01-14 13:55:12 -05:00
Shilei Tian	763c1f9933	[OpenMP] Drop the static library libomptarget-nvptx For NVPTX target, OpenMP provides a static library `libomptarget-nvptx` built by NVCC, and another bitcode `libomptarget-nvptx-sm_{$sm}.bc` generated by Clang. When compiling an OpenMP program, the `.bc` file will be fed to `clang` in the second run on the program that compiles the target part. Then the generated PTX file will be fed to `ptxas` to generate the object file, and finally the driver invokes `nvlink` to generate the binary, where the static library will be appended to `nvlink`. One question is, why do we need two libraries? The only difference is, the static library contains `omp_data.cu` and the bitcode library doesn't. It's unclear why they were implemented in this way, but per D94565, there is no issue if we also include the file into the bitcode library. Therefore, we can safely drop the static library. This patch is about the change in OpenMP. The driver will be updated as well if this patch is accepted. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D94573	2021-01-14 13:34:25 -05:00
Jon Chesterfield	5d165f0b89	[libomptarget][amdgpu] Fix kernel launch tracing to match previous behavior Restore control of kernel launch tracing to be >= 1 as it was before export LIBOMPTARGET_KERNEL_TRACE=1 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94695	2021-01-14 18:13:22 +00:00
Jon Chesterfield	84e0b14a0a	[libomptarget][nvptx] Include omp_data.cu in bitcode deviceRTL Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D94565	2021-01-13 03:51:11 +00:00
Shilei Tian	68ff52ffea	[OpenMP] Fixed the link error that cannot find static data member Constant static data member can be defined in the class without another define after the class in C++17. Although it is C++17, Clang can still handle it even w/o the flag for C++17. Unluckily, GCC cannot handle that. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D94541	2021-01-12 16:48:28 -05:00
Jon Chesterfield	33e2494bea	[libomptarget][amdgpu][nfc] Fix build on centos rtl.cpp replaced 224 with a #define from elf.h, but that doesn't work on a centos 7 build machine with an old elf.h Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D94528	2021-01-12 19:40:03 +00:00
Shilei Tian	bdd1ad5e5c	[OpenMP] Fixed include directories for OpenMP when building OpenMP with LLVM_ENABLE_RUNTIMES Some LLVM headers are generated by CMake. Before the installation, LLVM's headers are distributed everywhere, some of which are in `${LLVM_SRC_ROOT}/llvm/include/llvm`, and some are in `${LLVM_BINARY_ROOT}/include/llvm`. After installation, they're all in `${LLVM_INSTALLATION_ROOT}/include/llvm`. OpenMP now depends on LLVM headers. Some headers depend on headers generated by CMake. When building OpenMP along with LLVM, a.k.a via `LLVM_ENABLE_RUNTIMES`, we need to tell OpenMP where it can find those headers, especially those that have not yet been copied/installed. Reviewed By: jdoerfert, jhuber6 Differential Revision: https://reviews.llvm.org/D94534	2021-01-12 14:32:38 -05:00
Shilei Tian	0871d6d516	[OpenMP] Move memory manager to plugin and make it a common interface The lifetime of `libomptarget` and its opened plugins are not aligned and it's hard for `libomptarget` to determine when the plugins are destroyed. As a result, some issues (see D94256 for details) occur on some platforms. Actually, if we take target memory as target resources, same as other resources, such as CUDA streams, in each plugin, then the memory manager should also be in the plugin. Also considering some platforms may want to opt out the feature, it makes sense to move the memory manager to plugin, make it a common interface, and let plugin developers determine whether they need it. This is what this patch does. CUDA plugin is taken as example to show how to integrate it. In this way, we can also get a bonus that different thresholds can be set for different platforms. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D94379	2021-01-11 21:33:42 -05:00
Shilei Tian	a81c68ae6b	[OpenMP] Take elf_common.c as an interface library For now `elf_common.c` is taken as a common part included into different plugin implementations directly via `#include "../../common/elf_common.c"`, which is not a best practice. Since it is simple enough such that we don't need to create a real library for it, we just take it as an interface library so that other targets can link it directly. Another advantage of this method is, we don't need to add the folder into header search path which can potentially pollute the search path. VE and AMD platforms have not been tested because I don't have target machines. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94443	2021-01-11 17:34:26 -05:00
Shilei Tian	175c336a1c	[OpenMP] Remove copy constructor of `RTLInfoTy` Multiple `RTLInfoTy` objects are stored in a list `AllRTLs`. Since `RTLInfoTy` contains a `std::mutex`, it is by default not a copyable object. In order to support `AllRTLs.push_back(...)` which is currently used, a customized copy constructor is provided. Every time we need to add a new data member into `RTLInfoTy`, we should keep in mind not forgetting to add corresponding assignment in the copy constructor. In fact, the only use of the copy constructor is to push the object into the list, we can of course write it in a way that first emplace a new object back, and then use the reference to the last element. In this way we don't need the copy constructor anymore. If the element is invalid, we just need to pop it, and that's what this patch does. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94361	2021-01-09 13:01:01 -05:00
Joseph Huber	2ce16810f2	[OpenMP] Always print error messages in libomptarget CUDA plugin Summary: Currently error messages from the CUDA plugins are only printed to the user if they have debugging enabled. Change this behaviour to always print the messages that result in offloading failure. This improves the error messages by indicating what happened when the error occurs in the plugin library, such as a segmentation fault on the device. Reviewed by: jdoerfert Differential Revision: https://reviews.llvm.org/D94263	2021-01-07 17:47:32 -05:00
Shilei Tian	5acdae1f9a	[OpenMP] Fixed an issue that wrong LLVM headers might be included when building libomptarget Wrong LLVM headers might be included if we don't set `include_directories` to a right place. This will cause a compilation error if LLVM is installed in system directories. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93737	2021-01-06 17:07:36 -05:00
Shilei Tian	e2a623094f	[OpenMP] Fixed the test environment when building along with LLVM Currently all built libraries in OpenMP are scattered everywhere if building along with LLVM. It is not an issue if we don't execute any test. However, almost all tests for `libomptarget` fail because in the lit configuration, we only set `<build_dir>/libomptarget` to `LD_LIBRARY_PATH` and `LIBRARY_PATH`. Since those libraries are everywhere, `clang` can no longer find `libomptarget.so` or those deviceRTLs. In this patch, we set a unified path for all built libraries, no matter whether it is built along with LLVM or not. In this way, our lit configuration can work properly. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93736	2021-01-06 17:06:16 -05:00
George Rokos	dec02904d2	[libomptarget] Allow calls to omp_target_memcpy with 0 size. Differential Revision: https://reviews.llvm.org/D94095	2021-01-05 16:03:53 -08:00
Joseph Huber	fe5d51a489	[OpenMP] Add using bit flags to select Libomptarget Information Summary: This patch adds more fine-grained support over which information is output from the libomptarget runtime when run with the environment variable LIBOMPTARGET_INFO set. An extensible set of flags can be used to pick and choose which information the user is interested in. Reviewers: jdoerfert JonChesterfield grokos Differential Revision: https://reviews.llvm.org/D93727	2021-01-04 12:03:15 -05:00
Jon Chesterfield	76bfbb74d3	[libomptarget][amdgpu] Call into deviceRTL instead of ockl Amdgpu codegen presently emits a call into ockl. The same functionality is already present in the deviceRTL. Adds an amdgpu specific entry point to avoid the dependency. This lets simple openmp code (specifically, that which doesn't use libm) run without rocm device libraries installed. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D93356	2021-01-04 16:48:47 +00:00
Atmn	907886cc5b	[OpenMP][Libomptarget][NFC] Use CMake Variables This patch adds CMake variables to add subdirectories and include directories for libomptarget and explicitly gives the location of source files. Differential Revision: https://reviews.llvm.org/D93290	2020-12-16 19:05:15 -05:00
Jon Chesterfield	b607837c75	[libomptarget][nfc] Replace static const with enum Semantically identical. Replaces 0xff... with ~0 to spare counting the f. Has the advantage that the compiler doesn't need to prove the 4/8 byte value dead before discarding it, and sidesteps the compilation question associated with what static means for a single source language. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93328	2020-12-16 16:40:37 +00:00
Giorgis Georgakoudis	e007b32864	[OpenMP] Add time profiling for libomptarget Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93055	2020-12-11 18:53:37 -08:00
Jon Chesterfield	ce93de3bb2	[libomptarget][nfc] Remove data_sharing type aliasing Libomptarget previously used __kmpc_data_sharing_slot to access values of type __kmpc_data_sharing_{worker,master}_slot_static. This aliasing violation was benign in practice. The master type has since been removed, so a single type can be used instead. This is particularly helpful for the transition to an openmp deviceRTL, as the c++/openmp compiler for amdgcn currently rejects the flexible array member for being an incomplete type. Serves the same purpose as abandoned D86324. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93075	2020-12-11 02:13:34 +00:00
Jon Chesterfield	7c59614394	[libomptarget][amdgpu] clang-format src/rtl.cpp	2020-12-09 19:45:51 +00:00
Jon Chesterfield	c9bc414840	[libomptarget][amdgpu] Let default number of teams equal number of CUs	2020-12-09 19:35:34 +00:00
Jon Chesterfield	e191d31159	[libomptarget][amdgpu] Robust handling of device_environment symbol	2020-12-09 19:21:51 +00:00
Jon Chesterfield	cab9f69235	[libomptarget][amdgpu] Improve diagnostics on arch mismatch	2020-12-09 18:55:53 +00:00
Jon Chesterfield	71f4693020	[libomptarget][amdgpu] Add plumbing to call into hostrpc lib, if linked	2020-12-07 15:24:01 +00:00
Jon Chesterfield	e1b8e8a1f4	[libomptarget][amdgpu] Skip device_State allocation when using bss global	2020-12-06 12:13:56 +00:00
Jon Chesterfield	f628eef98a	[libomptarget][amdgpu] Fix latent race in load binary	2020-12-04 16:29:09 +00:00
Jon Chesterfield	ae9d96a656	[libomptarget][amdgpu] Address compiler warnings, drive by fixes Initialize some variables, remove unused ones. Changes the debug printing condition to align with the aomp test suite. Differential Revision: https://reviews.llvm.org/D92559	2020-12-03 11:09:12 +00:00
Pushpinder Singh	afc09c6fe4	[libomptarget][AMDGPU] Remove MaxParallelLevel Removes MaxParallelLevel references from rtl.cpp and drops resulting dead code. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D92463	2020-12-03 00:27:03 -05:00
Jon Chesterfield	89a0f48c58	[libomptarget][cuda] Detect missing symbols in plugin at build time Passes -z,defs to the linker. Error on unresolved symbol references. Otherwise, those unresolved symbols present as target code running on the host as the plugin fails to load. This is significantly harder to debug than a link time error. Flag matches that passed by amdgcn and ve plugins. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D92143	2020-11-27 15:39:41 +00:00
cchen	7036fe8a0c	[libomptarget] Add support for target update non-contiguous This patch is the runtime support for https://reviews.llvm.org/D84192. In order not to modify the tgt_target_data_update information but still be able to pass the extra information for non-contiguous map items (offset, count, and stride for each dimension), this patch overloads arg when the maptype is set as OMP_TGT_MAPTYPE_DESCRIPTOR. The original arg is for passing the pointer information; the overloaded arg, however, is an array of descriptor_dim: ``` struct descriptor_dim { int64_t offset; int64_t count; int64_t stride; }; ``` and the array size is the dimension size. In addition, since we have count and stride information in descriptor_dim, we can replace/overload the arg_size parameter by using dimension size. Reviewed By: grokos, tianshilei1992 Differential Revision: https://reviews.llvm.org/D82245	2020-11-19 11:33:27 -06:00
Joseph Huber	da8bec47ab	[OpenMP] Add Location Fields to Libomptarget Runtime for Debugging Summary: Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values. Reviewers: jdoerfert grokos Differential Revision: https://reviews.llvm.org/D87946	2020-11-19 12:01:53 -05:00
Joseph Huber	5378c6a4bf	[OpenMP] Add Support for Mapping Names in Libomptarget RTL Summary: This patch adds basic support for printing the source location and names for the mapped variables. This patch does not support names for custom mappers. This is based on D89802. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D90172	2020-11-18 16:01:59 -05:00
Joseph Huber	97e55cfef5	[OpenMP] Add Passing in Original Declaration Names To Mapper API Summary: This patch adds support for passing in the original declaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information is passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;" Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D89802	2020-11-18 15:28:39 -05:00
Alexey Bataev	dcde6f17fd	Revert "[libomptarget] Add support for target update non-contiguous" This reverts commit `6847bcec1a`. It breaks the build of libomptarget.	2020-11-10 07:49:00 -08:00
cchen	6847bcec1a	[libomptarget] Add support for target update non-contiguous This patch is the runtime support for https://reviews.llvm.org/D84192. In order not to modify the tgt_target_data_update information but still be able to pass the extra information for non-contiguous map items (offset, count, and stride for each dimension), this patch overloads arg when the maptype is set as OMP_TGT_MAPTYPE_DESCRIPTOR. The original arg is for passing the pointer information; the overloaded arg, however, is an array of descriptor_dim: ``` struct descriptor_dim { int64_t offset; int64_t count; int64_t stride; }; ``` and the array size is the dimension size. In addition, since we have count and stride information in descriptor_dim, we can replace/overload the arg_size parameter by using dimension size. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D82245	2020-11-06 20:55:33 -06:00
Jon Chesterfield	93cbf622fc	[libomptarget][nfc] Build amdgcn deviceRTL with nogpulib	2020-11-04 11:29:22 +00:00
Shilei Tian	f5eebc25cc	[OpenMP] Fixed an issue in the test case parallel_offloading_map There is a non-conforming use of variable-sized array in the test case `parallel_offloading_map.c`. This patch fixed it. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D90642	2020-11-03 15:59:16 -05:00
Joachim Protze	71041a8b6b	[OpenMP][libomptarget][Tests] fix failing test D88149 updated `omp_get_initial_device` behavior to conform with OpenMP 5.1. omp_get_initial_device() == omp_get_num_devices()	2020-11-03 13:15:33 +01:00
Atmn Patel	a95b25b29e	[Libomptarget][NFC] Move global Libomptarget state to a struct Presently, there are a number of global variables in libomptarget (devices, RTLs, tables, mutexes, etc.) that are not placed within a struct. This patch places them into a struct ``PluginManager``. All of the functions that act on this data remain free. Differential Revision: https://reviews.llvm.org/D90519	2020-11-03 00:10:18 -05:00
Jon Chesterfield	dee7704829	[AMDGPU] Add __builtin_amdgcn_grid_size Similar to D76772, loads the data from the dispatch pointer. Marked invariant. Patch also updates the openmp devicertl to use this builtin. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D90251	2020-10-29 16:25:13 +00:00
Benjamin Kramer	207cf71fa9	Revert "[OpenMP] Add Passing in Original Declaration Names To Mapper API" This reverts commit `d981c7b758` and `a87d7b3d44`. Test fails under msan.	2020-10-28 13:58:14 +01:00
Joseph Huber	d981c7b758	[OpenMP] Add Support for Mapping Names in Libomptarget RTL Summary: This patch adds basic support for printing the source location and names for the mapped variables. This patch does not support names for custom mappers. This is based on D89802. The names information currently will be printed out only in debug mode or using env LIBOMPTARGET_INFO during execution. But the information is added when available to the Device and Private data structures. To get the information out, the code must be built with debug symbols on using -g or -Rpass=openmp-opt Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D90172	2020-10-27 16:53:05 -04:00
Joseph Huber	a87d7b3d44	[OpenMP] Add Passing in Original Declaration Names To Mapper API Summary: This patch adds support for passing in the original declaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information is passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;". See clang/test/OpenMP/target_map_names.cpp for an example of the generated output for a given map clause. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D89802	2020-10-27 16:09:19 -04:00
Shilei Tian	e20d64c3d9	[Clang][OpenMP] Fixed a segmentation fault when using target nowait The implementation of target nowait just wraps the target region into a task. The essential four parameters (base ptr, ptr, size, mapper) are taken as firstprivate such that they will be copied to the private location. When there is no user-defined mapper, the mapper variable will be nullptr. However, it will still be copied to the corresponding place. Therefore, a memcpy will be generated and the source pointer will be nullptr, causing a segmentation fault. The root cause is that when calling `emitOffloadingArraysArgument`, the last argument `Options` has a field about whether it requires a task. It only takes the depend clause into account. In this patch, the nowait clause is also included. There are two things that will be done in other patches: 1. target data nowait has not been supported yet. D90099 added the support. 2. When there is no mapper, the mapper array can be nullptr no matter whether it requires an outer task or not. It can avoid an unnecessary data copy. This is an optimization that is covered in D90101. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89844	2020-10-26 22:33:22 -04:00
Shilei Tian	3091ed099f	[OpenMP] Fixed a potential integer overflow `size_t` has different width on 32- and 64-bit architectures, but the computation to floor to power of two assumed it is 64-bit, which can cause an integer overflow. In this patch, architecture detection is added so that the 64-bit part of the operation is only performed when `size_t` is 64-bit. Thanks to Luke for reporting the issue. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89878	2020-10-22 21:22:19 -04:00
Jon Chesterfield	26790ed248	[libomptarget] Require LLVM source tree to build libomptarget This is to permit reliably #including files from the LLVM tree in libomptarget, as an improvement on the copy and paste that is currently in use. See D87841 for the first example of removing duplication given this new requirement. The weekly openmp dev call reached consensus on this approach. See also D87841 for some alternatives that were considered. In the future, we may want to introduce a new top level repo for shared constants, or start using the ADT library within openmp. This will break sufficiently exotic build systems, trivial fixes as below. Building libomptarget as part of the monorepo will continue to work. If openmp is built separately, it now requires a cmake macro indicating where to find the LLVM source tree. If openmp is built separately, without the llvm source tree already on disk, the build machine will need a copy of a subset of the llvm source tree and the cmake macro indicating where it is. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D89426	2020-10-21 18:53:00 +01:00
JonChesterfield	55dc123555	[libomptarget][amdgcn] Refactor memcpy to eliminate maps Builds on D89776 to remove now dead code. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D89888	2020-10-21 16:59:33 +01:00
Pushpinder Singh	aa616efbb3	[libomptarget][AMDGPU][NFC] Split atmi_memcpy for h2d and d2h The calls to atmi_memcpy presently determine the direction of copy (host to device or device to host) by storing pointers in a map during malloc and looking up the pointers during memcpy. As each call site already knows the direction, this stash+lookup can be eliminated. This NFC will be followed by a functional one that deletes those map lookups. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D89776 Change-Id: I1d9089bc1e56b3a9a30e334735fa07dee1f84990	2020-10-20 06:29:32 -04:00
Jon Chesterfield	d27b39ce11	[libomptarget][amdgcn] Implement missing symbols in deviceRTL Malloc, wtime are stubs. Malloc needs a hostrpc implementation which is a work in progress, wtime needs some experimentation to find out the multiplier to get a time in seconds as documentation is scarce. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D89725	2020-10-20 00:24:15 +01:00
George Rokos	5adb3a6d86	[libomptarget] Fix copy-to motion for PTR_AND_OBJ entries where PTR is a struct member. This patch fixes a problem whereby the pointee object of a PTR_AND_OBJ entry with a `map(to)` motion clause can be overwritten on the device even if its reference counter is >=1. Currently, we check the reference counter of the parent struct in order to determine whether the motion clause should be respected, but since the pointee object is not part of the struct, it's got its own reference counter which should be used to enqueue the copy or discard it. The same behavior has already been implemented in targetDataEnd (omptarget.cpp:539-540), but we somehow missed doing the same in targetDataBegin. Differential Revision: https://reviews.llvm.org/D89597	2020-10-16 16:14:01 -07:00
JonChesterfield	7d2ecef5ed	[openmp][libomptarget] Include header from LLVM source tree The change is to the amdgpu plugin so is unlikely to break anything. The point of contention is whether libomptarget can depend on LLVM. A community discussion was cautiously not opposed yesterday. This introduces a compile time dependency on the LLVM source tree, in this case expressed as skipping the building of the plugin if LLVM_MAIN_INCLUDE_DIR is not set. One of the source files will #include llvm/Frontend/OpenMP/OMPGridValues.h, instead of copy&pasting the numbers across. For users that download the monorepo, the llvm tree is already on disk. This will inconvenience users who download only the openmp source as a tar, as they would now also have to download (at least a file or two) from the llvm source, if they want to build the parts of the openmp project that (post this patch) depend on llvm. There was interest expressed in going further - using llvm tools as part of building libomp, or linking against llvm libraries. That seems less clear cut an improvement and worthy of further discussion. This patch seeks only to change policy to support openmp depending on the llvm source tree. Including in the other direction, or using libraries / tools etc, are purposefully out of scope. Reviewers are a best guess at interested parties, please feel free to add others Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D87841	2020-10-15 15:46:19 +01:00
JonChesterfield	8b6cd15242	[libomptarget][amdgcn] Implement partial barrier named_sync is used to coordinate non-spmd kernels. This uses bar.sync on nvptx. There is no corresponding ISA support on amdgcn, so this is implemented using shared memory, one word initialized to zero. Each wave increments the variable by one. Whichever wave is last is responsible for resetting the variable to zero, at which point it and the others continue. The race condition on a wave reaching the barrier before another wave has noticed that it has been released is handled with a generation counter, packed into the same word. Uses a shared variable that is not needed on nvptx. Introduces a new hook, kmpc_impl_target_init, to allow different targets to do extra initialization. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D88602	2020-10-12 21:27:32 +01:00
Joseph Huber	d564409946	[OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default Summary: This patch changes the CMake files for Clang and Libomptarget to query the system for its supported CUDA architecture. This makes it much easier for the user to build optimal code without needing to set the flags manually. This relies on the now deprecated FindCUDA method in CMake, but full support for architecture detection is only available in CMake >3.18 Reviewers: jdoerfert ye-luo Subscribers: cfe-commits guansong mgorny openmp-commits sstefan1 yaxunl Tags: #clang #OpenMP Differential Revision: https://reviews.llvm.org/D87946	2020-10-08 12:09:34 -04:00
Pushpinder Singh	3a12ff0dac	[OpenMP][RTL] Remove dead code RequiresDataSharing was always 0, resulting dead code in device runtime library. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D88829	2020-10-06 05:43:47 -04:00
Joachim Protze	55cff5b288	[OpenMP][libomptarget] make omp_get_initial_device 5.1 compliant OpenMP 5.1 defines omp_get_initial_device to return the same value as omp_get_num_devices. Since this change is also 5.0 compliant, no versioning is needed. Differential Revision: https://reviews.llvm.org/D88149	2020-10-01 00:51:11 +02:00
JonChesterfield	d256797c90	[nfc][libomptarget] Drop parameter to named_sync named_sync has one call site (in sync.cu) where it always passed L1_BARRIER. Folding this into the call site and dropping the macro is a simplification. amdgpu doesn't have ptx' bar.sync instruction. A correct implementation of __kmpc_impl_named_sync in terms of shared memory is much easier if it can assume that the barrier argument is this constant. Said implementation is left for a second patch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D88474	2020-09-29 23:12:21 +01:00
Manoel Roemmer	c816ee13ad	[OpenMP][VE plugin] Fixing failure to build VE plugin with consolidated error handling in libomptarget The libomptarget VE plugin [[ http://lab.llvm.org:8014/builders/clang-ve-ninja/builds/8937/steps/build-unified-tree/logs/stdio \| fails to build ]] after `ae95ceeb8f` . Differential Revision: https://reviews.llvm.org/D88476	2020-09-29 17:38:01 +02:00
Ye Luo	ffd159d8e9	[OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL It allows customizing MAX_SM for non-flagship GPU and reduces graphic memory usage. In addition, so far the size is hard-coded up to __CUDA_ARCH__ 700 and is already a hassle for 800. Introduce MAX_SM for 800 and protect future arch Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D88185	2020-09-24 12:39:59 -04:00
Ye Luo	03111e5e7a	[OpenMP] Protect unrecognized CUDA error code If an error code cannot be recognized by cuGetErrorString, errStr remains null and causes a crash at DP() printing. Protect this case. Reviewed By: jhuber6, tianshilei1992 Differential Revision: https://reviews.llvm.org/D87980	2020-09-21 13:43:08 -04:00
JonChesterfield	a9be2b5cb2	[libomptarget] Disable build of amdgpu plugin as it doesn't build with rocm.	2020-09-18 18:10:27 +01:00
Joseph Huber	c3e6054b07	[OpenMP] Additional Information for Libomptarget Mappings Summary: This patch adds additional support for printing information from Libomptarget for already existing maps and printing the final data mapped on the device at device destruction. Reviewers: jdoerfert gkistanova Subscribers: guansong openmp-commits sstefan1 yaxunl Tags: #OpenMP Differential Revision: https://reviews.llvm.org/D87722	2020-09-15 18:12:57 -04:00
Raul Tambre	c42f96cb23	[CMake][OpenMP] Simplify getting CUDA library directory LLVM now requires CMake 3.13.4 so we can simplify this. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D87195	2020-09-11 21:19:11 +03:00
Joseph Huber	ae209397b1	[OpenMP] Begin Printing Information Dumps In Libomptarget and Plugins Summary: This patch starts adding support for adding information dumps to libomptarget and rtl plugins. The information printing is controlled by the LIBOMPTARGET_INFO environment variable introduced in D86483. The goal of this patch is to provide the user with additional information about the device during kernel execution and to provide the user with information dumps in the case of failure. This patch added the ability to dump the pointer mapping table as well as printing the number of blocks and threads in the cuda RTL. Reviewers: jdoerfert gkistanova ye-luo Subscribers: guansong openmp-commits sstefan1 yaxunl ye-luo Tags: #OpenMP Differential Revision: https://reviews.llvm.org/D87165	2020-09-09 12:03:56 -04:00
Pushpinder Singh	7634c64b61	[OpenMP][AMDGPU] Use DS_Max_Warp_Number instead of WARPSIZE The size of worker_rootS should have been DS_Max_Warp_Number. This reduces memory usage by deviceRTL on AMDGPU from around 2.3GB to around 770MB. Reviewed By: JonChesterfield, jdoerfert Differential Revision: https://reviews.llvm.org/D87084	2020-09-07 05:15:21 -04:00
Joseph Huber	ae95ceeb8f	[OpenMP] Consolidate error handling and debug messages in Libomptarget Summary: This patch consolidates the error handling and messaging routines to a single file omptargetmessage. The goal is to simplify the error handling interface prior to adding more error handling support Reviewers: jdoerfert grokos ABataev AndreyChurbanov ronlieb JonChesterfield ye-luo tianshilei1992 Subscribers: danielkiss guansong jvesely kerbowa nhaehnle openmp-commits sstefan1 yaxunl	2020-09-01 15:28:19 -04:00
Alexey Bataev	6aa7228a62	[LIBOMPTARGET]Do not try to optimize bases for the next parameters. PrivateArgumentManager shall immediately allocate firstprivates if they are bases for the next parameters and the next parameters rely on the fact that the base must be allocated already. Differential Revision: https://reviews.llvm.org/D86781	2020-08-28 15:46:31 -04:00
Shilei Tian	46e0ced762	[OpenMP] Fixed wrong test command in the test private_mapping.c The test command in `private_mapping.c` was set to expect failure by mistake. It is fixed in this patch. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D86758	2020-08-28 12:19:46 -04:00
Joseph Huber	7a5a74ea96	[OpenMP] Always emit debug messages that indicate offloading failure Summary: This patch changes the libomptarget runtime to always emit debug messages that occur before offloading failure. The goal is to provide users with information about why their application failed in the target region rather than a single failure message. This is only done in regions that precede offloading failure so this should not impact runtime performance. If the debug environment variable is set then the message is forwarded to the debug output as usual. A new environment variable was added for future use but does nothing in this current patch. LIBOMPTARGET_INFO will be used to report runtime information to the user if requested, such as grid size, SPMD usage, or data mapping. It will take an integer indicating the level of information verbosity and a value of 0 will disable it. Reviewers: jdoerfert Subscribers: guansong sstefan1 yaxunl ye-luo Tags: #OpenMP Differential Revision: https://reviews.llvm.org/D86483	2020-08-26 19:30:41 -04:00
JonChesterfield	5d989fb37d	[libomptarget][amdgpu] Improve thread safety, remove dead code	2020-08-26 22:04:03 +01:00
Jon Chesterfield	28fbf422f2	[libomptarget][amdgpu] Update plugin CMake to work with latest rocr library	2020-08-26 20:01:42 +01:00
Shilei Tian	0775c1dfbc	[OpenMP] Pack first-private arguments to improve efficiency of data transfer In this patch, we pack all small first-private arguments, allocate and transfer them all at once to reduce the number of data transfers, which are very expensive. Let's take the test case as an example. ``` int main() { int data1[3] = {1}, data2[3] = {2}, data3[3] = {3}; int sum[16] = {0}; #pragma omp target teams distribute parallel for map(tofrom: sum) firstprivate(data1, data2, data3) for (int i = 0; i < 16; ++i) { for (int j = 0; j < 3; ++j) { sum[i] += data1[j]; sum[i] += data2[j]; sum[i] += data3[j]; } } } ``` Here `data1`, `data2`, and `data3` are three first-private arguments of the target region. In the previous `libomptarget`, it called data allocation and data transfer three times, each of which allocated and transferred 12 bytes. With this patch, it only calls allocation and transfer once. The size is `(12+4)*3=48` where 12 is the size of each array and 4 is the padding to keep the address aligned with 8. It is implemented in this way: 1. First collect all information for those first-private arguments. Private arguments are not included because private arguments don't need to be mapped to the target device; they just need a data allocation. With the patch for memory manager, the data allocation could be very cheap, especially for the small size. For each qualified argument, push a place holder pointer `nullptr` to the `vector` for kernel arguments, and we will update them later. 2. After we have all information, create a buffer that can accommodate all arguments plus their paddings. Copy the arguments to the buffer at the right place, i.e. aligned address. 3. Allocate a target memory with the same size as the host buffer, transfer the host buffer to target device, and finally update all place holder pointers in the arguments `vector`. The reason we only consider small arguments is, the data transfer is asynchronous. 
Therefore, for the large argument, we could continue to do things on the host side meanwhile, hopefully, the data is also being transferred. The "small" is defined by that the argument size is less than a predefined value. Currently it is 1024. I'm not sure whether it is a good one, and that is an open question. Another question is, do we need to make it configurable via an environment variable? Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D86307	2020-08-25 16:06:29 -04:00
Shilei Tian	f93b42a629	[NFC][OpenMP] Remove outdated comments about potential issues The issue mentioned has been fixed in D84996	2020-08-24 01:21:06 +00:00
Shilei Tian	0289696751	[OpenMP] Introduce target memory manager Target memory manager is introduced in this patch which aims to manage target memory such that they will not be freed immediately when they are not used because the overhead of memory allocation and free is very large. For CUDA device, cuMemFree even blocks the context switch on device which affects concurrent kernel execution. The memory manager can be taken as a memory pool. It divides the pool into multiple buckets according to the size such that memory allocation/free distributed to different buckets will not affect each other. In this version, we use the exact-equality policy to find a free buffer. This is an open question: will best-fit work better here? IMO, best-fit is not good for target memory management because computation on GPU usually requires GBs of data. Best-fit might lead to a serious waste. For example, there is a free buffer of size 1960MB, and now we need a buffer of size 1200MB. If best-fit, the free buffer will be returned, leading to a 760MB waste. The allocation will happen when there is no free memory left, and the memory free on device will take place in the following two cases: 1. The program ends. Obviously. However, there is a little problem that plugin library is destroyed before the memory manager is destroyed, leading to a fact that the call to target plugin will not succeed. 2. Device is out of memory when we request a new memory. The manager will walk through all free buffers from the bucket with largest base size, pick up one buffer, free it, and try to allocate immediately. If it succeeds, it will return right away rather than freeing all buffers in free list. Update: A threshold (8KB by default) is set such that users could control what size of memory will be managed by the manager. It can also be configured by an environment variable `LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD`. 
Reviewed By: jdoerfert, ye-luo, JonChesterfield Differential Revision: https://reviews.llvm.org/D81054	2020-08-19 23:12:23 -04:00
Shilei Tian	83c3d07994	[OpenMP] Refactored the function `DeviceTy::data_exchange` This patch contains the following changes: 1. Renamed the function `DeviceTy::data_exchange` to `DeviceTy::dataExchange`; 2. Changed the second argument `DeviceTy DstDev` to `DeviceTy &DstDev`; 3. Renamed the last argument. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D86238	2020-08-19 16:08:14 -04:00
Jon Chesterfield	6e1b11087f	[libomptarget][amdgpu] Support building with static rocm libraries	2020-08-19 15:44:30 +01:00
George Rokos	32ebdc70f3	[libomptarget][NFC] Sort list of plugins in chronological order Differential Revision: https://reviews.llvm.org/D86082	2020-08-17 08:33:36 -07:00
Johannes Doerfert	5272d29e2c	[OpenMP][CUDA] Keep one kernel list per device, not globally. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D86039	2020-08-16 14:38:35 -05:00
Johannes Doerfert	aa27cfc1e7	[OpenMP][CUDA] Cache the maximal number of threads per block (per kernel) Instead of calling `cuFuncGetAttribute` with `CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation, we can do it for the first one and cache the result as part of the `KernelInfo` struct. The only functional change is that we now expect `cuFuncGetAttribute` to succeed and otherwise propagate the error. Ignoring any error seems like a slippery slope... Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D86038	2020-08-16 14:38:33 -05:00
Jon Chesterfield	d0b312955f	[libomptarget] Implement host plugin for amdgpu Replacement for D71384. Primary difference is inlining the dependency on atmi followed by extensive simplification and bugfixes. This is the latest version from https://github.com/ROCm-Developer-Tools/amd-llvm-project/tree/aomp12 with minor patches and a rename from hsa to amdgpu, on the basis that this can't be used by other implementations of hsa without additional work. This will not build unless the ROCM_DIR variable is passed so won't break other builds. That variable is used to locate two amdgpu specific libraries that ship as part of rocm: libhsakmt at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface libhsa-runtime64 at https://github.com/RadeonOpenCompute/ROCR-Runtime These libraries build from source. The build scripts in those repos are for shared libraries, but can be adapted to statically link both into this plugin. There are caveats. - This works well enough to run various tests and benchmarks, and will be used to support the current clang bring up - It is adequately thread safe for the above but there will be races remaining - It is not stylistically correct for llvm, though has had clang-format run - It has suboptimal memory management and locking strategies - The debug printing / error handling is inconsistent I would like to contribute this pretty much as-is and then improve it in-tree. This would be advantageous because the aomp12 branch that was in use for fixing this codebase has just been joined with the amd internal rocm dev process. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85742	2020-08-15 23:58:28 +01:00
Joel E. Denny	518a27e559	[OpenMP] Fix ref count dec for implicit map of partial data D85342 broke this case. The new test case presents an example. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D85369	2020-08-06 11:39:29 -04:00
Joel E. Denny	8c8bb128df	[OpenMP] Fix `target data` exit for array extension For example: ``` #pragma omp target data map(tofrom:arr[0:100]) { #pragma omp target exit data map(delete:arr[0:100]) #pragma omp target enter data map(alloc:arr[98:2]) } ``` Without this patch, the transfer at the end of the target data region is broken and fails depending on the target device. According to my read of the spec, the transfer shouldn't even be attempted because `arr[0:100]` isn't (fully) present there. To fix that, this patch makes `DeviceTy::getTgtPtrBegin` return null for this case. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D85342	2020-08-05 16:51:25 -04:00
Joel E. Denny	41b1aefecb	[OpenMP] Fix `present` diagnostic for array extension For example, without this patch, the following fails as expected with or without the `present` modifier, but the `present` modifier doesn't produce its usual diagnostic: ``` #pragma omp target data map(alloc: arr[0:2]) { #pragma omp target map(present, tofrom: arr[0:100]) // not fully present ; } ``` Reviewed By: grokos, vzakhari Differential Revision: https://reviews.llvm.org/D85320	2020-08-05 16:51:24 -04:00
George Rokos	40470eb27a	[libomptarget][NFC] Replace `%ld` with PRId64 for data of type int64_t. The standard way of printing `int64_t` data is via the PRId64 macro, `ld` is for `long int` and int64_t is not guaranteed to be typedef'ed as `long int` on all platforms. E.g. on Windows we get mismatch warnings. Differential Revision: https://reviews.llvm.org/D85353	2020-08-05 13:28:35 -07:00
Alexey Bataev	6780d5675b	[LIBOMPTARGET]Fix order of mapper data for targetDataEnd function. targetDataMapper function fills arrays with the mapping data in the direct order. When this function is called by targetDataBegin or tgt_target_update functions, it works as expected. But targetDataEnd function processes mapped data in reverse order. In this case, the base pointer might be deleted before the associated data is deleted. Need to reverse data, mapped by mapper, too, since it always adds data that must be deleted at the end of the buffer. Fixes the test declare_mapper_target_update.cpp. Also, reduces the memory fragmentation by preallocating the memory buffers. Differential Revision: https://reviews.llvm.org/D85216	2020-08-05 13:42:24 -04:00
Joel E. Denny	5ab43989c3	[OpenMP] Fix `omp target update` for array extension OpenMP TR8 sec. 2.15.6 "target update Construct", p. 183, L3-4 states: > If the corresponding list item is not present in the device data > environment and there is no present modifier in the clause, then no > assignment occurs to or from the original list item. L10-11 states: > If a present modifier appears in the clause and the corresponding > list item is not present in the device data environment then an > error occurs and the program terminates. (OpenMP 5.0 also has the first passage but without mention of the present modifier of course.) In both passages, I assume "is not present" includes the case of partially but not entirely present. However, without this patch, the target update directive misbehaves in this case both with and without the present modifier. For example: ``` #pragma omp target enter data map(to:arr[0:3]) #pragma omp target update to(arr[0:5]) // might fail on data transfer #pragma omp target update to(present:arr[0:5]) // might fail on data transfer ``` The problem is that `DeviceTy::getTgtPtrBegin` does not return a null pointer in that case, so `target_data_update` sees the data as fully present, and the data transfer then might fail depending on the target device. However, without the present modifier, there should never be a failure. Moreover, with the present modifier, there should always be a failure, and the diagnostic should mention the present modifier. This patch fixes `DeviceTy::getTgtPtrBegin` to return null when `target_data_update` is the caller. I'm wondering if it should do the same for more callers. Reviewed By: grokos, jdoerfert Differential Revision: https://reviews.llvm.org/D85246	2020-08-05 10:03:31 -04:00
Joel E. Denny	002d61db2b	[OpenMP] Fix `present` for exit from `omp target data` Without this patch, the following example fails but shouldn't according to OpenMP TR8: ``` #pragma omp target enter data map(alloc:i) #pragma omp target data map(present, alloc: i) { #pragma omp target exit data map(delete:i) } // fails presence check here ``` OpenMP TR8 sec. 2.22.7.1 "map Clause", p. 321, L23-26 states: > If the map clause appears on a target, target data, target enter > data or target exit data construct with a present map-type-modifier > then on entry to the region if the corresponding list item does not > appear in the device data environment an error occurs and the > program terminates. There is no corresponding statement about the exit from a region. Thus, the `present` modifier should: 1. Check for presence upon entry into any region, including a `target exit data` region. This behavior is already implemented correctly. 2. Should not check for presence upon exit from any region, including a `target` or `target data` region. Without this patch, this behavior is not implemented correctly, breaking the above example. In the case of `target data`, this patch fixes the latter behavior by removing the `present` modifier from the map types Clang generates for the runtime call at the end of the region. In the case of `target`, we have not found a valid OpenMP program for which such a fix would matter. It appears that, if a program can guarantee that data is present at the beginning of a `target` region so that there's no error there, that data is also guaranteed to be present at the end. This patch adds a comment to the runtime to document this case. Reviewed By: grokos, RaviNarayanaswamy, ABataev Differential Revision: https://reviews.llvm.org/D84422	2020-08-05 10:03:31 -04:00
Shilei Tian	f2400f024d	[OpenMP] Fixed the issue that target memory deallocation might be called when they're being used This patch fixed the issue that target memory might be deallocated when they're still being used or before they're used. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84996	2020-07-31 18:54:18 -04:00
Shilei Tian	0f10165626	[OpenMP] Refactored the function `targetDataEnd` Refactored the function `targetDataEnd` to prepare for fixing the issue of ahead-of-time target memory deallocation. This patch only renamed `targetDataEnd` related variables and functions to conform with LLVM code standard. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84991	2020-07-30 21:39:26 -04:00
Shilei Tian	8218eee269	[OpenMP] Refactored the function `target` Refactored the function `target` to make preparation for fixing the issue of ahead-of-time device memory deallocation. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84816	2020-07-30 21:05:55 -04:00
Alexey Bataev	622e46156d	[OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region. Need to map the base pointer for all directives, not only target data-based ones. The base pointer is mapped for array sections, array subscript, array shaping and other array-like constructs with the base pointer. Also, codegen for use_device_ptr clause was modified to correctly handle mapping combination of array like constructs + use_device_ptr clause. The data for use_device_ptr clause is emitted as the last records in the data mapping array. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84767	2020-07-30 11:18:33 -04:00
Alexey Bataev	b69357c2f4	Revert "[OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region." This reverts commit `142d0d3ed8` to investigate undefined behavior revealed by buildbots.	2020-07-30 10:57:56 -04:00
Alexey Bataev	142d0d3ed8	[OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region. Need to map the base pointer for all directives, not only target data-based ones. The base pointer is mapped for array sections, array subscript, array shaping and other array-like constructs with the base pointer. Also, codegen for use_device_ptr clause was modified to correctly handle mapping combination of array like constructs + use_device_ptr clause. The data for use_device_ptr clause is emitted as the last records in the data mapping array. It applies only for global pointers. Differential Revision: https://reviews.llvm.org/D84767	2020-07-30 09:40:05 -04:00
Joel E. Denny	cee52dd026	[OpenMP] Implement TR8 `present` motion modifier in runtime (2/2) This patch implements OpenMP runtime support for the OpenMP TR8 `present` motion modifier for `omp target update` directives. The previous patch in this series implements Clang front end support. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D84712	2020-07-29 12:18:50 -04:00
Shilei Tian	30440924d4	[OpenMP] Replaced mutex lock/unlock in `target` with `std::lock_guard` Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84799	2020-07-28 20:31:40 -04:00
Joel E. Denny	65564e5eaf	Revert "[OpenMP] Implement TR8 `present` motion modifier in runtime (2/2)" This reverts commit `2cb926a447`. It depends on `3c3faae497`, which is being reverted.	2020-07-28 20:30:05 -04:00
Shilei Tian	3ce69d4d50	[NFC][OpenMP] Renamed all variable and function names in `target` to conform with LLVM code standard This patch only touched variables and functions in `target`. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84797	2020-07-28 20:11:09 -04:00
Joel E. Denny	2cb926a447	[OpenMP] Implement TR8 `present` motion modifier in runtime (2/2) This patch implements OpenMP runtime support for the OpenMP TR8 `present` motion modifier for `omp target update` directives. The previous patch in this series implements Clang front end support. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D84712	2020-07-28 19:15:18 -04:00
Joel E. Denny	9b4826d18b	[OpenMP] Fix libomptarget negative tests to expect abort On runtime failures, D83963 causes the runtime to abort instead of merely exiting with a non-zero value, but many tests in the libomptarget test suite still expect the former behavior. This patch updates the test suite and was discussed in post-commit comments on D83963 and D84557.	2020-07-28 09:02:16 -04:00
Joachim Protze	e2f5444c9c	[OpenMP][Tests] Enable nvptx64 testing for most libomptarget tests Also add $BUILD/lib to the LIBRARY_PATH to fix https://bugs.llvm.org/show_bug.cgi?id=46836. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D84557	2020-07-28 11:08:24 +02:00
Ye Luo	9323166601	[OpenMP] Add more pass-through functions in DeviceTy Summary: 1. Add DeviceTy::data_alloc, DeviceTy::data_delete, DeviceTy::data_alloc, DeviceTy::synchronize pass-through functions. Avoid directly accessing Device.RTL 2. Fix the type of the first argument of synchronize_ty in rth.h, device id is int32_t which is consistent with other functions. Reviewers: tianshilei1992, jdoerfert Reviewed By: tianshilei1992 Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D84487	2020-07-27 16:08:30 -04:00
Johannes Doerfert	9c87466c39	[OpenMP] Use `abort` not `error` for fatal runtime exceptions See PR46515 for the rationale but generally, we want to really abort not gracefully shut down. Reviewed By: grokos, ABataev Differential Revision: https://reviews.llvm.org/D83963	2020-07-24 15:15:38 -05:00
Shilei Tian	c0185dc7df	Revert "[OpenMP] Wait for kernel prior to memory deallocation" This reverts commit `9b2832c089`.	2020-07-22 23:03:36 -04:00
Shilei Tian	9b2832c089	[OpenMP] Wait for kernel prior to memory deallocation Summary: In the function `target`, memory deallocation and `target_data_end` are called immediately after returning from the kernel launch. This might cause a race condition in which the corresponding memory is still being used by the kernel, and a potential issue that when the kernel starts to execute, its required data have already been deallocated, especially when multiple kernels are running concurrently. Since we will block the thread issuing the target offloading at the end of the target anyway, we just move the synchronization ahead a little bit to ensure correctness. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D84381	2020-07-22 22:55:34 -04:00
Joel E. Denny	708752b2f6	[OpenMP] Implement TR8 `present` map type modifier in runtime (2/2) This implements OpenMP runtime support for the OpenMP TR8 `present` map type modifier. The previous patch in this series implements Clang front end support. See that patch summary for behaviors that are not yet supported. Reviewed By: grokos, jdoerfert Differential Revision: https://reviews.llvm.org/D83062	2020-07-22 14:04:58 -04:00
Joel E. Denny	fc247c8f3c	Revert "[OpenMP] Implement TR8 `present` map type modifier in runtime (2/2)" This reverts commit `45b8f7ec35`. It attempts to use debug macros `DPxMOD` and `DPxPTR` in release builds. Will fix and reapply later.	2020-07-22 11:22:08 -04:00
Joel E. Denny	45b8f7ec35	[OpenMP] Implement TR8 `present` map type modifier in runtime (2/2) This implements OpenMP runtime support for the OpenMP TR8 `present` map type modifier. The previous patch in this series implements Clang front end support. See that patch summary for behaviors that are not yet supported. Reviewed By: grokos, jdoerfert Differential Revision: https://reviews.llvm.org/D83062	2020-07-22 10:15:32 -04:00
Joachim Protze	ae31d7838c	[OpenMP][NFC] pass on env variables to libomptarget tests	2020-07-22 12:14:45 +02:00
George Rokos	140ab574a1	[OpenMP][Offload] Declare mapper runtime implementation Libomptarget patch adding runtime support for "declare mapper". Patch co-developed by Lingda Li and George Rokos. Differential revision: https://reviews.llvm.org/D68100	2020-07-15 18:11:43 -07:00
Johannes Doerfert	5937434677	[OpenMP] Silence unused symbol warning with proper ifdefs	2020-07-11 11:57:42 -05:00
Johannes Doerfert	c98699582a	[OpenMP][NFC] Remove unused (always fixed) arguments There are various runtime calls in the device runtime with unused, or always fixed, arguments. This is bad for all sorts of reasons. Clean up two before as we match them in OpenMPOpt now. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83268	2020-07-11 00:51:51 -05:00
Johannes Doerfert	cd0ea03e6f	[OpenMP][NFC] Remove unused and untested code from the device runtime Summary: We carried a lot of unused and untested code in the device runtime. Among other reasons, we are planning major rewrites for which reduced size is going to help a lot. The number of code lines reduced by 14%! Before: ------------------------------------------------------------------------------- Language files blank comment code ------------------------------------------------------------------------------- CUDA 13 489 841 2454 C/C++ Header 14 322 493 1377 C 12 117 124 559 CMake 4 64 64 262 C++ 1 6 6 39 ------------------------------------------------------------------------------- SUM: 44 998 1528 4691 ------------------------------------------------------------------------------- After: ------------------------------------------------------------------------------- Language files blank comment code ------------------------------------------------------------------------------- CUDA 13 366 733 1879 C/C++ Header 14 317 484 1293 C 12 117 124 559 CMake 4 64 64 262 C++ 1 6 6 39 ------------------------------------------------------------------------------- SUM: 44 870 1411 4032 ------------------------------------------------------------------------------- Reviewers: hfinkel, jhuber6, fghanim, JonChesterfield, grokos, AndreyChurbanov, ye-luo, tianshilei1992, ggeorgakoudis, Hahnfeld, ABataev, hbae, ronlieb, gregrodgers Subscribers: jvesely, yaxunl, bollu, guansong, jfb, sstefan1, aaron.ballman, openmp-commits, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D83349	2020-07-10 19:09:41 -05:00
Ye Luo	c5348aecd7	[OpenMP] Use primary context in CUDA plugin Summary: Retaining per device primary context is preferred to creating a context owned by the plugin. From CUDA documentation 1. "Note that the use of multiple CUcontext s per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used." from https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html 2. Right under cuCtxCreate. In most cases it is recommended to use cuDevicePrimaryCtxRetain. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf 3. The primary context is unique per device and shared with the CUDA runtime API. These functions allow integration with other libraries using CUDA. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX Two issues are addressed by this patch: 1. Not using the primary context caused interoperability issue with libraries like cublas, cusolver. CUBLAS_STATUS_EXECUTION_FAILED and cudaErrorInvalidResourceHandle 2. On OLCF summit, "Error returned from cuCtxCreate" and "CUDA error is: invalid device ordinal" Regarding the flags of the primary context. If it is inactive, we set CU_CTX_SCHED_BLOCKING_SYNC. If it is already active, we respect the current flags. Reviewers: grokos, ABataev, jdoerfert, protze.joachim, AndreyChurbanov, Hahnfeld Reviewed By: jdoerfert Subscribers: openmp-commits, yaxunl, guansong, sstefan1, tianshilei1992 Tags: #openmp Differential Revision: https://reviews.llvm.org/D82718	2020-07-07 10:14:51 -04:00
Saiyedul Islam	38d6640ba5	[libomptarget] Implement atomic inc and fence functions for AMDGCN using clang builtins This function uses __builtin_amdgcn_atomic_inc32(): uint32_t atomicInc(uint32_t *address, uint32_t max); These functions use __builtin_amdgcn_fence(): __kmpc_impl_threadfence() __kmpc_impl_threadfence_block() __kmpc_impl_threadfence_system() They will take place of current mechanism of directly calling IR functions. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83132	2020-07-07 06:36:25 +00:00
Fangrui Song	6ba4380ed6	[libomptarget][test] Fix text relocations by adding -fPIC	2020-07-05 12:51:28 -07:00
Ye Luo	45bb073da8	[OpenMP] fix clang warning about printf format in CUDA plugin Summary: Warnings are printed by clang when building with LIBOMPTARGET_ENABLE_DEBUG=ON due to incorrect format strings. Reviewers: tianshilei1992, jdoerfert Reviewed By: tianshilei1992 Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D82789	2020-06-29 22:35:39 -04:00
Ye Luo	6e5f64c44f	[OpenMP] Adopt std::set in HostDataToTargetMap Summary: lookupMapping took significant time due to linear complexity searching. This is bad for offloading from multiple host threads because lookupMapping is protected by mutex. Use std::set for logarithmic complexity searching. Before my change. libomptarget inclusive time 16.7 sec, exclusive time 8.6 sec. After the change libomptarget inclusive time 7.3 sec, exclusive time 0.4 sec. Most of the overhead of libomptarget (exclusive time) is gone. Reviewers: jdoerfert, grokos Reviewed By: grokos Subscribers: tianshilei1992, yaxunl, guansong, sstefan1 Tags: #openmp Differential Revision: https://reviews.llvm.org/D82264	2020-06-24 12:22:45 -04:00
Shilei Tian	aaf50adb53	Revert "[OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info" This reverts commit `ee1bf45e1d`.	2020-06-17 15:01:16 -04:00
Shilei Tian	ee1bf45e1d	[OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info DeviceID is added for some cases that we only have the __tgt_async_info but do not know its corresponding device id. However, to communicate with target plugins, we need that information. Event is added for another way to synchronize.	2020-06-17 14:29:09 -04:00
Shilei Tian	a014fbbc21	[OpenMP] Improve D2D memcpy to use more efficient driver API Summary: In the current implementation, D2D memcpy first copies data back to the host and then copies it from the host to the device. This is very inefficient if the device supports D2D memcpy, like CUDA. In this patch, D2D memcpy will first try to use the natively supported driver API. If it fails, it falls back to the original way. It is worth noting that D2D memcpy in this scenario contains two ideas: - Same devices: this is the D2D memcpy in the CUDA context. - Different devices: this is the PeerToPeer memcpy in the CUDA context. My implementation merges these two parts. It chooses the best API according to the source device and destination device. Reviewers: jdoerfert, AndreyChurbanov, grokos Reviewed By: jdoerfert Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D80649	2020-06-04 16:59:06 -04:00
Manoel Roemmer	6b9e43c67e	[Openmp][VE] Libomptarget plugin for NEC SX-Aurora This patch adds a libomptarget plugin for the NEC SX-Aurora TSUBASA Vector Engine (VE target). The code is largely based on the existing generic-elf plugin and uses the NEC VEO and VEOSINFO libraries for offloading. Differential Revision: https://reviews.llvm.org/D76843	2020-05-12 10:47:30 +02:00
Joel E. Denny	dd5ba4b585	[OpenMP][NFC] Fix `not` substitution in tests D78566 introduced a `\bnot\b` lit substitution in OpenMP test suites. However, that would corrupt a command like `FileCheck -implicit-check-not` or any file name like `%t.not`. We could use lookbehind/lookahead assertions to avoid such cases, but this patch switches to `%not` (suggested during the D78566 review) as a safer option. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D79529	2020-05-11 14:53:48 -04:00
Shilei Tian	cb038927ef	[OpenMP] Fix an issue of wrong return type of DeviceRTLTy::getNumOfDevices Summary: There is a typo in DeviceRTLTy::getNumOfDevices that the type of its return value is bool. It will lead to a problem of wrong device number returned from omp_get_num_devices. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D79255	2020-05-03 15:59:06 -04:00
Ron Lieberman	ee9c53d271	[libomptarget] Initialize reference parameter IsNew within Device::getOrAllocTgtPtr The two locals IsNew and Pointer_IsNew were uninitialized at declaration, and then passed by reference to Device.getOrAllocTgtPtr which in turn did not assign on all paths within the function. This resulted in occasional runtime failures in one application. Device::getOrAllocTgtPtr will now initialize IsNew to false on entry to function. Differential Revision: https://reviews.llvm.org/D78744	2020-04-24 15:33:37 -05:00
Joel E. Denny	5f6aa9680c	[OpenMP] target_data_begin: fail on device alloc fail Without this patch, target_data_begin continues after an illegal mapping or an out-of-memory error on the device. With this patch, it terminates the runtime with an error instead. The new test exercises only illegal mappings. I didn't think of a good way to exercise out-of-memory errors from the test suite. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D78170	2020-04-21 17:10:50 -04:00
Joel E. Denny	ba942610f6	[OpenMP] Add scaffolding for negative runtime tests Without this patch, the openmp project's test suites do not appear to have support for negative tests. However, D78170 needs to add a test that an expected runtime failure occurs. This patch makes `not` visible in all of the openmp project's test suites. In all but `libomptarget/test`, it should be possible for a test author to insert `not` before a use of the lit substitution for running a test program. In `libomptarget/test`, that substitution is target-specific, and its value is `echo` when the target is not available. In that case, inserting `not` before a lit substitution would expect an `echo` fail, so this patch instead defines a separate lit substitution for expected runtime fails. Reviewed By: jdoerfert, Hahnfeld Differential Revision: https://reviews.llvm.org/D78566	2020-04-21 17:10:50 -04:00
Shilei Tian	4031bb982b	[OpenMP] Refined CUDA plugin to put all CUDA operations into class Summary: Current implementation mixed everything up so that there is almost no encapsulation. In this patch, all CUDA related operations are put into a new class DeviceRTLTy and only necessary functions are exposed. In addition, all C++ code now conforms with LLVM code standard, keeping those API functions following C style. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: jfb, yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77951	2020-04-13 13:32:46 -04:00
Shilei Tian	feed674dec	[OpenMP] Introduce stream pool to make sure the correctness of device synchronization Summary: In previous patch, in order to optimize performance, we only synchronize once for each target region. The synchronization is via stream synchronization. However, in the extreme situation, the performance might be bad. Consider the following case: There is a task that requires transferring huge amount of data (call many times of data transferring function). It is scheduled to the first stream. And then we have 255 very light tasks scheduled to the remaining 255 streams (by default we have 256 streams). They can be finished before we do synchronization at the end of the first task. Next, we get another very huge task. It will be scheduled again to the first stream. Now the first task finishes its kernel launch and call stream synchronization. Right now, the stream already contains two kernels, and the synchronization will wait until the two kernels finish instead of just the first one for the first task. In this patch, we introduce stream pool. After each synchronization, the stream will be returned back to the pool to make sure that for each synchronization, only expected operations are waited. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: gregrodgers, yaxunl, lildmh, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77412	2020-04-11 07:08:56 -04:00
Shilei Tian	03ff643d2e	[OpenMP] Put old APIs back and added new _async series for backward compatibility Summary: According to comments on bi-weekly meeting, this patch put back old APIs and added new `_async` series Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77822	2020-04-09 22:40:58 -04:00
Shilei Tian	32ed29271f	[OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream Summary: This patch introduces two things for offloading: 1. Asynchronous data transferring: those functions are suffixed with `_async`. They have one more argument compared with their synchronous counterparts: `__tgt_async_info`, which is a new struct that only has one field, `void Identifier`. This struct is for information exchange between different asynchronous operations. It can be used for stream selection, like in this case, or operation synchronization, which is also used. We may expect more usages in the future. 2. Optimization of stream selection for data mapping. Previous implementation was using asynchronous device memory transfer but synchronizing after each memory transfer. Actually, if we say kernel A needs four memory copies to the device and two memory copies back to the host, then we can schedule these seven operations (four H2D, two D2H, and one kernel launch) into the same stream and just need synchronization after the memory copies from device to host. In this way, we can save a huge overhead compared with synchronization after each operation. Reviewers: jdoerfert, ye-luo Reviewed By: jdoerfert Subscribers: yaxunl, lildmh, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77005	2020-04-07 14:55:47 -04:00
Kazuaki Ishizaki	4201679110	[OpenMP] NFC: Fix trivial typo Differential Revision: https://reviews.llvm.org/D77430	2020-04-04 12:06:54 +09:00
JonChesterfield	09834f9761	[libomptarget][nfc] Move non-freestanding headers out of common Summary: [libomptarget][nfc] Move non-freestanding headers out of common Lowers the bar for building deviceRTL. Drops math.h entirely as it wasn't used and libm is a big dependency. Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77071	2020-03-31 23:43:18 +01:00
Jon Chesterfield	856c995436	[libomptarget] Add missing elf_end call in elf_common.c Summary: [libomptarget] Add missing elf_end call in elf_common.c Noticed when reviewing D76843. Reviewers: simoll, jdoerfert, efocht, AndreyChurbanov, grokos, manorom Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D76874	2020-03-26 19:07:33 +00:00
JonChesterfield	0813f41005	[libomptarget][nfc] Explicitly static function scope shared variables Summary: [libomptarget][nfc] Explicitly static function scope shared variables `__shared__` in CUDA implies static in function scope. See e.g. D.2.1.1 in CUDA_C_Programming_Guide.pdf, http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/ This is surprising for non-cuda developers, see e.g. D73239 where I thought local variables would be thread local. Tested by IR diff of libomptarget.bc (no change), running in tree tests, and binary diff of the nvcc static archives (no significant change). Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D76713	2020-03-24 18:51:50 +00:00
JonChesterfield	298527587c	[libomptarget][nfc] Disable amdgcn rtl build. The cmake logic for finding llvm is misbehaving.	2020-03-21 00:01:03 +00:00
George Rokos	0a42c9bfe4	Enable CUDA offloading on aarch64 host Differential Revision: https://reviews.llvm.org/D76469	2020-03-20 15:38:47 -07:00
Tom Scogland	a23d7282ca	openmp: fix memcpy memory leak Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D72637	2020-03-12 23:24:16 -05:00
Alexey Bataev	c422d69b1a	[LIBOMPTARGET]Fix PR45139: Bug in mixing Python and OpenMP target offload. Summary: Explicitly initialize data members of RTLsTy class upon construction. Reviewers: grokos Subscribers: guansong, openmp-commits, caomhin, kkwli0 Tags: #openmp Differential Revision: https://reviews.llvm.org/D75946	2020-03-11 09:12:02 -04:00
Jon Chesterfield	221ada654b	[libomptarget] Implement locks for amdgcn Summary: [libomptarget] Implement locks for amdgcn The nvptx implementation deadlocks on amdgcn. atomic_cas with multiple active lanes can deadlock - if one lane succeeds, all the others are locked out. The set_lock implementation therefore runs on a single lane. Also uses a sleep intrinsic instead of the system clock for a probably minor performance improvement. The unset/test implementations may be revised later, based on code size / performance or similar concerns. This implements the lock at a per-wavefront scope. That's not strictly as specified, since openmp describes locks in terms of threads. I think the nvptx implementation provides true per-thread locking on volta and the same per-warp locking on other architectures. Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D75546	2020-03-05 20:25:31 +00:00
Jon Chesterfield	918a1065be	[libomptarget][nfc] Move GetWarp/LaneId functions into per arch code Summary: [libomptarget][nfc] Move GetWarp/LaneId functions into per arch code No code change for nvptx. Amdgcn currently has two implementations of GetLaneId, this patch keeps the one a colleague considered to be superior for our ISA. GetWarpId is currently the same function for amdgcn and nvptx, but I think it's cleaner to keep it grouped with all the others than to keep it in support.cu. Reviewers: jdoerfert, grokos, ABataev Reviewed By: jdoerfert Subscribers: jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D75587	2020-03-05 17:05:58 +00:00
Jon Chesterfield	84ac0dffd4	[libomptarget][nfc][amdgcn] Replace magic number with named intrinsic	2020-03-05 11:50:30 +00:00
Jon Chesterfield	133db44996	[libomptarget] Implement most hip atomic functions in terms of intrinsics Summary: [libomptarget] Implement hip atomic functions in terms of intrinsics All but atomicInc can be implemented using type generic clang intrinsics. There is not yet a corresponding intrinsic for atomicInc in clang, only one in LLVM. This patch leaves atomicInc as an unresolved symbol. Reviewers: jdoerfert, ABataev, hfinkel, grokos, arsenm Reviewed By: arsenm Subscribers: sri, saiislam, wdng, jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D73076	2020-03-04 17:56:40 +00:00
Jon Chesterfield	ad3d021b9e	[libomptarget][nfc][amdgcn] Simplify assert_fail implementation	2020-03-03 18:24:51 +00:00
Alexey Bataev	c4a9d976c1	[LIBOMPTARGET]Lower priority of global constructor/destructor to silence the warning from gcc. Summary: Fixed the warning from gcc, since priorities 0-100 are reserved for internal use. Reviewers: grokos Subscribers: kkwli0, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D75458	2020-03-02 15:15:11 -05:00
Alexey Bataev	63cef621f9	[LIBOMPTARGET]Fix PR44933: fix crash because of the too early deinitialization of libomptarget. Summary: Instead of using global variables with an unpredictable time of deinitialization, use dynamically allocated variables with functions explicitly marked as global constructor/destructor and priority. This prevents the crash caused by the incorrect order of deinitialization of dynamic libraries. Reviewers: grokos, hfinkel Subscribers: caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D74837	2020-02-25 15:54:37 -05:00
Alexey Bataev	578c13d13c	[OPENMP]Fix the test, NFC.	2020-02-13 10:40:06 -05:00
Ethan Stewart	190a11148b	Changed omp_get_max_threads() implementation to more closely match spec description. Summary: The 5.0 spec states, "The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a num_threads clause were encountered after execution returns from this routine." The attached test shows Max Threads: 96, Num Threads: 128 without the proposed change. The number of threads should not exceed the (max) nthreads ICV, hence we should return the higher SPMD thread number even when omp_get_max_threads() is called in a generic kernel. This change does fail the api test, max_threads.c, because now it would return 64 instead of 32. Reviewers: jdoerfert, ABataev, grokos, JonChesterfield Reviewed By: jdoerfert Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D74092	2020-02-12 23:29:34 +00:00
JonChesterfield	c2ce9ea4e3	[libomptarget][nfc] Change enum values to match those in cuda/rtl Summary: [libomptarget][nfc] Change enum values to match those in cuda/rtl support.h and cuda/rtl.cpp (and downstream hsa/rtl.cpp) have enums for execution mode. These are actually independent - the numbers that are used within support, or within the plugin, are never passed across the boundary. Nevertheless, trying to work out why the values are different between the two has generated a reasonable amount of confusion. This patch changes support to match the values in plugin, on the basis that the plugin also has some comments which I'd have to update if I changed that one instead. Credit to Ron for working through this in our own fork. See rocm-developer-tools/aomp/issues/7 for that earlier diagnostic write up. Also happy with generic = 0, spmd = 1 - provided it's the same in both places. Reviewers: jdoerfert, grokos, ABataev, ronlieb Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D74503	2020-02-12 23:27:08 +00:00
Johannes Doerfert	a5153dbc36	[OpenMP][Offloading] Added support for multiple streams so that multiple kernels can be executed concurrently Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D74145	2020-02-11 22:07:14 -06:00
Johannes Doerfert	3ff4e2eee8	[OpenMP] Switch default C++ standard to C++ 14 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D74258	2020-02-11 17:11:54 -06:00
Jonas Devlieghere	4fe839ef3a	[CMake] Rename EXCLUDE_FROM_ALL and make it an argument to add_lit_testsuite EXCLUDE_FROM_ALL means something else for add_lit_testsuite as it does for something like add_executable. Distinguish between the two by renaming the variable and making it an argument to add_lit_testsuite. Differential revision: https://reviews.llvm.org/D74168	2020-02-06 15:33:18 -08:00
Jon Chesterfield	6a82f0f0b9	[libomptarget] Implement wavefront functions for amdgcn Summary: [libomptarget] Implement wavefront functions for amdgcn Reviewers: jdoerfert, ABataev, grokos, arsenm Reviewed By: arsenm Subscribers: saiislam, wdng, arsenm, jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D73077	2020-02-04 21:55:29 +00:00
Jon Chesterfield	ab9762a9f5	Revert "[nfc][libomptarget] Remove SHARED annotation from local variables" This reverts commit `0e9374e374`. Revert D73239. It fails some local testing, cause presently unknown	2020-01-27 20:05:17 +00:00
Jon Chesterfield	0e9374e374	[nfc][libomptarget] Remove SHARED annotation from local variables Summary: [nfc][libomptarget] Remove SHARED annotation from local variables A few local variables in reduction.cu were marked SHARED. This patch leaves all per-kernel global state localised in omp_data.cu. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D73239	2020-01-23 00:00:23 +00:00
Alexey Bataev	9148b8b734	[OpenMP][Offloading] Fix the issue that omp_get_num_devices returns wrong number of devices, by Shiley Tian. Summary: This patch fixes an issue in the following simple case: #include <omp.h> #include <stdio.h> int main(int argc, char *argv[]) { int num = omp_get_num_devices(); printf("%d\n", num); return 0; } Currently it returns 0 even if devices exist. Since this file doesn't contain any target region, the host entry is empty, so further actions such as initialization do not proceed, leading to the wrong device number being returned by the runtime function call. Reviewers: jdoerfert, ABataev, protze.joachim Reviewed By: ABataev Subscribers: protze.joachim Tags: #openmp Differential Revision: https://reviews.llvm.org/D72576	2020-01-21 13:25:18 -05:00
Jon Chesterfield	03c2a59cd6	[libomptarget] Implement smid for amdgcn Summary: [libomptarget] Implement smid for amdgcn Implementation is in a new file as it uses an intrinsic with complicated encoding that warranted substantial comments. Reviewers: jdoerfert, grokos, ABataev, ronlieb Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72956	2020-01-20 14:52:17 +00:00
George Rokos	e244145ab0	[LIBOMPTARGET] Do not increment/decrement the refcount for "declare target" objects The reference counter for global objects marked with declare target is INF. This patch prevents the runtime from incrementing/decrementing INF refcounts. Without it, the map(delete: global_object) directive actually deallocates the global on the device. With this patch, such a directive becomes a no-op. Differential Revision: https://reviews.llvm.org/D72525	2020-01-14 16:30:38 -08:00
Jon Chesterfield	2a43688a0a	[nfc][libomptarget] Refactor nvptx/target_impl.cu Summary: [nfc][libomptarget] Refactor nvptx/target_impl.cu Use __kmpc_impl_atomic_add instead of atomicAdd to match the rest of the file. Alternatively, target_impl.cu could use the cuda functions directly. Using a mixture in this file was an oversight, happy to resolve in either direction. Removed some comments that look outdated. Call __kmpc_impl_unset_lock directly to avoid a redundant diagnostic and remove an implicit dependency on interface.h. Reviewers: ABataev, grokos, jdoerfert Reviewed By: jdoerfert Subscribers: jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72719	2020-01-14 19:27:45 +00:00
Jon Chesterfield	2d287bec3c	[nfc][libomptarget] Refactor amdgcn target_impl Summary: [nfc][libomptarget] Refactor amdgcn target_impl Removes references to internal libraries from the header; Standardises on C++ mangling for all the target_impl functions; Update comment block; clang-format; Move some functions into a new target_impl.hip source file. This lays the groundwork for implementing the remaining unresolved symbols in the target_impl.hip source. Reviewers: jdoerfert, grokos, ABataev, ronlieb Reviewed By: jdoerfert Subscribers: jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72712	2020-01-14 19:27:07 +00:00
Alexey Bataev	b19c0810e5	[LIBOMPTARGET]Ignore empty target descriptors. Summary: If the dynamically loaded module has been compiled with -fopenmp-targets and has no target regions, it has empty target descriptor. It leads to a crash at the runtime if another module has at least one target region and at least one entry in its descriptor. The runtime library is unable to load the empty binary descriptor and terminates the execution. Caused by a clang-offload-wrapper. Reviewers: grokos, jdoerfert Subscribers: caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72472	2020-01-10 09:45:27 -05:00
Kazuaki Ishizaki	4c6a098ad5	[OpenMP] NFC: Fix trivial typos in comments Reviewers: jdoerfert, Jim Reviewed By: Jim Subscribers: Jim, mgorny, guansong, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72285	2020-01-07 14:05:03 +08:00
Jon Chesterfield	bc48af8c57	[libomptarget][nfc] Change unintentional target_impl prefix to kmpc_impl	2019-12-30 20:50:23 +00:00
Jon Chesterfield	63e2aa5658	[libomptarget][nfc] Provide target_impl malloc/free Summary: [libomptarget][nfc] Provide target_impl malloc/free Sufficient to build support.cu for amdgcn Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71685	2019-12-19 16:54:28 +00:00
JonChesterfield	b40822fc14	[libomptarget][nvptx] Fix build, second symbol reordering	2019-12-19 02:02:44 +00:00
Jon Chesterfield	89a2bef27a	[libomptarget][nvptx] Fix build, symbol ordering in target_impl.h	2019-12-19 01:50:06 +00:00
JonChesterfield	9aefe5f65e	[libomptarget][amdgcn] Correct return type of extern __clock64 to unsigned	2019-12-19 00:11:21 +00:00
Jon Chesterfield	2caeaf2f45	[libomptarget][nfc] Introduce atomic wrapper function Summary: [libomptarget][nfc] Introduce atomic wrapper function Wraps atomic functions in a template prefixed __kmpc_atomic that dispatches to cuda or hip atomic functions. Intended to be easily extended to dispatch to OpenCL or C++ atomics for a third target. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: Anastasia, jvesely, mgrang, dexonsmith, llvm-commits, mgorny, jfb, openmp-commits Tags: #openmp, #llvm Differential Revision: https://reviews.llvm.org/D71404	2019-12-18 20:06:17 +00:00
JonChesterfield	8adae6027c	[libomptarget][nfc] Extract function from data_sharing, move to common Summary: [libomptarget][nfc] Extract function from data_sharing, move to common Finding the first active thread in the warp is different on nvptx and amdgcn, mostly due to warp size and the desire for efficiency. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71643	2019-12-18 19:39:35 +00:00
Alexey Bataev	15d47deedd	[LIBOPENMP][NVPTX]Fix the build error in the runtime.	2019-12-17 14:46:04 -05:00
JonChesterfield	0c83f8ccc7	[libomptarget][nfc] Move three files under common, build them for amdgcn Summary: [libomptarget][nfc] Move three files under common, build them for amdgcn Change to reduction.cu to remove two dead includes, otherwise no code change. Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71601	2019-12-17 18:02:49 +00:00
JonChesterfield	3d3e4076cd	[libomptarget][nfc] Move omp locks under target_impl Summary: [libomptarget][nfc] Move omp locks under target_impl These are likely to be target specific, even down to the lock_t which is correspondingly moved out of interface.h. The alternative is to include interface.h in target_impl which substantially increases the scope of those symbols. The current nvptx implementation deadlocks on amdgcn. The preferred implementation for that arch is still under discussion - this change leaves declarations in target_impl. The functions could be inline for nvptx. I'd prefer to keep the internals hidden in the target_impl translation unit, but will add the (possibly renamed) macros to target_impl.h if preferred. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71574	2019-12-17 12:18:57 +00:00
Jon Chesterfield	ce12a523b0	[libomptarget][nfc] Move timer functions behind target_impl Summary: [libomptarget][nfc] Move timer functions behind target_impl Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71584	2019-12-17 02:22:29 +00:00
Jon Chesterfield	53bcd1e141	[libomptarget][nfc] Wrap cuda min() in target_impl Summary: [libomptarget][nfc] Wrap cuda min() in target_impl nvptx forwards to cuda min, amdgcn implements directly. Sufficient to build parallel.cu for amdgcn, added to CMakeLists. All call sites are homogeneous except one that passes a uint32_t and an int32_t. This could be smoothed over by taking two type parameters and some care over the return type, but overall I think the inline <uint32_t> calling attention to what was an implicit sign conversion is cleaner. Reviewers: ABataev, jdoerfert Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71580	2019-12-17 01:30:04 +00:00
JonChesterfield	69fcc6ecc1	Revert "Revert "[libomptarget] Move resource id functions into target specific code, implement for amdgcn"" Summary: This reverts commit `dd8a7fcdd7`. Alexey reports undefined symbols for the new inline functions defined in target_impl.h This does not reproduce for me for nvptx, or amdgcn, under release or debug builds. I believe the patch is fine, based on: - the semantics of an inline function in C++ (the cuda INLINE functions end up as linkonce_odr in IR), which are only legal to drop if they have no uses - the code generated from a debug build of clang 9 does not show these undef symbols - the tests pass - the code is trivial To progress from here I either need: - A tie break - someone to play the role of CI in determining whether the patch works - Alexey to provide sufficient information about his build for me to reproduce the failure - Alexey to debug why the symbols are disappearing for him and report back Reviewers: ABataev, jdoerfert, grokos Subscribers: jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71502	2019-12-16 16:16:14 +00:00
Alexey Bataev	dd8a7fcdd7	Revert "[libomptarget] Move resource id functions into target specific code, implement for amdgcn" This reverts commit `dbb3fec8ad` since it breaks the NVPTX tests.	2019-12-13 16:36:06 -05:00
Jon Chesterfield	40d72134fd	[libomptarget] Build most of common/src for amdgcn Summary: [libomptarget] Build most of common/src for amdgcn Excludes parallel.cu, which uses an integer min() from cuda, and support.cu, which calls malloc that is not yet available for amdgcn. Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: gregrodgers, ronlieb, jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71446	2019-12-13 17:48:19 +00:00
Jon Chesterfield	56adcebfda	[libomptarget][nfc] Add nop syncwarp function for amdgcn	2019-12-13 14:27:52 +00:00
Jon Chesterfield	479868646a	[libomptarget][nfc] Add declarations of atomic functions for amdgcn Summary: [libomptarget][nfc] Add declarations of atomic functions for amdgcn This enables building more source for amdgcn. The functions are usually available in a hip runtime header, but are duplicated here to decouple the implementation Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71412	2019-12-12 22:56:14 +00:00
Jon Chesterfield	dbb3fec8ad	[libomptarget] Move resource id functions into target specific code, implement for amdgcn Summary: [libomptarget] Move resource id functions into target specific code, implement for amdgcn Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71382	2019-12-12 22:49:02 +00:00
Jon Chesterfield	b399252028	[libomptarget][nfc] Add missing header for amdgcn/target_impl	2019-12-12 09:36:57 +00:00
JonChesterfield	0dd62c5c2e	[libomptarget][nfc] Move cuda threadfence functions behind kmpc_impl Summary: [libomptarget][nfc] Move cuda threadfence functions behind kmpc_impl Part of building code under common/ without requiring a cuda compiler Reviewers: ABataev, jdoerfert, grokos Reviewed By: ABataev Subscribers: jvesely, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71102	2019-12-06 15:41:18 +00:00
Jon Chesterfield	cd90f49d70	[libomptarget][nfc] Move three more files to common Summary: [libomptarget][nfc] Move three more files to common Reviewers: ABataev, jdoerfert, grokos Reviewed By: ABataev Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71103	2019-12-06 15:29:50 +00:00
Jon Chesterfield	4af84d2686	[libomptarget][nfc] Introduce SHARED, ALIGN macros Summary: [libomptarget][nfc] Introduce SHARED, ALIGN macros Move remaining cuda attributes behind such macros Reviewers: ABataev, jdoerfert, grokos Reviewed By: ABataev Subscribers: openmp-commits, jvesely Tags: #openmp Differential Revision: https://reviews.llvm.org/D71076	2019-12-05 21:57:58 +00:00
Jon Chesterfield	d0b9ed5c49	[libomptarget][nfc] Move omptarget-nvptx under common Summary: [libomptarget][nfc] Move omptarget-nvptx under common Almost all files depend on omptarget-nvptx, which no longer contains any obviously architecture dependent code. Moving it under common unblocks task/loop for amdgcn, and allows moving other code. At some point there should probably be a widespread symbol renaming to replace the nvptx string. I'd prefer to get things working first. Building this (and task.cu, loop.cu) without a cuda library requires some more refactoring, e.g. wrap threadfence(), use DEVICE macro more consistently. Patches for that are orthogonal and will be posted shortly. Reviewers: jdoerfert, ABataev, grokos Reviewed By: ABataev Subscribers: mgorny, fedor.sergeev, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71073	2019-12-05 20:34:15 +00:00
JonChesterfield	3ada8d2a87	[libomptarget] Build a minimal deviceRTL for amdgcn Summary: [libomptarget] Build a minimal deviceRTL for amdgcn Repeat of D70414, with an include path fixed. Diff for sanity checking. The CMakeLists.txt file is functionally identical to the one used in the aomp fork. Whitespace changes were made based on nvptx/CMakeLists.txt, plus the copyright notice updated to match (Greg was the original author so would like his sign off on that here). This change will build a small subset of the deviceRTL if an appropriate toolchain is available, e.g. a local install of rocm. Support.h is moved from nvptx as a dependency of debug.h. Reviewers: ABataev, jdoerfert Reviewed By: ABataev Subscribers: jvesely, mgorny, jfb, openmp-commits, jdoerfert Tags: #openmp Differential Revision: https://reviews.llvm.org/D70971	2019-12-04 16:43:37 +00:00
Alexey Bataev	02b9c5d963	Revert "[libomptarget] Build a minimal deviceRTL for amdgcn" This reverts commit `877ffa716f` because it breaks the build.	2019-12-03 12:35:08 -05:00
Jon Chesterfield	877ffa716f	[libomptarget] Build a minimal deviceRTL for amdgcn Summary: [libomptarget] Build a minimal deviceRTL for amdgcn The CMakeLists.txt file is functionally identical to the one used in the aomp fork. Whitespace changes were made based on nvptx/CMakeLists.txt, plus the copyright notice updated to match (Greg was the original author so would like his sign off on that here). This change will build a small subset of the deviceRTL if an appropriate toolchain is available, e.g. a local install of rocm. Support.h is moved from nvptx as a dependency of debug.h. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Reviewed By: jdoerfert Subscribers: jfb, Hahnfeld, jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D70414	2019-12-03 15:18:41 +00:00
Bryan Chan	4d3198e243	[OpenMP] build offload plugins before testing them Summary: "make check-all" or "make check-libomptarget" would attempt to run offloading tests before the offload plugins are built. This patch corrects that by adding dependencies to the libomptarget CMake rules. Reviewers: jdoerfert Subscribers: mgorny, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D70803	2019-11-28 17:43:56 -05:00
JonChesterfield	a84b48d01e	[nfc][libomptarget] Remove casts of string literals to char*	2019-11-19 19:41:59 +00:00
JonChesterfield	4681e2e434	[nfc][libomptarget] Write amdgcn macros in terms of compiler intrinsics	2019-11-19 17:23:46 +00:00
Jon Chesterfield	5a4a05d776	[libomptarget][nfc] Move some source into common from nvptx Summary: [libomptarget][nfc] Move some source into common from nvptx Moves some source that compiles cleanly under amdgcn into a common subdirectory. Includes some non-trivial files and some headers. Keeps the cuda file extension. The build systems for different architectures seem unlikely to have much in common. The idea is therefore to set include paths such that files under common/src compile as if they were under arch/src as the mechanism for sharing. In particular, files under common/src need to be able to include target_impl.h. The corresponding -Icommon is left out in favour of explicit includes on the basis that it makes it clearer which files under common are used by a given architecture. Reviewers: jdoerfert, ABataev, grokos Reviewed By: ABataev Subscribers: jfb, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D70328	2019-11-18 18:17:36 +00:00
JonChesterfield	32dfbd131d	[libomptarget][nfc] Use cuda variable wrappers from support.h Summary: [libomptarget][nfc] Use cuda variable wrappers from support.h Reimplementation of D69693, after the revert of D69885 Use the wrappers in support.h for cuda builtin variables at all call sites. Localises use of cuda and removes WARPSIZE==32 assumption in debug.h. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D70186	2019-11-14 12:45:09 +00:00
JonChesterfield	fd9fa9995c	[libomptarget] Move supporti.h to support.cu Summary: [libomptarget] Move supporti.h to support.cu Reimplementation of D69652, without the unity build and refactors. Will need a clean build of libomptarget as the cmakelists changed. Reviewers: ABataev, jdoerfert Reviewed By: jdoerfert Subscribers: mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D70131	2019-11-13 11:36:46 +00:00
Jon Chesterfield	7cea0cea77	[libomptarget] Revert all improvements to support Summary: [libomptarget] Revert all improvements to support The change to unity build for nvcc has broken the build for some developers. This patch reverts to a known-working state. There has been some confusion over exactly how the build broke. I think we have reached a common understanding that the disappearing symbols are from the bitcode library built by clang. The static archive built by nvcc may show the same problem. Some of the confusion arose from building the deviceRTL twice and using one or the other library based on various environmental factors. I'm pretty sure the problem is clang expanding `__forceinline__` into both `__inline__` and `__attribute__((always_inline))`. The `__inline__` attribute resolves to linkonce_odr which is not safe for exporting symbols from translation units. "always_inline" is the desired semantic for small functions defined in one translation unit that are intended to be inlined at link time. "inline" is not. This therefore reintroduces the dependency hazard of supporti.h and some code duplication, and blocks progress separating deviceRTL into reusable components. See also D69857, D69859 for attempts at a fix instead of a revert. Reviewers: ABataev, jdoerfert, grokos, ikitayama, tianshilei1992 Reviewed By: ABataev Subscribers: mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69885	2019-11-06 15:44:10 +00:00
Ron Lieberman	dc34b1c94d	Test commit: adds a . to comment. NFC	2019-11-04 16:51:03 -06:00
JonChesterfield	94c59ea8dd	[libomptarget] Implement target_impl for amdgcn Summary: [libomptarget] Implement target_impl for amdgcn Smallest atomic addition for a new target. Implements enough of the amdgcn specific code that some of the source files under nvptx/src could be compiled, without modification, to run on amdgcn. This foreshadows a work in progress patch to move said source out of nvptx/src. Patch based on fork at https://github.com/ROCm-Developer-Tools/llvm-project Reviewers: ABataev, jdoerfert, grokos, ronlieb Subscribers: jvesely, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69718	2019-11-01 15:46:35 +00:00
Alexey Bataev	e57f8ad914	[LIBOMPTARGET]Call GetLaneId function, do not use its address in debug log functions.	2019-11-01 09:43:47 -04:00
JonChesterfield	9b06ac98d0	[nfc][omptarget] Use builtin var abstraction. Second pass at D69476 Summary: [nfc][omptarget] Use builtin var abstraction. Second pass at D69476 Use the wrappers in support.h for cuda builtin variables at all call sites. Localises use of cuda and removes WARPSIZE==32 assumption in debug.h. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69693	2019-11-01 02:21:44 +00:00
JonChesterfield	764c8420e4	[nfc][libomptarget] Reorganise support header Summary: [nfc][libomptarget] Reorganise support header All functions defined in support implementation are now declared in support.h. Reordered functions in support implementation to match the sequence in support.h. Added include guards to support.h. Added #include interface to support.h to provide kmp_Ident declaration. Move supporti.h to support.cu and s/INLINE/EXTERN/g. Add remaining includes to support.cu. A minor side effect is to change the name mangling of the support functions to extern "C". If this matters another macro along the lines of INLINE/EXTERN can be added - perhaps DEVICE as that's the obvious implementation. Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69652	2019-10-31 17:15:02 +00:00
Jon Chesterfield	e9f9dfab82	[libomptarget] Change nvcc compilation to use a unity build Summary: [libomptarget] Change nvcc compilation to use a unity build This allows nvcc to inline functions between what would otherwise be distinct translation units, which in turn removes any runtime cost from implementing functions in source files (as opposed to inline in headers). This will then allow the circular dependencies in deviceRTL to be readily broken and individual components more easily shared between architectures. Reviewers: ABataev, jdoerfert, grokos, RaviNarayanaswamy, hfinkel, ronlieb, gregrodgers Reviewed By: jdoerfert Subscribers: mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69489	2019-10-31 01:58:51 +00:00
Jon Chesterfield	8548e2f543	[nfc][libomptarget] Move named_sync() into target_impl Summary: [nfc][libomptarget] Move named_sync() into target_impl Reviewers: ABataev, jdoerfert, grokos Reviewed By: ABataev Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69487	2019-10-30 16:25:05 +00:00
Jon Chesterfield	74bb5ee674	[nfc][libomptarget] Move smid() into target_impl Summary: [nfc][libomptarget] Move smid() into target_impl Reviewers: ABataev, jdoerfert, grokos Reviewed By: ABataev Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69485	2019-10-30 13:39:15 +00:00
Jon Chesterfield	62a161cc00	[libomptarget] Always call malloc, free via SafeMalloc, SafeFree wrapper Summary: [libomptarget] Always call malloc, free via SafeMalloc, SafeFree wrapper NFC for release, adds some verbosity to debug printing. Motivation is to provide one place where local modifications can be made to the behaviour of all heap allocation or deallocation while debugging. Reviewers: jdoerfert, ABataev, grokos Reviewed By: ABataev Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69492	2019-10-30 13:35:34 +00:00
Alexey Bataev	d7941a6ab9	[LIBOMPTARGET]Fix build, NFC. Need to include nvptx_interface.h in target_impl.h, otherwise the build fails because of the missing __kmpc_impl_lanemask_t type.	2019-10-28 10:43:00 -04:00
Jon Chesterfield	174967f153	[nfc][libomptarget] Decrease coupling between files Summary: [nfc][libomptarget] Decrease coupling between files debug.h used the symbol omptarget_device_environment so implicitly required an include of omptarget-nvptx.h to compile. Similarly interface.h uses size_t. Moving this declaration to a new header means cancel, critical can now build without omptarget-nvptx.h. After this change, debug.h, cancel.cu, critical.cu could move under a common source directory. Reviewers: ABataev, jdoerfert, grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69473	2019-10-27 14:27:54 +00:00
Jon Chesterfield	ad4c42666d	[nfc][libomptarget] Inline option into target_impl Summary: [nfc][libomptarget] Inline option into target_impl Subset of D69423. The macros that were in option.h are all target dependent. Inlining the header simplifies the dependency graph when looking to move code into a common subdir. Reviewers: ABataev, jdoerfert, grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69472	2019-10-27 14:26:55 +00:00
Jon Chesterfield	f7c3c640af	[NFC][libomptarget]Remove TRUE,FALSE macros from option.h Summary: [NFC][libomptarget]Remove TRUE,FALSE macros from option.h Subset of D69423. Patch series ends with removing option.h. Reviewers: ABataev, jdoerfert, grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69463	2019-10-27 01:31:12 +01:00
Jon Chesterfield	197b7b24c3	[NFC][libomptarget] move remaining device specific code out of omptarget-nvptx.h Summary: Strictly there is one remaining difference wrt amdgcn - parallelLevel is volatile qualified on amdgcn and not on nvptx. Determining whether this is correct - and how to represent the different semantics of 'volatile' under various conditions - is beyond the scope of this code motion patch. Reviewers: ABataev, jdoerfert, grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D69424	2019-10-25 18:58:31 +01:00
Jon Chesterfield	d69d1aa131	[libomptarget][nfc] Make interface.h target independent Summary: Move interface.h under a top level include directory. Remove #includes to avoid the interface depending on the implementation. Reviewers: ABataev, jdoerfert, grokos, ronlieb, RaviNarayanaswamy Reviewed By: jdoerfert Subscribers: mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D68615 llvm-svn: 374919	2019-10-15 17:15:26 +00:00
Jon Chesterfield	58fd6b5b9c	[libomptarget][nfc] Update remaining uint32 to use lanemask_t Summary: Update a few functions in the API to use lanemask_t instead of i32. NFC for nvptx. Also update the ActiveThreads type in DataSharingStateTy. This removes a lot of #ifdef from the downstream amdgcn implementation. Reviewers: ABataev, jdoerfert, grokos, ronlieb, RaviNarayanaswamy Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D68513 llvm-svn: 373806	2019-10-04 22:30:28 +00:00
Jon Chesterfield	4f75a73796	Use named constant to indicate all lanes, to handle 32 and 64 wide architectures Reviewers: ABataev, jdoerfert, grokos, ronlieb Reviewed By: grokos Subscribers: ronlieb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D68369 llvm-svn: 373793	2019-10-04 21:39:22 +00:00
Sergey Dmitriev	4b343fd84c	[Clang][OpenMP Offload] Create start/end symbols for the offloading entry table with a help of a linker Linker automatically provides __start_<section name> and __stop_<section name> symbols to satisfy unresolved references if <section name> is representable as a C identifier (see https://sourceware.org/binutils/docs/ld/Input-Section-Example.html for details). These symbols indicate the start address and end address of the output section respectively. Therefore, renaming OpenMP offload entries section name from ".omp.offloading_entries" to "omp_offloading_entries" to use this feature. This is the first part of the patch for eliminating OpenMP linker script (please see https://reviews.llvm.org/D64943). Differential Revision: https://reviews.llvm.org/D68070 llvm-svn: 373118	2019-09-27 20:00:51 +00:00
Alexey Bataev	4812941776	[OPENMP][NVPTX]Fix parallel level counter in non-SPMD mode. Summary: In non-SPMD mode we may end up with the divergent threads when trying to increment/decrement parallel level counter. It may lead to incorrect calculations of the parallel level and wrong results when threads are divergent. We need to reconverge the threads before trying to modify the parallel level counter. Reviewers: grokos, jdoerfert Subscribers: guansong, openmp-commits, caomhin, kkwli0 Tags: #openmp Differential Revision: https://reviews.llvm.org/D66802 llvm-svn: 370803	2019-09-03 18:11:50 +00:00
Jon Chesterfield	bbdd282371	[libomptarget] Refactor activemask macro to inline function Summary: See also abandoned D66846, split into this diff and others. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Reviewed By: jdoerfert, ABataev Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66851 llvm-svn: 370781	2019-09-03 16:31:30 +00:00
Jon Chesterfield	3294421926	Use target_impl functions to replace more inline asm Summary: Follow on from D65836. Removes remaining asm shuffles and lanemask accessors. Also changes the types of target_impl bitwise functions to unsigned. Reviewers: jdoerfert, ABataev, grokos, Hahnfeld, gregrodgers, ronlieb, hfinkel Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66809 llvm-svn: 370216	2019-08-28 15:04:06 +00:00
Jon Chesterfield	80f9a38a76	[libomptarget] Refactor syncthreads macro to inline function Summary: See also abandoned D66846, split into this diff and others. Rev 2 of D66855 Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66861 llvm-svn: 370210	2019-08-28 14:22:35 +00:00
Jon Chesterfield	be3d487313	[libomptarget] Refactor syncwarp macro to inline function Summary: See also abandoned D66846, split into this diff and others. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66857 llvm-svn: 370149	2019-08-28 02:02:53 +00:00
Jon Chesterfield	e73e3013a6	Fix build break due to close brace lost in merge llvm-svn: 370148	2019-08-28 01:56:26 +00:00
Jon Chesterfield	327aa81123	[libomptarget] Refactor shfl_down_sync macro to inline function Summary: See also abandoned D66846, split into this diff and others. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66853 llvm-svn: 370146	2019-08-28 01:47:41 +00:00
Jon Chesterfield	b9b712df82	[libomptarget] Refactor shfl_sync macro to inline function Summary: See also abandoned D66846, split into this diff and others. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D66852 llvm-svn: 370144	2019-08-28 01:31:04 +00:00
Alexey Bataev	da8b5cc9f1	[OPENMP][NVPTX]Add __kmpc_syncwarp(int32_t) function. Summary: Added function void __kmpc_syncwarp(int32_t) to expose it to the compiler. It is required to fix the problem with the critical regions in Cuda9.0+. We cannot use barrier in the critical region, but still need to reconverge the threads in the warp after. This function allows to do this. Reviewers: grokos, jdoerfert Subscribers: guansong, openmp-commits, kkwli0, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D66672 llvm-svn: 369933	2019-08-26 17:32:45 +00:00
Alexey Bataev	0366168f3a	[OPENMP][NVPTX]Use __syncwarp() to reconverge the threads. Summary: In Cuda 9.0 it is not guaranteed that threads in the warps are convergent. We need to use __syncwarp() function to reconverge the threads and to guarantee the memory ordering among threads in the warps. This is the first patch to fix the problem with the test libomptarget/deviceRTLs/nvptx/src/sync.cu on Cuda9+. This patch just replaces calls to __shfl_sync() function with the call of __syncwarp() function where we need to reconverge the threads when we try to modify the value of the parallel level counter. Reviewers: grokos Subscribers: guansong, jfb, jdoerfert, caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D65013 llvm-svn: 369796	2019-08-23 18:34:48 +00:00
Jon Chesterfield	ed3324f6b6	Factor architecture dependent code out of loop.cu Summary: [libomptarget] Factor architecture dependent code out of loop.cu Related to the patch series starting D64217. Added subscribers to said series as reviewers. This effort is smaller in scope. This patch factors out just enough architecture dependent code from loop.cu to allow the same source to be used with amdgcn, given a different target_impl.h. Testing is that the same bitcode (modulo variable names) is generated for libomptarget before and after the refactor, for nvptx and the out of tree amdgcn. Reviewers: jdoerfert, ABataev, bollu, jfb, tra, grokos, Hahnfeld, guansong, xtian, gregrodgers, ronlieb, hfinkel, gtbercea, guraypp, arpith-jacob Reviewed By: jdoerfert, ABataev Subscribers: dexonsmith, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D65836 llvm-svn: 368751	2019-08-13 21:41:47 +00:00
Gheorghe-Teodor Bercea	6c7b882e52	[OpenMP][libomptarget] Add support for close map modifier Summary: This patch adds support for the close map modifier. The close map modifier will overwrite the unified shared memory requirement and create a device copy of the data. Reviewers: ABataev, Hahnfeld, caomhin, grokos, jdoerfert, AlexEichenberger Reviewed By: Hahnfeld, AlexEichenberger Subscribers: guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D65340 llvm-svn: 368488	2019-08-09 21:32:57 +00:00
Jonas Hahnfeld	7a0f2dc5a4	[libomptarget] Remove duplicate RTLRequiresFlags per device We have one global RTLs.RequiresFlags, I don't see a need to make a copy per device that the runtime manages. This was problematic anyway because the copy happened during the first __tgt_register_lib(). This made it impossible to call __tgt_register_requires() from normal user functions for testing. Hence, this change also fixes unified_shared_memory/shared_update.c for older versions of Clang that don't call __tgt_register_requires() before __tgt_register_lib(). Differential Revision: https://reviews.llvm.org/D66019 llvm-svn: 368465	2019-08-09 19:20:39 +00:00
Gheorghe-Teodor Bercea	a1d20506e7	[OpenMP][libomptarget] Add support for unified memory for regular maps Summary: This patch adds support for using unified memory in the case of regular maps that happen when a target region is offloaded to the device. For cases where only a single version of the data is required then the host address can be used. When variables need to be privatized in any way or globalized, then the copy to the device is still required for correctness. Reviewers: ABataev, jdoerfert, Hahnfeld, AlexEichenberger, caomhin, grokos Reviewed By: Hahnfeld Subscribers: mgorny, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D65001 llvm-svn: 368192	2019-08-07 17:29:45 +00:00
Jon Chesterfield	ae0178bee7	Use forceinline. Necessary for nvcc to inline small functions within the bitcode library Summary: [libomptarget] Use forceinline. Necessary for nvcc to inline small functions within the bitcode library Suggested in D65836 Reviewers: ABataev, jdoerfert, grokos, gregrodgers Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D65876 llvm-svn: 368177	2019-08-07 15:24:12 +00:00
Alexey Bataev	c10180ed8e	[OPENMP][OFFLOADING]Fix the test, NFC. llvm-svn: 368068	2019-08-06 18:13:39 +00:00
Michael Kruse	78769ec403	[libomptarget] Harmonize emitting CUDA errors and general debug messages. Ensures that CUDA fail reasons (such as "No CUDA-capable device detected") are printed together with libomptarget's debug message (e.g. "Error when setting CUDA context"). Previously, the former was printed only in CMAKE_BUILD_TYPE=Debug builds while the latter was enabled by LIBOMPTARGET_ENABLE_DEBUG. With this change, also only call cuGetErrorString when the error will be printed. Suggested-by: Ye Luo <xw111luoye@gmail.com> Differential Revision: https://reviews.llvm.org/D65687 llvm-svn: 367910	2019-08-05 19:12:10 +00:00
Michael Kruse	2c7a8eaf3d	[OpenMP 5.0] libomptarget interface for declare mapper functions. This patch implements the libomptarget runtime interface for OpenMP 5.0 declare mapper functions. The declare mapper functions generated by Clang will call them to complete the mapping of members. kmpc_mapper_num_components gets the current number of components for a user-defined mapper; kmpc_push_mapper_component pushes back one component for a user-defined mapper. The design slides can be found at https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx Patch by Lingda Li <lildmh@gmail.com> Differential Revision: https://reviews.llvm.org/D60972 llvm-svn: 367772	2019-08-04 04:18:28 +00:00
Alexey Bataev	ca424d100c	[OPENMP][NVPTX]Perform memory flush if number of threads to sync is 1 or less. Summary: According to the OpenMP standard, barrier operation must perform implicit flush operation. Currently, if there is only one thread in the team, barrier does not flush the memory. Patch fixes this problem. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D62398 llvm-svn: 367024	2019-07-25 15:02:28 +00:00
Jonas Hahnfeld	6e40ae8f3d	[libomptarget] Handle offload policy in push_tripcount If the first target region in a program calls the push_tripcount function, libomptarget didn't handle the offload policy correctly. This could lead to unexpected error messages as seen in http://lists.llvm.org/pipermail/openmp-dev/2019-June/002561.html To solve this, add a check calling IsOffloadDisabled() as all other entry points already do. If this method returns false, libomptarget is effectively disabled. Differential Revision: https://reviews.llvm.org/D64626 llvm-svn: 366810	2019-07-23 14:20:48 +00:00
Alexey Bataev	da43861b4a	[OpenMP][libomptarget] Suppress C++ 11 related warnings when building libomptarget-nvptx bitcode library, by Doru Bercea. Summary: Pass -std=c++11 flag to compiler to suppress C++ 11 related warnings when building NVPTX bitcode library. Reviewers: ABataev, caomhin, Hahnfeld Reviewed By: ABataev, Hahnfeld Subscribers: jdoerfert, Hahnfeld, jholewinski, mgorny, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D55772 llvm-svn: 366438	2019-07-18 13:54:01 +00:00
Ron Lieberman	59532488b1	[OPENMP] Resolve lost LoopTripCnt for subsequent loops in same thread. Remove loopTripCnt from threaded device stack after consuming it. Added a libomptarget DP message to aid in future debugging and to validate the added testcase, which only runs in Debug build. Differential Revision: https://reviews.llvm.org/D64808 llvm-svn: 366349	2019-07-17 17:07:52 +00:00
Alexey Bataev	85b9651edd	[OPENMP][NVPTX]Fixed checks for cuda versions. Summary: We used CUDART_VERSION macro to check for the installed cuda version but this macro is defined in cuda_runtime_api.h, which is not used by project. Better to use CUDA_VERSION macro, which is defined in cuda.h. Also, added the check if this macro is defined. If macro is undefined, there is something wrong with the cuda configuration and we should not continue the compilation. This also fixes problems with runtime building in cuda 10+. Reviewers: grokos Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D64648 llvm-svn: 366224	2019-07-16 16:07:10 +00:00
Alexey Bataev	42816107f7	[OPENMP]Fix threadid in __kmpc_omp_taskwait call for dependent target calls. Summary: We used to call __kmpc_omp_taskwait function with global threadid set to 0. It may crash the application at the runtime if the thread executing target region is not a master thread. Reviewers: grokos, kkwli0 Subscribers: guansong, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D64571 llvm-svn: 366220	2019-07-16 15:51:32 +00:00
Jonas Hahnfeld	aca476b296	[libomptarget] Fix typos and grammar in error messages, NFC. llvm-svn: 365890	2019-07-12 10:21:55 +00:00
Jonas Hahnfeld	2dfc5179f6	[libomptarget-nvptx] Remove dead functions These entry points are never called by Clang trunk nor clang-ykt. If XL doesn't use them either, they can finally go away. Differential Revision: https://reviews.llvm.org/D52700 llvm-svn: 365817	2019-07-11 20:12:51 +00:00
Alexey Bataev	4ad9286a57	[OPENMP]Rename loopTripCnt member data to LoopTripCnt, NFC. Rename variable to follow LLVM coding standard. llvm-svn: 365368	2019-07-08 18:45:48 +00:00
Alexey Bataev	060921dee7	[OPENMP]Make __kmpc_push_tripcount thread safe. Summary: __kmpc_push_tripcount function is not thread safe and may lead to data race when the target regions are executed in parallel threads. The patch makes loopTripCnt counter thread aware and stores the tripcount value per thread in the map. Access to map is guarded by mutex to prevent data race in the map itself. Test is for NVPTX target because it does not work correctly on the host. Seems to me, there is a problem in libomp with target regions in the parallel threads. Reviewers: grokos Subscribers: guansong, jfb, jdoerfert, openmp-commits, kkwli0, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D64080 llvm-svn: 365332	2019-07-08 15:30:23 +00:00
Alexey Bataev	bb55ece269	[OPENMP][NVPTX]Relax flush directive. Summary: According to the OpenMP standard, flush makes a thread’s temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied. According to the Cuda toolkit documentation (https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html#memory-fence-functions), the __threadfence() function provides the required functionality. __threadfence_system() also provides the required functionality, but it also includes some extra functionality, like synchronization of page-locked host memory, synchronization for the host, etc. This is not required by the standard, so we can use the more relaxed version of the memory fence operation. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D62397 llvm-svn: 364572	2019-06-27 18:33:09 +00:00
Gheorghe-Teodor Bercea	aace6d285d	[OpenMP][libomptarget] Add support for declare target to clause under unified memory Summary: This patch adds support for handling variables under the `#pragma omp declare target to()` clause when the `#pragma omp requires unified_shared_memory` is used. The address of the host variable is copied into the device pointer just like for the declare target link case. Reviewers: ABataev, caomhin, grokos, AlexEichenberger Reviewed By: grokos Subscribers: jcownie, guansong, jdoerfert, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D63106 llvm-svn: 363825	2019-06-19 15:48:10 +00:00
Alexey Bataev	8a2bd361eb	[OPENMP][CUDA]Use __syncthreads when compiled by nvcc and clang >= 9.0. Summary: The problems with __syncthreads() were fixed in clang >= 9.0 and the original __syncthreads() can be used instead of the ptx instruction. Reviewers: grokos Subscribers: guansong, jdoerfert, openmp-commits, kkwli0, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D63515 llvm-svn: 363807	2019-06-19 14:20:34 +00:00
Gheorghe-Teodor Bercea	c5fe030c16	[OpenMP][libomptarget] Enable usage of unified memory for declare target link variables Summary: This patch enables the usage of a host variable on the device for declare target link variables when unified memory is available. Reviewers: ABataev, caomhin, grokos Reviewed By: grokos Subscribers: Hahnfeld, guansong, jdoerfert, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D60884 llvm-svn: 362505	2019-06-04 15:05:53 +00:00
Alexey Bataev	e1947b84c1	Revert "[OPENMP][NVPTX]Fix barriers and parallel level counters, NFC." This reverts commit r361421 to split the patch into 3 parts. llvm-svn: 361638	2019-05-24 14:06:47 +00:00
Alexey Bataev	9d9e406684	[OPENMP][NVPTX]Fix barriers and parallel level counters, NFC. Summary: Parallel level counter should be volatile to prevent some dangerous optimizations by the ptxas. Otherwise, ptxas optimizations lead to undefined behaviour in some cases. Also, use __threadfence() for #pragma omp flush and if the barrier should not be used (we have only one thread in the team), still perform flush operation since the standard requires implicit flush when executing barriers. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D62199 llvm-svn: 361421	2019-05-22 19:50:32 +00:00
Gheorghe-Teodor Bercea	9e9c918259	[OpenMP][libomptarget] Enable requires flags for target libraries. Summary: Target link variables are currently implemented by creating a copy of the variables on the device side and unified memory never gets exploited. When the program uses the `#pragma omp requires unified_shared_memory` directive in conjunction with a declare target link, the linked variable is no longer allocated on the device and the host version is used instead. This behavior is overridden by performing an explicit mapping. A Clang side patch is required. Reviewers: ABataev, AlexEichenberger, grokos, Hahnfeld Reviewed By: AlexEichenberger, grokos, Hahnfeld Subscribers: Hahnfeld, jfb, guansong, jdoerfert, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D60223 llvm-svn: 361294	2019-05-21 19:35:02 +00:00
Alexey Bataev	f9e00db818	[OPENMP][NVPTX]Simplify handling of thread limit, NFC. Summary: Patch improves performance of the full runtime mode by moving threads limit counter to the shared memory. It also allows to save global memory. Reviewers: grokos, kkwli0, gtbercea Subscribers: guansong, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D61801 llvm-svn: 360584	2019-05-13 14:21:46 +00:00
Alexey Bataev	f62c266de7	[OPENMP][NVPTX]Improve number of threads counter, NFC. Summary: Patch improves performance of the full runtime mode by moving number-of-threads counter to the shared memory. It also allows to save global memory. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D61785 llvm-svn: 360457	2019-05-10 18:56:05 +00:00
Alexey Bataev	a857e31011	[OPENMP][NVPTX]Improve thread limit counter, NFC. Summary: Patch improves performance of the full runtime mode by moving thread-limit counter to the shared memory. It also allows to save global memory. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D61526 llvm-svn: 359922	2019-05-03 20:00:38 +00:00
Alexey Bataev	e031e17919	[OPENMP][NVPTX]Improved several standard OpenMP functions, NFC. Summary: Used parallelLevel[] counter to simplify and improve implementation of the existing standard OpenMP functions. Functions are tested already in several tests, the patch is NFC. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D61459 llvm-svn: 359892	2019-05-03 14:47:20 +00:00
Alexey Bataev	8ccb8f8647	[OPENMP][NVPTX]Improve code by using parallel level counter. Summary: Previously, for various purposes we needed to get the active/common parallel level, and with the full runtime we iterated over all the records to calculate this level. Instead, we can use the warp-based parallel level counters used in no-runtime mode. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D61395 llvm-svn: 359822	2019-05-02 20:05:01 +00:00
Alexey Bataev	4ad6dbc5fd	[OPENMP][NVPTX]Improve omp_get_max_threads() function. Summary: Function omp_get_max_threads() can always return 1 if current execution mode is SPMD. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D61379 llvm-svn: 359792	2019-05-02 14:52:52 +00:00
Alexey Bataev	8e6bf88cf7	[OPENMP][NVPTX]Improved omp_get_thread_limit() function. Summary: Function omp_get_thread_limit() in SPMD mode can return the maximum available number of threads as a result. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D61378 llvm-svn: 359790	2019-05-02 14:46:32 +00:00
Alexey Bataev	c03fe73176	[OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode. Summary: The parallelLevel counter must be on a per-thread basis to fully support L2+ parallelism, otherwise we may end up with undefined behavior. Introduce the parallelLevel on a per-warp basis using shared memory. This avoids the synchronization problems and fully supports L2+ parallelism in SPMD mode with no runtime. Reviewers: gtbercea, grokos Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D60918 llvm-svn: 359341	2019-04-26 19:30:34 +00:00
Alexey Bataev	5de5d74c8d	[OPENMP][NVPTX] Fix the test, NFC. Fix the test so that it really runs in SPMD mode without the runtime. Previously it ran in SPMD + full runtime mode, which did not allow the functionality to be checked correctly. llvm-svn: 358902	2019-04-22 17:25:31 +00:00
Alexey Bataev	13532ea623	[OPENMP][NVPTX]Fix dynamic scheduling in L2+ SPMD parallel regions. Summary: If the kernel is executed in SPMD mode and the L2+ parallel for region with the dynamic scheduling is executed, dynamic scheduling functions are called. They expect full runtime support, but SPMD kernels may be executed without the full runtime. It leads to the runtime crash of the compiled program. Patch fixes this problem + fixes handling of the parallelism level in SPMD mode, which is required as part of this patch. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D60578 llvm-svn: 358442	2019-04-15 20:15:20 +00:00
Michael Kruse	d97d5ebcfa	[libomptarget] Introduce LIBOMPTARGET_ENABLE_DEBUG cmake option. At the moment, support for runtime debug output using the OMPTARGET_DEBUG=1 environment variable is only available with CMAKE_BUILD_TYPE=Debug builds. The patch allows setting it independently using the LIBOMPTARGET_ENABLE_DEBUG option, which is enabled by default depending on CMAKE_BUILD_TYPE. That is, unless this option is set explicitly, nothing changes. This is the same mechanism used by LLVM for LLVM_ENABLE_ASSERTIONS. This patch also removes adding -g -O0 in debug builds; it should be handled by cmake's CMAKE_{C|CXX}_FLAGS_DEBUG configuration option. Idea by Hal Finkel Differential Revision: https://reviews.llvm.org/D55952 llvm-svn: 356998	2019-03-26 15:19:15 +00:00
Gheorghe-Teodor Bercea	06e08f0b0a	[OpenMP][libomptarget] New reduction scheme for team reductions Summary: This patch adds a more sophisticated team reduction scheme to the OpenMP libomptarget-nvptx runtime. The scheme uses a fixed size global memory buffer whose length can be adjusted via compiler flag: `-fopenmp-cuda-teams-reduction-recs-num=1024`. The global buffer is a structure of arrays (with default size of 1024 each and controlled by the above flag), one array for each reduction variable. Values in the buffer are processed by the last team to finish executing the body of the target region. In addition to adding support for the new flag, the compiler also emits special functions used for the reduction of the intermediate reduction values. These changes will be added in a separate compiler patch following this one. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, jfb, jdoerfert, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D58409 llvm-svn: 354471	2019-02-20 14:55:55 +00:00
Chandler Carruth	57b08b0944	Update more file headers across all of the LLVM projects in the monorepo to reflect the new license. These used slightly different spellings that defeated my regular expressions. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351648	2019-01-19 10:56:40 +00:00
Gheorghe-Teodor Bercea	1653633a1c	[OpenMP][libomptarget] Use shared memory variable for tracking parallel level Summary: Replace existing infrastructure for tracking parallel level using global memory with a per-team shared memory variable. This minimizes the impact of the overhead of tracking the parallel level for non-nested cases. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D55773 llvm-svn: 350747	2019-01-09 18:30:14 +00:00
Alexey Bataev	26e6c86b79	[OPENMP][NVPTX]Fix dynamic scheduling. Summary: Previous implementation may cause the runtime crash when the number of teams is > 1024. Patch fixes this problem + reduces number of the atomic operations by 32 times. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56332 llvm-svn: 350524	2019-01-07 14:25:25 +00:00
Alexey Bataev	6b3153ada0	[OPENMP][NVPTX]General formatting/code improvement, NFC. Summary: Formatting. Reviewers: gtbercea, grokos, kkwli0 Subscribers: guansong, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56290 llvm-svn: 350431	2019-01-04 20:16:54 +00:00
Alexey Bataev	dcf2edcdf5	[OPENMP][NVPTX]Improve performance + reduce number of used registers. Summary: Reduced number of the used register + improved performance propagating the information about current execution/data sharing mode directly from the compiler, where it is possible. In some cases, it requires new/reworked interfaces of the runtime external functions. Old functions are marked as deprecated. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56278 llvm-svn: 350405	2019-01-04 17:09:12 +00:00
Joel E. Denny	f17f7a5d4d	[OpenMP] Fix nvidia-cuda-toolkit detection on Debian/Ubuntu The OpenMP runtime's cmake scripts do not correctly locate the libdevice that the Debian/Ubuntu package nvidia-cuda-toolkit currently includes, at least on my Ubuntu 18.04.1 installation. This patch fixes that for me. This problem was discussed at length in D55269. D40453 added a similar adjustment in clang, but reviewers of D55269 concluded that, for the OpenMP runtime, the right place to address this problem is in cmake's CUDA support. However, it was also suggested we could add a workaround to OpenMP's cmake scripts now. This patch contains such a workaround, which I've tried to design so that it will have no harmful effect if cmake improves in the future. nvidia-cuda-toolkit also needs improvements because its intended monolithic CUDA tree shim, /usr/lib/cuda, has many empty directories, such as bin. I reported that at: <https://bugs.launchpad.net/ubuntu/+source/nvidia-cuda-toolkit/+bug/1808999> Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D55588 llvm-svn: 350377	2019-01-04 02:07:13 +00:00
Jonathan Peyton	76f3980a20	[OpenMP] Add omp_get_device_num() and update several other device API functions Add omp_get_device_num() function for 5.0 which returns the number of the device the current thread is running on. Currently, we are leaving it to the compiler to handle this properly if it is called inside target. Also, did some cleanup and updating of duplicate device API functions (in both libomp and libomptarget) to make them into weak functions that check for the symbol from libomptarget, and will call the version in libomptarget if it is present. If any additional device API functions are implemented also in libomptarget in the future, we should add the dlsym calls to the host functions. Also, if the omp_target_* functions are to be implemented for the host (this has been requested), they should attempt to call the libomptarget versions as well. Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D55578 llvm-svn: 350352	2019-01-03 21:14:19 +00:00
Alexey Bataev	3c74be8049	[OPENMP][NVPTX]Fix incompatibility of __syncthreads with LLVM, NFC. Summary: One of the LLVM optimizations, split critical edges, also clones tail instructions. This is a dangerous operation for __syncthreads() functions and this transformation leads to undefined behavior or incorrect results. Patch fixes this problem by replacing __syncthreads() function with the assembler instruction, which cost is too high and which cannot be copied. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56274 llvm-svn: 350333	2019-01-03 17:43:46 +00:00
Vyacheslav Zakharin	e889ac7e6b	[libomptarget] Added install component for libomptarget Differential Revision: https://reviews.llvm.org/D56108 llvm-svn: 350254	2019-01-02 19:39:49 +00:00
Alexey Bataev	d1cd005ec5	[OPENMP][NVPTX]Added/fixed debugging messages, NFC. Summary: Added or fixed new/old debugging messages for the better diagnostics. Reviewers: gtbercea, kkwli0, grokos Reviewed By: grokos Subscribers: caomhin, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D56102 llvm-svn: 350137	2018-12-28 21:36:09 +00:00
Alexey Bataev	28eccf5ba0	[OPENMP][NVPTX]Fixed initialization of the data-sharing interface. Summary: Avoid using of the atomic loop to wait for the completion of the data-sharing interface initialization, use __shfl_sync instead for the communication within the warp to signal other threads in the warp about completion of the initialization. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D56100 llvm-svn: 350129	2018-12-28 17:31:06 +00:00
Alexey Bataev	1708858dbd	[OPENMP][NVPTX]Outline assert into noinline function, NFC. Summary: At high optimization level asserts lead to some unexpected results because of auto-inserted unreachable instructions. This outlining prevents some of such dangerous optimizations and leads to better stability. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D56101 llvm-svn: 350128	2018-12-28 17:29:47 +00:00
Alexey Bataev	9056f1116d	[OPENMP][NVPTX]Revert __kmpc_shuffle_int64 to its original form. Summary: Use the original shuffle implementation for __kmpc_shuffle_int64 since default implementation uses the same implementation. Reviewers: gtbercea Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55514 llvm-svn: 348772	2018-12-10 16:50:36 +00:00
Alexey Bataev	cc6cf64c38	[OPENMP][NVPTX]Enable fast shuffles on 64bit values only if CUDA >= 9. Summary: Shuffle on 64bit data is allowed only for CUDA >= 9.0. Also, fixed the constant for the mask, need one extra L in the end. Reviewers: gtbercea, kkwli0 Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55440 llvm-svn: 348758	2018-12-10 14:29:05 +00:00
Alexey Bataev	8acafff404	[OPENMP][NVPTX]Save registers for optimized builds with enabled logging. Summary: Introduced special noinline function log that allows to save some registers for optimized builds but with enabled logging. Also, it increases the stability of the optimized builds with inlined runtime. Reviewers: gtbercea, kkwli0 Reviewed By: gtbercea Subscribers: caomhin, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D55436 llvm-svn: 348606	2018-12-07 16:08:29 +00:00
Alexey Bataev	653e8ba79a	[OPENMP][NVPTX]Correct type casting for printf args + simplified shfl64 function. Summary: Explicitly casted printf's args to the required types + simplified shfl64 function. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55379 llvm-svn: 348521	2018-12-06 19:45:48 +00:00
Alexey Bataev	5442f3e549	[OPENMP][NVPTX]Fix __kmpc_flush to flush the memory per system, not per block. Summary: According to the standard, after memory flushing the changes in the memory must be visible to all the threads in all teams. Patch fixes this. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55370 llvm-svn: 348491	2018-12-06 15:27:58 +00:00
Gheorghe-Teodor Bercea	10b2e60b7e	[OpenMP][libomptarget] Flush intermediate values during team reduction Summary: Ensure intermediate values of a team reduction are flushed to memory. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, jfb, openmp-commits Differential Revision: https://reviews.llvm.org/D55219 llvm-svn: 348148	2018-12-03 15:21:49 +00:00
Alexey Bataev	0f221f53d8	[OPENMP][NVPTX]Make runtime compatible with the original runtime. Summary: Reworked runtime to make it compatible with the requirements of the original runtime library. Also, simplified some code to reduce number of function calls. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55130 llvm-svn: 348003	2018-11-30 16:52:38 +00:00
Gheorghe-Teodor Bercea	31c1589ab0	[OpenMP][libomptarget] Add new version of SPMD deinit kernel function with argument Summary: To enable the compiler to optimize parts of the function that are not needed when runtime can be omitted, a new version of the SPMD deinit kernel function is needed. This function takes the runtime required flag as an argument. Reviewers: ABataev, kkwli0, caomhin Reviewed By: ABataev Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D54969 llvm-svn: 347714	2018-11-27 21:23:40 +00:00
Alexey Bataev	d4de439cf4	[OPENMP][NVPTX]Basic support for reductions across the teams. Summary: Added functions __kmpc_nvptx_teams_reduce_nowait_simple and __kmpc_nvptx_teams_end_reduce_nowait_simple to implement basic support for reductions across the teams. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D54967 llvm-svn: 347710	2018-11-27 21:06:09 +00:00
Gheorghe-Teodor Bercea	ad8632a9ba	[OpenMP][libomptarget] Refactor SPMD and runtime requirement checking Summary: Refactor the checking for SPMD mode and whether the runtime is initialized or not. This uses constant flags which enables the runtime to optimize out unused sections of code that depend on these flags. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, jfb, openmp-commits Differential Revision: https://reviews.llvm.org/D54960 llvm-svn: 347698	2018-11-27 19:45:10 +00:00
Alexey Bataev	8ab0924ab4	[OPENMP][NVPTX]Improved lock/critical constructs. Summary: Improved support for critical constructs + omp_..._lock... constructs. Reviewers: gtbercea, kkwli0, caomhin Subscribers: guansong, jfb, openmp-commits Differential Revision: https://reviews.llvm.org/D54766 llvm-svn: 347342	2018-11-20 20:19:36 +00:00
Alexey Bataev	15ab891e68	[OPENMP]Make lambda mapping follow reqs for PTR_AND_OBJ mapping. Summary: The base pointer for the lambda mapping must point to the lambda capture placement and pointer must point to the captured variable itself. Patch fixes this problem. Reviewers: gtbercea Subscribers: guansong, openmp-commits, kkwli0, caomhin Differential Revision: https://reviews.llvm.org/D54260 llvm-svn: 346407	2018-11-08 15:47:30 +00:00
Alexey Bataev	9476ca7db9	[OPENMP][OFFLOADING]Change the lambda capturing flags. Summary: The previously used combination `PTR_AND_OBJ | PRIVATE` could be used for mapping of some data in Fortran. Changed it to `PTR_AND_OBJ | LITERAL`. Reviewers: gtbercea Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D54035 llvm-svn: 345981	2018-11-02 15:24:47 +00:00

858 Commits