llvm-project

Commit Graph

Author	SHA1	Message	Date
Vyacheslav Zakharin	6baeeb9efa	[libomptarget] Fixed MSVC build fail caused by __attribute__((used)). Differential Revision: https://reviews.llvm.org/D97348	2021-02-24 09:59:39 -08:00
Shilei Tian	e5da63d5a9	[OpenMP] Fixed a crash when offloading to x86_64 with target nowait PR#49334 reports a crash when offloading to x86_64 with `target nowait`, which is caused by referencing a nullptr. The root cause of the issue is, when pushing a hidden helper task in `__kmp_push_task`, it also maps the gtid to its shadow gtid, which is wrong. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97329	2021-02-24 12:37:30 -05:00
Manoel Roemmer	542d9c2154	[libomptarget] Load images in order of registration This makes sure that images are loaded in the order in which they are registered with libomptarget. If a target can load multiple images and these images depend on each other (for example if one image contains the program's target regions and one image contains library code), then the order in which images are loaded can be important for symbol resolution (for example, in the VE plugin). In this case, because the same code exists in the host binaries, the order in which the host linker loads them (which is also the order in which images are registered with libomptarget) is the order in which the images have to be loaded onto the device. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95530	2021-02-24 18:15:41 +01:00
Shilei Tian	f6c2984a09	[OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported feature in release version of LLVM `ptx71` is not supported in the release version of LLVM yet. As a result, the support of CUDA 11.2 and CUDA 11.1 caused a compilation error as mentioned in D97004. Since the support in D97004 is just a workaround for the release, and we'll not use it in the near future, using `ptx70` for CUDA 11 is feasible. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97195	2021-02-23 13:20:21 -05:00
Shilei Tian	309b00a42e	[OpenMP][NFC] clang-format the whole openmp project Same script as D95318. Test files are excluded. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D97088	2021-02-20 12:46:32 -05:00
Joel E. Denny	ef8b3b5ffd	[OpenMP] Fix nvptx CUDA_VERSION conversion As mentioned in PR#49250, without this patch, ptxas for CUDA 9.1 fails in the following two tests: - openmp/libomptarget/test/mapping/lambda_mapping.cpp - openmp/libomptarget/test/offloading/bug49021.cpp The error looks like: ``` ptxas /tmp/lambda_mapping-081ea9.s, line 828; error : Not a name of any known instruction: 'activemask' ``` The problem is that our cmake script converts CUDA version strings incorrectly: 9.1 becomes 9100, but it should be 9010, as shown in `getCudaVersion` in `clang/lib/Driver/ToolChains/Cuda.cpp`. Thus, `openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu` inadvertently enables `activemask` because it apparently becomes available in 9.2. This patch fixes the conversion. This patch does not fix the other two tests in PR#49250. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97012	2021-02-19 11:09:26 -05:00
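The conversion described in ef8b3b5ffd can be sketched as follows. This is a minimal illustration, not the project's cmake code: the function names are hypothetical, the `major * 1000 + minor * 10` encoding is inferred from the 9.1 to 9010 example in the commit message, and the 9020 threshold only reflects the message's remark that `activemask` apparently becomes available in 9.2.

```python
def cuda_version_to_int(version: str) -> int:
    """Encode a 'major.minor' CUDA version string as major * 1000 + minor * 10.

    Hypothetical helper: mirrors the encoding the commit message says
    is expected (9.1 -> 9010), not the actual cmake implementation.
    """
    parts = version.split(".")
    major = int(parts[0])
    # A missing minor component is treated as 0 (assumption).
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major * 1000 + minor * 10


def enables_activemask(encoded_version: int) -> bool:
    """Illustrative version gate: assumes the feature starts at CUDA 9.2 (9020)."""
    return encoded_version >= 9020
```

With the incorrect value 9100 the gate passes for CUDA 9.1, which matches the ptxas failure reported in the commit message. With the corrected value 9010 it does not.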
Joel E. Denny	d2147b1a87	[OpenMP] Fix always,from and delete for data absent at exit Without this patch, there's a runtime error for those map types at exit from an "omp target data" or at "omp target exit data", but the spec says the list item should be ignored. This patch tests that fix in data_absent_at_exit.c, and it also improves other testing for data that is not fully present at exit. Reviewed By: grokos, RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D96999	2021-02-19 11:09:26 -05:00
Ron Lieberman	30c0d5b4c3	[OPENMP][AMDGCN] Improvements to print_kernel_trace (bit mask) allow bit masking to select various trace features. bit 0 => Launch tracing (stderr) bit 1 => timing of runtime (stdout) bit 2 => detailed launch tracing (stderr) bit 3 => timing goes to stdout instead of stderr example: LIBOMPTARGET_KERNEL_TRACE=7 does it all LIBOMPTARGET_KERNEL_TRACE=5 Launch + details LIBOMPTARGET_KERNEL_TRACE=2 timings + launch to stderr LIBOMPTARGET_KERNEL_TRACE=10 timings + launch to stdout Differential Revision: https://reviews.llvm.org/D96998	2021-02-19 06:47:22 -05:00
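The bit layout listed in 30c0d5b4c3 can be decoded as in the sketch below. The names and the `decode_kernel_trace` helper are illustrative assumptions; only the bit positions come from the commit message, and the plugin's real handling of `LIBOMPTARGET_KERNEL_TRACE` may differ.

```python
from typing import List

# Bit positions as listed in the commit message; the labels are illustrative.
TRACE_BITS = [
    (1 << 0, "launch tracing"),
    (1 << 1, "runtime timing"),
    (1 << 2, "detailed launch tracing"),
    (1 << 3, "timing to stdout"),
]


def decode_kernel_trace(value: int) -> List[str]:
    """Return the labels of all trace features whose bit is set in value."""
    return [label for bit, label in TRACE_BITS if value & bit]
```

For example, `decode_kernel_trace(5)` selects launch tracing and detailed launch tracing, and `decode_kernel_trace(10)` selects runtime timing together with the stdout redirection bit.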
Shilei Tian	89827fd404	[OpenMP][NVPTX] Add the support for CUDA 11.2 and CUDA 11.1 CUDA 11.2 and CUDA 11.1 are all available now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97004	2021-02-18 21:04:39 -05:00
Jon Chesterfield	53d7fd3762	[libomptarget][amdgcn] Remove lookup of .language msgpack field	2021-02-17 23:02:16 +00:00
Alexey Bataev	60d71a286b	[OPENMP50]Allow overlapping mapping in target constructs. OpenMP 5.0 removed many restrictions on overlapped mapped items compared to OpenMP 4.5. The patch restricts the checks for overlapped data mappings to OpenMP 4.5 and earlier and reorders the mapping of the arguments so that present and alloc mappings are processed first, followed by all others. Differential Revision: https://reviews.llvm.org/D86119	2021-02-16 14:42:08 -08:00
Johannes Doerfert	2518cc65d2	[OpenMP][FIX] Avoid use of stack allocations in asynchronous calls As reported by Guilherme Valarini [0], we used to pass stack allocations to calls that can nowadays be asynchronous. This is arguably a problem and it will inevitably result in UB. To remedy the situation we allocate the locations as part of the AsyncInfoTy object. The lifetime of that object matches what we need for now. If the synchronization is not tied to the AsyncInfoTy object anymore we might need to have a different buffer construct in global space. This should be back-ported to LLVM 12 but needs slight modifications as it is based on refactoring patches we do not need to backport. [0] https://lists.llvm.org/pipermail/openmp-dev/2021-February/003867.html Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D96667	2021-02-16 15:38:11 -06:00
Johannes Doerfert	758b849931	[OpenMP] Unify omptarget API and usage wrt. `__tgt_async_info` This patch unifies our libomptarget API in two ways: - always pass a `__tgt_async_info` object, the Queue member decides if it is in use or not. - (almost) always synchronize in the interface layer and not in the omptarget layer. A side effect is that we now put all constructor and static initializer kernels in a stream too, if the device utilizes `__tgt_async_info`. The patch contains a TODO which can be addressed as we add support for asynchronous malloc and free in the plugin API. This is the only `synchronizeAsyncInfo` left in the omptarget layer. Side note: On a V100 system the GridMini performance for small sizes more than doubled. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96379	2021-02-16 15:38:06 -06:00
Johannes Doerfert	a2fc0d34db	[OpenMP] Move synchronization into `__tgt_async_info` The AsyncInfo should be passed everywhere and it should offer a way to ensure synchronization, given a libomptarget Device. This replaces D96431. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96438	2021-02-16 15:38:01 -06:00
Johannes Doerfert	942728763b	[OpenMP][NFC] Unify `target` API with other by passing a `__tgt_async_info` pointer Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96430	2021-02-16 15:37:56 -06:00
Johannes Doerfert	44f3022cdf	[OpenMP][NFC] Pass a DeviceTy, not the device number to `target` This unifies the API of `target` relative to `targetUpdateData` and such. Reviewed By: tianshilei1992, grokos Differential Revision: https://reviews.llvm.org/D96429	2021-02-16 15:37:51 -06:00
Johannes Doerfert	ea9395716e	[OpenMP][NFC] Clang format the libomptarget plugins Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96445	2021-02-16 15:37:46 -06:00
Johannes Doerfert	ad94fce845	[OpenMP][NFC] Eliminate sign comparison warning via explicit casts Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96812	2021-02-16 15:37:41 -06:00
Johannes Doerfert	9cd1e2228c	[OpenMP][NFC] Clang format libomptarget code (src & include) The struct and enum alignments are kept by disabling clang-format for that code region. Reviewed By: tianshilei1992, JonChesterfield, grokos Differential Revision: https://reviews.llvm.org/D96428	2021-02-16 15:37:35 -06:00
Jon Chesterfield	6f04addc8b	[libomptarget][amdgcn] Build amdgcn devicertl as openmp Change cmake to build as openmp and fix up some minor errors in the code. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D96533	2021-02-12 09:51:21 +00:00
Jon Chesterfield	56c446a878	[libomptarget][amdgcn] Tolerate deadstripped device_state variable The device_state variable may have been deadstripped. Similar to device_environment, leave detection of missing but used symbol to loader. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D96330	2021-02-09 16:29:53 +00:00
Jon Chesterfield	4756f76bce	[libomptarget][amdgcn] Tolerate deadstripped env variable Discovered by Pushpinder. If the device_environment variable is unused it can be deadstripped, in which case we should not abort due to it missing. This change is safe in that a missing symbol which is actually used can be reported by both linker and loader, and a missing unused symbol is better deadstripped than left in the image. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D96329	2021-02-09 11:58:37 +00:00
Jon Chesterfield	2fa4186d4e	[libomptarget][amdgcn] Fix language linkage post D95300, drop use of assert	2021-02-08 20:07:51 +00:00
Shilei Tian	b68a6b09e6	[OpenMP][libomptarget] Fixed an issue that device sync is skipped if the kernel doesn't have any argument Currently, if there is no kernel argument, device synchronization will be skipped. This can lead to two issues: 1. If there is any device error, it will not be captured; 2. The target region might end before the kernel is done, which is not spec conformant. The test added in this patch only runs on NVPTX platform, although it will not be executed by Phab at all. It also requires `not` which is not available on most systems. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D96067	2021-02-04 20:14:24 -05:00
Shilei Tian	567b3f8841	[OpenMP][deviceRTLs] Drop `assert` in common parts of `deviceRTLs` The header `assert.h` needs to be included in order to use `assert` in the code. When building NVPTX `deviceRTLs` on a CUDA free system, it requires headers from `gcc-multilib`, which some systems don't have. This patch drops the use of `assert` in common parts of `deviceRTLs`. In light of `openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h`, a code block ``` if (!cond) __builtin_trap(); ``` is being used. The builtin will be translated to `call void @llvm.trap()`, and the corresponding PTX is `trap;`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95986	2021-02-04 12:39:43 -05:00
Shilei Tian	0f0ce3c12e	[OpenMP][NVPTX] Take functions in `deviceRTLs` as `convergent` OpenMP device compiler (similar to other SPMD compilers) assumes that functions are convergent by default to avoid invalid transformations, such as the bug (https://bugs.llvm.org/show_bug.cgi?id=49021). Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95971	2021-02-03 20:58:12 -05:00
Atmn Patel	b545667d0a	[OpenMP][Libomptarget] Remove possible harmful copy constructor call for RTLsTy From https://bugs.llvm.org/show_bug.cgi?id=48973, we know that `std::call_once(PM->RTLs.initFlag, &RTLsTy::LoadRTLs, PM->RTLs)` causes compile time problems in libstdc++v3 5.3.1. This is because there was a defect in the standard regarding the `call_once` (LWG 2442). This was fixed in libstdc++ soon thereafter, but there are likely other standard libraries where this will fail. By matching this function call with the other one, we fix this bug. Differential Revision: https://reviews.llvm.org/D95769	2021-02-01 20:13:03 -05:00
Joseph Huber	fda4853998	[OpenMP] Fix seg fault in libomptarget when using Info with multiple threads Summary: One option for the LIBOMPTARGET_INFO environment variable is to print the current status of the device's data mappings. These are a shared resource among threads so this needs to be protected when using multiple streams. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95786	2021-02-01 11:21:57 -05:00
Shilei Tian	26d38f6d20	[OpenMP][NVPTX] Refined CMake logic to choose compute capabilities This patch refines the logic to choose compute capabilities via the environment variable `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES`. It supports the following values (all case insensitive): - "all": Build `deviceRTLs` for all supported compute capabilities; - "auto": Only build for the compute capability auto detected. Note that this requires CUDA. If CUDA is not found, a CMake fatal error will be raised. - "xx,yy" or "xx;yy": Build for compute capabilities `xx` and `yy`. If `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES` is not set, it is equivalent to setting it to `all`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95687	2021-01-30 15:14:48 -05:00
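The value handling described in 26d38f6d20 can be sketched as below. This is an illustration of the documented rules only, not the CMake logic itself: `parse_compute_capabilities`, its parameters, and the use of `RuntimeError` in place of a CMake fatal error are all assumptions.

```python
import re
from typing import List, Optional


def parse_compute_capabilities(
    value: Optional[str],
    supported: List[str],
    detected: Optional[str] = None,
) -> List[str]:
    """Resolve a LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES-style value.

    value:     the raw setting, or None when it is not set
    supported: every compute capability the build knows about
    detected:  the auto-detected capability, or None when CUDA is not found
    """
    if value is None:
        # Not set: equivalent to "all" per the commit message.
        return list(supported)
    key = value.strip().lower()  # keywords are case insensitive
    if key == "all":
        return list(supported)
    if key == "auto":
        if detected is None:
            # Stands in for the CMake fatal error (assumption).
            raise RuntimeError("auto detection requires CUDA")
        return [detected]
    # "xx,yy" or "xx;yy": accept either separator.
    return [item.strip() for item in re.split(r"[,;]", value) if item.strip()]
```

For instance, `parse_compute_capabilities("35;70", ["35", "60", "70"])` yields `["35", "70"]`, and the same call with `"35,70"` gives the same list.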
Shilei Tian	1b19c42302	[OpenMP][deviceRTLs] Separate declaration of target dependent functions from `target_impl.h` This patch created a new header file `target_interface.h` for declarations of all target dependent functions. All future targets can get things working by simply implementing all functions declared in the header and macros/data same as each `target_impl.h`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95300	2021-01-28 08:14:33 -05:00
Shilei Tian	5a64794bba	[OpenMP][NVPTX] Added the missing -O1 when building NVPTX bitcode libraries In the past `-O1` was used when building NVPTX bitcode libraries. After we switched to OpenMP, `-O1` was missing by mistake, leading to a huge performance regression. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95545	2021-01-28 08:13:38 -05:00
Shilei Tian	19248d30e4	[OpenMP][deviceRTLs] Added `[[clang::loader_uninitialized]]` explicitly `[[clang::loader_uninitialized]]` is in macro `SHARED` but it doesn't work for array like `parallelLevel`, so the variable will be zero initialized. There is also a similar issue for `omptarget_nvptx_device_State` which is in global address space. Its c'tor is also generated, which was not in the past when building the `deviceRTLs` with CUDA. In this patch, we added the attribute to the two variables explicitly. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95550	2021-01-28 08:12:49 -05:00
Vyacheslav Zakharin	0fc90873b2	[libomptarget][NFC] Link plugins with threads support library due to std::call_once usage. Differential Revision: https://reviews.llvm.org/D95572	2021-01-27 19:26:18 -08:00
Atmn Patel	8a77056256	[OpenMP][Libomptarget] Fix conditional in CMake for remote plugin The remote offloading plugin's CMakeLists was trying to build if its flag was enabled even if it didn't find gRPC/protobuf. The conditional was wrong, it's fixed by this. Differential Revision: https://reviews.llvm.org/D95574	2021-01-27 21:28:25 -05:00
Shilei Tian	fb12df4a8e	[OpenMP][NVPTX] Disable building NVPTX deviceRTL by default on a non-CUDA system D95466 dropped CUDA to build NVPTX deviceRTL and enabled it by default. However, the building requires some libraries that are not available on non-CUDA system by default, which could break the compilation. This patch disabled the build by default. It can be enabled with `LIBOMPTARGET_BUILD_NVPTX_BCLIB=ON`. Reviewed By: kparzysz Differential Revision: https://reviews.llvm.org/D95556	2021-01-27 17:06:14 -05:00
Giorgis Georgakoudis	1e59c1a898	[OpenMP][Libomptarget] Fix check-libomptarget The check-libomptarget fails when building with LLVM_ENABLE_PROJECTS. This is because the test configuration misses the path to libomp.so and libLLVMSupport.so when time profiling is enabled (both libraries have the same path when building). This patch adds the path to the configuration. Reviewed By: vzakhari Differential Revision: https://reviews.llvm.org/D95376	2021-01-27 06:46:40 -08:00
Shilei Tian	e7535f8fed	[OpenMP][NVPTX] Drop dependence on CUDA to build NVPTX `deviceRTLs` With D94745, we no longer use CUDA SDK to compile `deviceRTLs`. Therefore, much of the CMake code in the project is no longer needed. This patch cleans up unnecessary code and also drops the requirement to build NVPTX `deviceRTLs`. CUDA detection is still being used however to determine whether we need to involve the tests. Auto detection of compute capability is enabled by default and can be disabled by setting CMake variable `LIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY=OFF`. If auto detection is enabled, and CUDA is also valid, it will only build the bitcode library for the detected version; otherwise, all variants supported will be generated. One drawback of this patch is, we now generate 96 variants of bitcode library, and a total of 1485 files to be built with a clean build on a non-CUDA system. `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=""` can be used to disable building NVPTX `deviceRTLs`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95466	2021-01-26 20:21:36 -05:00
Jon Chesterfield	653655040f	[libomptarget][cuda] Handle missing _v2 symbols gracefully Follow on from D95367. Dlsym the _v2 symbols if present, otherwise use the unsuffixed version. Builds a hashtable for the check, can revise for zero heap allocations later if necessary. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95415	2021-01-27 00:22:29 +00:00
Vyacheslav Zakharin	3caa2d3354	[libomptarget][NFC] Avoid gcc 5/6 issue with lambda captures. Differential Revision: https://reviews.llvm.org/D95486	2021-01-26 16:06:58 -08:00
Vyacheslav Zakharin	5f1d4d4779	[libomptarget][NFC] Use portable printf format specifiers. Differential Revision: https://reviews.llvm.org/D95476	2021-01-26 13:56:25 -08:00
Atmn Patel	810572cc96	[OpenMP][Libomptarget] Fix cmake error on remote plugin Requiring 3.15 causes a build breakage; I'm sure none of the contents actually require 3.15 or above. Differential Revision: https://reviews.llvm.org/D95474	2021-01-26 16:00:40 -05:00
Jon Chesterfield	7baff00eee	[libomptarget][cuda] Gracefully handle missing cuda library If using dynamic cuda, and it failed to load, it is not safe to call cuGetErrorString. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95412	2021-01-26 20:43:07 +00:00
Jon Chesterfield	fdeffd6fb0	[libomptarget][cuda] Only run tests when sure there is cuda available Prior to D95155, building the cuda plugin implied cuda was installed locally. With that change, every machine can build a cuda plugin, but they won't all have cuda and/or an nvptx card installed locally. This change enables the nvptx tests when either: - libcuda is present - the user has forced use of the dlopen stub The default case when there is no cuda detected will no longer attempt to run the tests on nvptx hardware, as was the case before D95155. Reviewed By: jdoerfert, ronlieb Differential Revision: https://reviews.llvm.org/D95467	2021-01-26 20:41:06 +00:00
Atmn Patel	ec8f4a38c8	[OpenMP][Libomptarget] Introduce Remote Offloading Plugin This introduces a remote offloading plugin for libomptarget. This implementation relies on gRPC and protobuf, so this library will only build if both libraries are available on the system. The corresponding server is compiled to `openmp-offloading-server`. This is a large change, but the only way to split this up is into RTL/server but I fear that could introduce an inconsistency amongst them. Ideally, tests for this should be added to the current ones, but that is problematic for at least one reason. Given that libomptarget registers plugin on a first-come-first-serve basis, if we wanted to offload onto a local x86 through a different process, then we'd have to either re-order the plugin list in `rtl.cpp` (which is what I did locally for testing) or find a better solution for runtime plugin registration in libomptarget. Differential Revision: https://reviews.llvm.org/D95314	2021-01-26 15:33:38 -05:00
Atmn	683719bc0c	[OpenMP][Libomptarget] Introduce changes to support remote plugin In order to support remote execution, we need to be able to send the target binary description to the remote host for registration (and consequent deregistration). To support this, I added these two optional new functions to the plugin API: - `__tgt_rtl_register_lib` - `__tgt_rtl_unregister_lib` These functions will be called to properly manage the instance of libomptarget running on the remote host. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D93293	2021-01-26 14:19:27 -05:00
Jon Chesterfield	32cc5564e2	[libomptarget][devicertl][amdgpu] Fix build, variable renaming error	2021-01-26 19:05:21 +00:00
Shilei Tian	7c03f7d7d0	[OpenMP][deviceRTLs] Build the deviceRTLs with OpenMP instead of target dependent language From this patch (plus some landed patches), `deviceRTLs` is taken as a regular OpenMP program with just `declare target` regions. In this way, ideally, `deviceRTLs` can be written in OpenMP directly. No CUDA, no HIP anymore. (Well, AMD is still working on getting it to work. For now AMDGCN still uses the original way to compile.) However, some target specific functions are still required, but they're no longer written in a target specific language. For example, the CUDA parts have all been refined by replacing CUDA intrinsics and builtins with LLVM/Clang/NVVM intrinsics. Here is a list of changes in this patch. 1. For NVPTX, `DEVICE` is defined empty in order to make the common parts still work with AMDGCN. Later once AMDGCN is also available, we will completely remove `DEVICE` or probably some other macros. 2. Shared variable is implemented with OpenMP allocator, which is defined in `allocator.h`. Again, this feature is not available on AMDGCN, so two macros are redefined properly. 3. CUDA header `cuda.h` is dropped in the source code. In order to deal with code differences in various CUDA versions, we build one bitcode library for each supported CUDA version. For each CUDA version, the highest PTX version it supports will be used, just as what we currently use for CUDA compilation. 4. Correspondingly, the compiler driver is also updated to support the CUDA version encoded in the name of the bitcode library. Now the bitcode library for NVPTX is named as `libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc`, such as `libomptarget-nvptx-cuda_80-sm_20.bc`. With this change, there are also multiple features to be expected in the near future: 1. CUDA will be completely dropped when compiling OpenMP. By then, we will also build bitcode libraries for all supported SM, multiplied by all supported CUDA versions. 2. Atomic operations used in `deviceRTLs` can be replaced by `omp atomic` if OpenMP 5.1 feature is fully supported. For now, the IR generated is totally wrong. 3. Target specific parts will be wrapped into `declare variant` with `isa` selector if it can work properly. No target specific macro is needed anymore. 4. (Maybe more...) Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94745	2021-01-26 12:28:47 -05:00
George Rokos	94cf89d1c2	[libomptarget][NFC] Fixed obsolete function names in comments	2021-01-26 07:39:42 -08:00
Alexey Bataev	4a63e53373	[LIBOMPTARGET]FIX define declaration, NFC Fixed declaration of define by adding a comma symbol. Required to fix build without profiling.	2021-01-26 07:43:31 -05:00
Johannes Doerfert	8c7fdc4c61	[OpenMP] Add source location information to the libomptarget profile In much of the libomptarget interface we have an ident_t object now, if it is not null we can use it to improve the profile output. For now, we simply use the ident_t "source information string" as generated by the FE. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95282	2021-01-25 22:43:43 -06:00

466 Commits