llvm-project

Commit Graph

Author	SHA1	Message	Date
Shilei Tian	e97e0a4fad	[AbstractAttributor] Fold __kmpc_parallel_level if possible Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible. Note that `__kmpc_parallel_level` doesn't take activeness into consideration: based on the current `deviceRTLs`, its return value is a plain level such as 0, 1, or 2, rather than a value such as 0, 129, or 130 that also indicates activeness. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106154	2021-07-26 22:46:19 -04:00
Shilei Tian	f1b8fa55d0	[OpenMP][NVPTX] Disable OpenMPOpt when building deviceRTLs We build `deviceRTLs` with `-O1` by default, which also triggers OpenMPOpt. When the info cache is created, some attributes are removed. As a result, although we mark a few functions `noinline`, they are still inlined when the bitcode library is generated. This can cause an issue in middle end optimization. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106710	2021-07-25 10:38:27 -04:00
Joseph Huber	e1dedecaa6	[Libomptarget] Add unroll flag to shared variables loop Unrolling this loop provides better performance in practice because it is executed on the device and is likely to be very small. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106692	2021-07-23 16:45:27 -04:00
Johannes Doerfert	d12ee28e2e	[OpenMP] Simplify the ThreadStackTy for globalization fallback With D106496 we can make the globalization fallback stack much simpler and this version doesn't seem to experience the spurious failures and deadlocks we have seen before. Differential Revision: https://reviews.llvm.org/D106576	2021-07-22 23:57:46 -05:00
Jose M Monsalve Diaz	68d6278a6e	[OpenMP] Renaming RT functions `GetNumberOfBlocksInKernel` and `GetNumberOfThreadsInBlock` These functions should follow the camel case convention. These are really easy to change and are needed for D106033. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D106390	2021-07-22 18:17:49 -04:00
Joseph Huber	4a66860424	[OpenMP] Add an option to disable function internalization Function internalization can sometimes occur in situations where we want to keep the call sites intact. This patch adds an option to disable function internalization and prevents the device runtime from being internalized while creating the bitcode library. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106438	2021-07-21 21:18:18 -04:00
Joseph Huber	1684012a47	[Libomptarget] Introduce new main thread ID runtime function This patch introduces `__kmpc_is_generic_main_thread_id` which splits the old comparison into its own runtime function. The purpose of this is so we can fold this part independently, so when both this and `is_spmd_mode` are folded the final function will be folded as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106437	2021-07-21 21:18:14 -04:00
Joseph Huber	754eb1c210	[OpenMP] Change `__kmpc_free_shared` to include the paired allocation size This patch changes `__kmpc_free_shared` to take an additional argument corresponding to the associated allocation's size. This makes it easier to implement the allocator in the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106496	2021-07-21 20:56:21 -04:00
Giorgis Georgakoudis	5a682d9b91	[OpenMP] Expose libomptarget function to get HW thread id The patch exposes the libomptarget runtime function that gets the hardware thread id through the kmpc API. This is to be used in SPMDization for checking the thread id to execute regions by a single thread in a block. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106323	2021-07-21 10:26:04 -07:00
Shilei Tian	55c65884a4	[OpenMP][deviceRTLs] Update return type of function __kmpc_parallel_level In `deviceRTLs`, the parallel level is stored in a shared variable of type `uint8_t`. `__kmpc_parallel_level` currently returns a 16-bit integer. This patch first changes the return type of the function to `uint8_t`, same as the shared variable, and then corrects the function type which was updated in D105955. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106384	2021-07-20 15:45:43 -04:00
Joseph Huber	762badb0ab	[Libomptarget] Remove volatile from NVPTX work function Currently the NVPTX work function is marked volatile. This prevents some optimizations from using this value. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106310	2021-07-19 20:03:25 -04:00
Giorgis Georgakoudis	fb0cf01795	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `e9c7291cb2`. Fix failing tests	2021-07-19 07:54:26 -07:00
Shilei Tian	4357cfc792	[OpenMP][Offloading] Add -g when compiling deviceRTLs in debug mode Currently when we compile the project in debug mode, `-g` is not added to the compilation flags. The bc files generated in different modes are of different sizes. When using GPU debuggers like `cuda-gdb`, a debug version of the bc lib is expected to provide more info. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D106229	2021-07-18 09:34:54 -04:00
Giorgis Georgakoudis	e9c7291cb2	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102107	2021-07-16 23:27:44 -07:00
Shilei Tian	97c8f60bba	[NFC][OpenMP][Offloading] Replaced explicit parallel level computation with function `__kmpc_parallel_level` There are two places in the current deviceRTLs that compute the parallel level explicitly, which is basically the functionality of `__kmpc_parallel_level`. Starting from D105787, we plan to introduce a series of function call foldings based on information that can be deduced at compile time. Computation of the parallel level is the next target. This patch prepares for that optimization. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105955	2021-07-15 22:21:06 -04:00
Jon Chesterfield	b6b53ffef4	[libomptarget][devicertl] Remove branches around setting parallelLevel Simplifies control flow to allow store/load forwarding. This change folds two basic blocks into one, leaving a single store to parallelLevel. This is a step towards spmd kernels with sufficiently aggressive inlining folding the loads from parallelLevel and thus discarding the nested parallel handling when it is unused. Transform: ``` int threadId = GetThreadIdInBlock(); if (threadId == 0) { parallelLevel[0] = expr; } else if (GetLaneId() == 0) { parallelLevel[GetWarpId()] = expr; } // => if (GetLaneId() == 0) { parallelLevel[GetWarpId()] = expr; } // because unsigned GetLaneId() { return GetThreadIdInBlock() & (WARPSIZE - 1);} // so whenever threadId == 0, GetLaneId() is also 0. ``` That replaces a store in two distinct basic blocks with a single store. A more aggressive follow up is possible if the threads in the warp/wave race to write the same value to the same address. This is not done as part of this change. ``` if (GetLaneId() == 0) { parallelLevel[GetWarpId()] = expr; } // => parallelLevel[GetWarpId()] = expr; // because unsigned GetWarpId() { return GetThreadIdInBlock() / WARPSIZE; } // so GetWarpId will index the same element for every thread in the warp // and, because expr is lane-invariant in this case, every lane stores the // same value to this unique address ``` Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D105699	2021-07-13 12:06:57 +01:00
Johannes Doerfert	a7b7b5dfe5	[OpenMP] Create and use `__kmpc_is_generic_main_thread` In order to fold calls based on high-level knowledge and control flow tracking it helps to expose the information as a runtime call. The logic: `!SPMD && getTID() == getMasterTID()` was used in various places and is now encapsulated in `__kmpc_is_generic_main_thread`. As part of this rewrite we replaced eager computation of arguments with on-demand computation, especially helpful if the calls can be folded and arguments don't need to be computed consequently. Differential Revision: https://reviews.llvm.org/D105768	2021-07-11 19:18:03 -05:00
Johannes Doerfert	1ab1f04a2b	[OpenMP] Simplify variable sharing and increase shared memory size In order to avoid malloc/free, up to NUM_SHARED_VARIABLES_IN_SHARED_MEM (=64) variables are communicated in dedicated shared memory instead. The simplification does avoid the need for an "init" and requires "deinit" only if we ever communicate more than NUM_SHARED_VARIABLES_IN_SHARED_MEM variables. Differential Revision: https://reviews.llvm.org/D105767	2021-07-11 19:18:03 -05:00
Johannes Doerfert	0a223827de	[OpenMP] Remove checkXXXX device runtime functions We had multiple functions to determine the execution mode (SPMD/Generic) and runtime status (initialized/uninitialized) but that just increased complexity without a real benefit. Especially with D102307 in mind it is helpful to reduce the dependence on the `ident_t` flags. Differential Revision: https://reviews.llvm.org/D105586	2021-07-10 18:20:40 -05:00
Johannes Doerfert	e2cfbfcc0c	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 17:53:56 -05:00
Nico Weber	d3e7491333	Revert Attributor patch series Broke check-clang, see https://reviews.llvm.org/D102307#2869065 Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`	2021-07-10 16:15:55 -04:00
Johannes Doerfert	e603ca0306	[OpenMP] Remove checkXXXX device runtime functions We had multiple functions to determine the execution mode (SPMD/Generic) and runtime status (initialized/uninitialized) but that just increased complexity without a real benefit. Especially with D102307 in mind it is helpful to reduce the dependence on the `ident_t` flags. Differential Revision: https://reviews.llvm.org/D105586	2021-07-10 12:32:51 -05:00
Johannes Doerfert	1d5711c3ee	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 12:32:50 -05:00
Shilei Tian	24a36ce58b	[OpenMP][Offloading] Replace all calls to `isSPMDMode` with `__kmpc_is_spmd_exec_mode` In our ongoing work, we are using `AbstractAttributor` to deduce the execution mode of device functions, and potentially remove unnecessary function calls to `__kmpc_is_spmd_exec_mode`. The current device runtime mixes uses of `isSPMDMode` and `__kmpc_is_spmd_exec_mode`, although `__kmpc_is_spmd_exec_mode` simply calls `isSPMDMode`. All functions starting with `__kmpc` are C functions, which are not subject to name mangling, so they are more optimization friendly. In this patch, we simply replaced all calls to `isSPMDMode` with `__kmpc_is_spmd_exec_mode` to pave the way for the optimization. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105211	2021-06-30 15:39:57 -04:00
Jon Chesterfield	f66b8fdc0a	[libomptarget][amdgpu] Build openmp for two more targets The 4800U APU is a gfx902 and the MI100 accelerator is a gfx908. Both numbers are listed in ROCT topology.c Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D104922	2021-06-25 19:02:03 +01:00
Joseph Huber	244e98ff48	[Libomptarget] Improve device runtime implementation for globalized variables. Currently the runtime implementation of `__kmpc_alloc_shared` is extremely slow because it allocates memory for each thread individually. This patch adds a small buffer for the threads to share data and will greatly improve performance for builds where all globalization could not be optimized out. If the shared buffer is full, then memory will be allocated per-warp rather than per-thread. Depends on D97680 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104666	2021-06-22 11:52:49 -04:00
Joseph Huber	952a0f2385	[Libomptarget] Introduce new globalization runtime calls Summary: This patch introduces the new globalization runtime to be used by D97680. These runtime calls will replace the __kmpc_data_sharing_push_stack and __kmpc_data_sharing_pop_stack functions. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102532	2021-06-22 10:05:42 -04:00
Jon Chesterfield	d54712ab4d	[libomptarget][amdgpu] Mark alloc, free weak to facilitate local experimentation There are a lot of different ways we might implement the devicertl local alloc and free functions. Via host, local buffers (stack or arena), specialising per kernel etc. It is not yet clear what the right design is. This change makes the alloc and free functions weak, so one can override them from local tests while comparing options. Not strictly necessary, as a comparable patch can be applied locally each time, but would be convenient for out of tree dev. Plan would be to drop the weak attribute at the same time as introducing a working allocator to trunk. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102499	2021-05-21 16:09:22 +01:00
Jon Chesterfield	10de217209	[libomptarget][amdgpu] Fix truncation error for partial wavefront The partial barrier implementation involves one wavefront resetting and N-1 waiting. This change future proofs against launching with a number of threads that is not a multiple of the wavefront size. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102407	2021-05-13 17:31:57 +01:00
Jon Chesterfield	72995a4bdf	[libomptarget][nfc] Add hook to easily disable building amdgcn bclib This is useful when building LLVM with a toolchain that can't emit code for amdgcn, e.g. because it overrides the include search path with headers from another architecture, or the clang compiler is missing builtins. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102229	2021-05-11 17:23:09 +01:00
Vyacheslav Zakharin	f2f88f3e7a	An attempt to abandon omptarget out-of-tree builds. I want to start using LLVM component libraries in libomptarget to stop duplicating implementations already available in LLVM (e.g. LLVMObject, LLVMSupport, etc.). Without relying on LLVM in all libomptarget builds one has to provide fallback implementation for each used LLVM feature. This is an attempt to stop supporting out-of-llvm-tree builds of libomptarget. I understand that I may need to revert this, if this affects downstream projects in a bad way. Differential Revision: https://reviews.llvm.org/D101509	2021-05-07 12:43:50 -07:00
Jon Chesterfield	44ee974e2f	[libomptarget][nfc] Refactor amdgpu partial barrier to simplify adding a second one D101976 would require a second barrier instance. This NFC to amdgpu makes it simpler to add one (an extra global, one more line in init). Also renames the current barrier to L0. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102016	2021-05-06 23:52:19 +01:00
Michael Kruse	7308862ff5	[OpenMP][CMake] Use in-project clang as CUDA->IR compiler. If available, use the clang that is already built in the same project as CUDA compiler unless another executable is explicitly defined. This also ensures the generated deviceRTL IR will be consistent with the version of Clang. This patch is required to reliably test OpenMP offloading in a buildbot without either a two-stage build (e.g. with LLVM_ENABLE_RUNTIMES) or a separately installed clang on the worker that will eventually become outdated. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101265	2021-04-30 12:45:52 -05:00
Michael Kruse	3244a8b536	[OpenMP][CMake] Pass --cuda-path to regression tests. The OpenMP runtime can be compiled using a CUDA installed at a non-default location with the -DCUDA_TOOLKIT_ROOT_DIR setting. However, check-openmp will fail afterwards because Clang needs to know where to find the CUDA headers. Fix by passing --cuda-path to Clang using the value of CUDA_TOOLKIT_ROOT_DIR which has been determined by CMake. Also set LD_LIBRARY_PATH such that it can find the cuda runtime when executing. This will ensure that the regression tests do not depend on the current environment, but use the environment they were configured for. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101266	2021-04-27 16:27:40 -05:00
Jon Chesterfield	58f125493d	[libomptarget] Enable AMDGPU devicertl The amdgpu devicertl is written in freestanding openmp and compiles to a bitcode library (per listed gfx arch) with no unresolved symbols. It requires a recent clang, preferably the one from the same monorepo checkout. This is D98658, with printf explicitly stubbed out, after patching clang to no longer require an llvm with the amdgpu target enabled. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101213	2021-04-24 02:24:44 +01:00
Johannes Doerfert	17330a3cb1	[OpenMP] Avoid reading uninitialized parallel level values In a last minute change request for `a2dbfb6b72` we introduced a read of the uninitialized parallel level value in SPMD-mode. We go back to initializing the array early and checking for an adjusted level. Found by the miniqmc unit tests: https://cdash.qmcpack.org/CDash/viewTest.php?onlyfailed&buildid=203434 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D101123	2021-04-23 11:21:58 -05:00
Giorgis Georgakoudis	a2dbfb6b72	[OpenMP] Simplify offloading parallel call codegen This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions. Reviewed By: jdoerfert, Meinersbur Differential Revision: https://reviews.llvm.org/D95976	2021-04-21 18:46:07 -07:00
Jon Chesterfield	dbf8f2b089	Revert "[libomptarget] Build amdgcn devicertl by default" This reverts commit `e23f3502d9`. It broke the build of openmp for clang built without amdgcn support. D98746, under review, would allow this to reland.	2021-03-17 11:34:44 +00:00
Johannes Doerfert	0a954a528b	[OpenMP][FIX] Repair accidental replacement of _shfl_sync with _shfl This was broken accidentally in D95752. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D98677	2021-03-15 22:46:00 -05:00
Jon Chesterfield	e23f3502d9	[libomptarget] Build amdgcn devicertl by default The cmake for this looks for an llvm install and does the right thing when building as part of enable_runtimes. It will probably do the right thing in other settings - at least, it won't try to build this with gcc. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98658	2021-03-15 23:17:50 +00:00
Jon Chesterfield	bb38d7ff05	[libomptarget][nfc][amdgcn] Use precise triple for devicertl build	2021-03-15 20:24:13 +00:00
Jon Chesterfield	d0bc85f04a	[libomptarget][nfc] Drop unused DEVICE macro Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98655	2021-03-15 20:12:50 +00:00
Jon Chesterfield	bcb3f0f867	[libomptarget] Fix devicertl build The target specific functions in target_interface are extern C, but the implementations for nvptx were mostly C++ mangling. That worked out as a quirk of DEVICE macro expanding to nothing, except for shuffle.h which only forward declared the functions with C++ linkage. Also implements GetWarpSize, as used by shuffle, and includes target_interface in nvptx target_impl.cu to help catch future divergence between interface and implementation. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98651	2021-03-15 19:50:22 +00:00
Jon Chesterfield	f675b3df48	[libomptarget] Drop assert.h, use freestanding for amdgcn devicertl Promotes the runtime assert to a link time error for the unimplemented fallback functions. Enables amdgcn to build with only clang provided headers, which makes it less likely to break other builds when enabled. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98649	2021-03-15 18:50:09 +00:00
Jon Chesterfield	156842937f	[libomptarget][amdgcn] Drop use of inttypes.h, moving closer to freestanding The glibc headers are a periodic source of problems compiling the devicertl. This patch resolves the following error run into while building llvm on a slightly different linux system. ``` In file included from .../lib/clang/13.0.0/include/inttypes.h:21: In file included from /usr/include/inttypes.h:25: /usr/include/features.h:461:12: fatal error: 'sys/cdefs.h' file not found # include <sys/cdefs.h> ^~~~~~~~~~~~~ ``` As a second patch, removing assert.h from shuffle will let amdgcn build as -ffreestanding, at which point only the headers that clang itself provides are used and interactions with the host glibc are eliminated. Doing the same for nvptx is complicated by printf handling but also seems worthwhile. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D98565	2021-03-15 16:54:58 +00:00
Johannes Doerfert	66ba494b49	[OpenMP][DeviceRTL] Extract shuffle idiom and port it to declare variant The shuffle idiom is implemented differently in our supported targets. To reduce the "target_impl" file we now move the shuffle idiom into its own self-contained header that provides the implementation for AMDGPU and NVPTX. A fallback can be added later on. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95752	2021-03-11 23:31:30 -06:00
Shilei Tian	c41ae246ac	[OpenMP][Clang][NVPTX] Only build one bitcode library for each SM In D97003, CUDA 9.2 is the minimum requirement for OpenMP offloading on NVPTX target. We don't need to have macros in source code to select the right functions based on CUDA version. We don't need to compile multiple bitcode libraries of different CUDA versions for each SM. We don't need to worry about future compatibility with newer CUDA versions. `-target-feature +ptx61` is used in this patch, which corresponds to the highest PTX version that CUDA 9.2 can support. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97198	2021-03-08 12:03:04 -05:00
Shilei Tian	f6c2984a09	[OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported feature in release version of LLVM `ptx71` is not supported in the release version of LLVM yet. As a result, the support of CUDA 11.2 and CUDA 11.1 caused a compilation error as mentioned in D97004. Since the support in D97004 is just a workaround for the release, and we'll not use it in the near future, using `ptx70` for CUDA 11 is feasible. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97195	2021-02-23 13:20:21 -05:00
Shilei Tian	309b00a42e	[OpenMP][NFC] clang-format the whole openmp project Same script as D95318. Test files are excluded. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D97088	2021-02-20 12:46:32 -05:00
Joel E. Denny	ef8b3b5ffd	[OpenMP] Fix nvptx CUDA_VERSION conversion As mentioned in PR#49250, without this patch, ptxas for CUDA 9.1 fails in the following two tests: - openmp/libomptarget/test/mapping/lambda_mapping.cpp - openmp/libomptarget/test/offloading/bug49021.cpp The error looks like: ``` ptxas /tmp/lambda_mapping-081ea9.s, line 828; error : Not a name of any known instruction: 'activemask' ``` The problem is that our cmake script converts CUDA version strings incorrectly: 9.1 becomes 9100, but it should be 9010, as shown in `getCudaVersion` in `clang/lib/Driver/ToolChains/Cuda.cpp`. Thus, `openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu` inadvertently enables `activemask` because it apparently becomes available in 9.2. This patch fixes the conversion. This patch does not fix the other two tests in PR#49250. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97012	2021-02-19 11:09:26 -05:00
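
The version-conversion bug described in ef8b3b5ffd can be illustrated with a small sketch. It assumes the `CUDA_VERSION` encoding `major * 1000 + minor * 10`, which matches the 9.1 → 9010 value quoted in that commit message. The function names `encodeCudaVersion`, `encodeCudaVersionBuggy`, and `hasActiveMask` are hypothetical and do not come from the LLVM sources, and scaling the minor version by 100 is only an assumed mechanism that reproduces the reported 9100.

```cpp
#include <cassert>

// Encoding assumed here: major * 1000 + minor * 10, so 9.1 -> 9010.
constexpr int encodeCudaVersion(int Major, int Minor) {
  return Major * 1000 + Minor * 10;
}

// Hypothetical reproduction of the faulty conversion: scaling the minor
// version by 100 yields 9100 for CUDA 9.1, the value the commit reports.
constexpr int encodeCudaVersionBuggy(int Major, int Minor) {
  return Major * 1000 + Minor * 100;
}

// Hypothetical guard: treat `activemask` as available from CUDA 9.2 on.
constexpr bool hasActiveMask(int CudaVersion) {
  return CudaVersion >= encodeCudaVersion(9, 2);
}

// With the correct encoding, CUDA 9.1 does not enable `activemask`.
static_assert(encodeCudaVersion(9, 1) == 9010, "9.1 encodes as 9010");
static_assert(!hasActiveMask(encodeCudaVersion(9, 1)),
              "9.1 must not enable activemask");

// With the faulty encoding, CUDA 9.1 compares greater than 9.2 (9020),
// so the guard is wrongly satisfied.
static_assert(encodeCudaVersionBuggy(9, 1) == 9100, "faulty value");
static_assert(hasActiveMask(encodeCudaVersionBuggy(9, 1)),
              "faulty encoding wrongly enables activemask");
```

Because 9100 is larger than 9020, any version guard of this shape would select the CUDA 9.2 code path for a 9.1 toolchain, which is consistent with the `ptxas` error quoted in the commit message.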

264 Commits