llvm-project

Commit Graph

Author	SHA1	Message	Date
Joseph Huber	16064e71e9	[OpenMP] Reset async stream properly upon failure Summary: If the call to `synchronize` fails, it will currently block the stream indefinitely if execution is continued from this point. Additionally, if the program exits it will trigger an assertion on the non-null value of the async queue and prevent the runtime from printing debugging information. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D99443	2021-03-26 19:05:06 -04:00
Hansang Bae	467f39249d	[OpenMP] Misc. changes that add or remove pointer/bound checks -- Added or moved checks to appropriate places. -- Removed ineffective null check where the pointer is already being dereferenced around the code. -- Initialized variables that can be used without definitions. -- Added call to dlclose/FreeLibrary in OMPT tool activation. -- Added a new build compiler definition. Differential Revision: https://reviews.llvm.org/D98584	2021-03-23 18:55:08 -05:00
Shilei Tian	2df65f87c1	[OpenMP] Fixed a crash in hidden helper thread It is reported that after enabling hidden helper thread, the program can sometimes hit the assertion `new_gtid < __kmp_threads_capacity`. The root cause is explained as follows. Let's say the default `__kmp_threads_capacity` is `N`. If hidden helper thread is enabled, `__kmp_threads_capacity` will be offset to `N+8` by default. If the number of threads we need exceeds `N+8`, e.g. via `num_threads` clause, we need to expand `__kmp_threads`. In `__kmp_expand_threads`, the expansion starts from `__kmp_threads_capacity` and repeatedly doubles it until the new capacity meets the requirement. Let's assume the new requirement is `Y`. If `Y` happens to meet the constraint `(N+8)*2^X=Y` where `X` is the number of iterations, the new capacity is not enough because 8 of its slots are reserved for hidden helper threads. Here is an example. ``` #include <vector> int main(int argc, char *argv[]) { constexpr const size_t N = 1344; std::vector<int> data(N); #pragma omp parallel for for (unsigned i = 0; i < N; ++i) { data[i] = i; } #pragma omp parallel for num_threads(N) for (unsigned i = 0; i < N; ++i) { data[i] += i; } return 0; } ``` My CPU is 20C40T, so `__kmp_threads_capacity` is 160. After offset, `__kmp_threads_capacity` becomes 168. `1344 = (160+8)*2^3`, so the assertion is hit. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D98838	2021-03-18 18:25:36 -04:00
Jon Chesterfield	626a31de15	[libomptarget] Add register usage info to kernel metadata Add register usage information to the runtime metadata so that it can be used during kernel launch (that change will be in a different commit). Add this information to the kernel trace. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D98829	2021-03-18 17:00:42 +00:00
Jon Chesterfield	dbf8f2b089	Revert "[libomptarget] Build amdgcn devicertl by default" This reverts commit `e23f3502d9`. It broke the build of openmp for clang built without amdgcn support. D98746, under review, would allow this to reland.	2021-03-17 11:34:44 +00:00
Hansang Bae	a6f9cb6adc	[OpenMP] Add runtime interface for OpenMP 5.1 error directive The proposed new interface is for supporting `at(execution)` clause in the error directive. Differential Revision: https://reviews.llvm.org/D98448	2021-03-16 08:55:25 -05:00
Johannes Doerfert	0a954a528b	[OpenMP][FIX] Repair accidental replacement of _shfl_sync with _shfl This was broken accidentally in D95752. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D98677	2021-03-15 22:46:00 -05:00
Jon Chesterfield	e23f3502d9	[libomptarget] Build amdgcn devicertl by default The cmake for this looks for an llvm install and does the right thing when building as part of enable_runtimes. It will probably do the right thing in other settings - at least, it won't try to build this with gcc. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98658	2021-03-15 23:17:50 +00:00
Peyton, Jonathan L	7085f04573	[OpenMP] Remove unused cpu_stackoffset member	2021-03-15 16:52:04 -05:00
Jon Chesterfield	bb38d7ff05	[libomptarget][nfc][amdgcn] Use precise triple for devicertl build	2021-03-15 20:24:13 +00:00
Jon Chesterfield	d0bc85f04a	[libomptarget][nfc] Drop unused DEVICE macro Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98655	2021-03-15 20:12:50 +00:00
Jon Chesterfield	7da76aaaf4	[libomptarget] Build amdgpu plugin by default This will build the amdgpu plugin if cmake is able to find the hsa runtime library, which will be the case if rocm is installed or if the hsa library has been installed somewhere cmake looks. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D98654	2021-03-15 20:12:01 +00:00
Jon Chesterfield	bcb3f0f867	[libomptarget] Fix devicertl build The target specific functions in target_interface are extern C, but the implementations for nvptx were mostly C++ mangling. That worked out as a quirk of DEVICE macro expanding to nothing, except for shuffle.h which only forward declared the functions with C++ linkage. Also implements GetWarpSize, as used by shuffle, and includes target_interface in nvptx target_impl.cu to help catch future divergence between interface and implementation. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98651	2021-03-15 19:50:22 +00:00
Jon Chesterfield	f675b3df48	[libomptarget] Drop assert.h, use freestanding for amdgcn devicertl Promotes the runtime assert to a link time error for the unimplemented fallback functions. Enables amdgcn to build with only clang provided headers, which makes it less likely to break other builds when enabled. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98649	2021-03-15 18:50:09 +00:00
Jon Chesterfield	156842937f	[libomptarget][amdgcn] Drop use of inttypes.h, moving closer to freestanding The glibc headers are a periodic source of problems compiling the devicertl. This patch resolves the following error run into while building llvm on a slightly different linux system. ``` In file included from .../lib/clang/13.0.0/include/inttypes.h:21: In file included from /usr/include/inttypes.h:25: /usr/include/features.h:461:12: fatal error: 'sys/cdefs.h' file not found # include <sys/cdefs.h> ^~~~~~~~~~~~~ ``` As a second patch, removing assert.h from shuffle will let amdgcn build as -ffreestanding, at which point only the headers that clang itself provides are used and interactions with the host glibc are eliminated. Doing the same for nvptx is complicated by printf handling but also seems worthwhile. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D98565	2021-03-15 16:54:58 +00:00
George Rokos	2468fdd9af	[libomptarget] Add allocator support for target memory This patch adds the infrastructure for allocator support for target memory. Three allocators are introduced for device, host and shared memory. The corresponding API functions have the llvm_ prefix temporarily, until they become part of the OpenMP standard. Differential Revision: https://reviews.llvm.org/D97883	2021-03-13 03:47:07 -08:00
Johannes Doerfert	5449fbb5d4	[OpenMP][NFC] Use `AsyncInfo` as the variable name for a `__tgt_async_info` Reviewed By: grokos, tianshilei1992 Differential Revision: https://reviews.llvm.org/D96444	2021-03-11 23:31:34 -06:00
Johannes Doerfert	66ba494b49	[OpenMP][DeviceRTL] Extract shuffle idiom and port it to declare variant The shuffle idiom is implemented differently in our supported targets. To reduce the "target_impl" file we now move the shuffle idiom into its own self-contained header that provides the implementation for AMDGPU and NVPTX. A fallback can be added later on. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95752	2021-03-11 23:31:30 -06:00
Joseph Huber	807466ef28	[OpenMP] Restore backwards compatibility for libomptarget Summary: The changes introduced in D87946 changed the API for libomptarget functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x but was not given a backward-compatible interface. This change will require people using Clang 13.x or 12.x to recompile their offloading programs. Reviewed By: jdoerfert cchen Differential Revision: https://reviews.llvm.org/D98358	2021-03-11 09:52:11 -05:00
Leonard Chan	baf637dcde	Rename top-level LICENSE.txt files to LICENSE.TXT This makes all the license filenames uniform across subprojects. Differential Revision: https://reviews.llvm.org/D98380	2021-03-10 21:26:24 -08:00
AndreyChurbanov	aaf16b80dd	[OpenMP] libomp: eliminate pause from atomic CAS loops For clang this change is NFC cleanup, because clang never calls atomic functions from runtime library. Basically, pause is good in spin-loops waiting for something. Atomic CAS loops do not wait for anything; each CAS failure means some other thread progressed. Performance experiments show that the pause only causes unnecessary slowdown on CPUs with a slow pause instruction and makes no difference on CPUs with a fast pause instruction. Removing the pause also gives a smaller binary, which is good. Differential Revision: https://reviews.llvm.org/D97079	2021-03-09 18:30:08 +03:00
AndreyChurbanov	e4492b6f31	[OpenMP] NFC: temporarily disable assertion until the bug with dependences is fixed	2021-03-08 22:18:30 +03:00
Shilei Tian	c41ae246ac	[OpenMP][Clang][NVPTX] Only build one bitcode library for each SM In D97003, CUDA 9.2 is the minimum requirement for OpenMP offloading on NVPTX target. We don't need to have macros in source code to select the right functions based on CUDA version. We don't need to compile multiple bitcode libraries of different CUDA versions for each SM. We don't need to worry about future compatibility with newer CUDA versions. `-target-feature +ptx61` is used in this patch, which corresponds to the highest PTX version that CUDA 9.2 can support. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97198	2021-03-08 12:03:04 -05:00
Peyton, Jonathan L	e2738b3758	[OpenMP] Fix potential integer overflow in dynamic schedule code Restrict the chunk_size * chunk_num to only occur for valid chunk_nums and reimplement calculating the limit to avoid overflow. Differential Revision: https://reviews.llvm.org/D96747	2021-03-08 09:43:05 -06:00
tlwilmar	97d000cfc6	Added API for "masked" construct via two entrypoints: __kmpc_masked, and __kmpc_end_masked. The "master" construct is deprecated. Changed proc-bind keyword from "master" to "primary". Use of both master construct and master as proc-bind keyword is still allowed, but deprecated. Remove references to "master" in comments and strings, and replace with "primary" or "primary thread". Function names and variables were not touched, nor were references to deprecated master construct. These can be updated over time. No new code should refer to master.	2021-03-05 09:29:57 -06:00
Joel E. Denny	d0eb25a643	[OpenMP] Encapsulate more in checkDeviceAndCtors This patch just encapsulates some repeated code. To do so, it relocates some functions from interface.cpp to omptarget.cpp. It also adjusts them to the LLVM coding style. This patch is almost NFC except some `DP` messages are a bit different. For example, messages like "Entering target region" are now emitted even if offload is disabled, but a subsequent "Offload is disabled" is then emitted. Reviewed By: jdoerfert, grokos Differential Revision: https://reviews.llvm.org/D97908	2021-03-04 12:03:42 -05:00
Joel E. Denny	bfe5452b93	[OpenMP] Fix lone target exit data Without this patch, an `omp target exit data` before the runtime is initialized produces a runtime error. This patch fixes that by changing `__tgt_target_data_end_mapper` to call `CheckDeviceAndCtors` like many other runtime routines. Discussed at <https://lists.llvm.org/pipermail/openmp-dev/2021-March/003920.html>. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D97907	2021-03-04 12:03:42 -05:00
Joel E. Denny	10c18c69f2	[OpenMP] Fix support for device as host Without this patch, when the offload device is set to `omp_get_initial_device()`, the runtime fails with an error diagnostic when entering target regions or target data regions. However, OpenMP 5.1, sec. 2.14.5 "target Construct", "Restrictions", p. 203, L3-5 states: "The device clause expression must evaluate to a non-negative integer value that is less than or equal to the value of omp_get_num_devices()." Sec. 3.7.7 "omp_get_initial_device", p. 412, L2-3 states: "The value of the device number is the value returned by the omp_get_num_devices routine." Similarly, OpenMP 5.0, sec. 2.12.5 "target Construct", "Restrictions", p. 174 L30-32 states: "The device clause expression must evaluate to a non-negative integer value less than the value of omp_get_num_devices() or to the value of omp_get_initial_device()." This patch fixes this behavior by changing the runtime to behave as if offloading is disabled whenever it finds the offload device (either from a `device` clause or the default device) is set to the host device. In the case of mandatory offloading when `omp_get_num_devices() == 0`, it incorporates the behavior proposed for OpenMP 5.2 in OpenMP spec github issue 2669. Reviewed By: grokos, RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D97616	2021-03-04 12:03:42 -05:00
Hansang Bae	b6c2f538b2	[OpenMP] Add allocator support for target memory This is a preview of allocator support for target memory that depends on the offload runtime API which allocates memory as described below. llvm_omp_target_alloc_host(size_t size, int device_num); -- Returns non-migratable memory owned by host. -- Memory is accessible by host and device(s). llvm_omp_target_alloc_shared(size_t size, int device_num); -- Returns migratable memory owned by host and device. -- Memory is accessible by host and device. llvm_omp_target_alloc_device(size_t size, int device_num); -- Returns memory owned by device. -- Memory is only accessible by device. New memory space and predefined allocator names are -- llvm_omp_target_host_mem_space -- llvm_omp_target_shared_mem_space -- llvm_omp_target_device_mem_space -- llvm_omp_target_host_mem_alloc -- llvm_omp_target_shared_mem_alloc -- llvm_omp_target_device_mem_alloc Differential Revision: https://reviews.llvm.org/D96669	2021-03-02 16:45:12 -06:00
Alexey Bataev	0caf736d7e	[OPENMP50]Mapping of the subcomponents with the 'default' mappers. If the mapped structure has data members, which have 'default' mappers, need to map these members individually using their 'default' mappers. Differential Revision: https://reviews.llvm.org/D92195	2021-03-02 07:11:06 -08:00
Peyton, Jonathan L	e83380fccc	[OpenMP] Fix clang-cl build error regarding TSX intrinsics Fix for https://bugs.llvm.org/show_bug.cgi?id=49339 The CMake check for the RTM intrinsics needs the -mrtm flag to be set during the test. This way clang-cl correctly detects it has the _xbegin() intrinsic. Otherwise, the CMake check fails. Differential Revision: https://reviews.llvm.org/D97413	2021-03-02 07:47:42 -06:00
AndreyChurbanov	1df6e58e55	[OpenMP] libomp minor cleanup Cleanup changes: - check value read from file; - remove dead code; - use an unsigned variable to read a hexadecimal number into; - add debug assertion to check ref count. Differential Revision: https://reviews.llvm.org/D96893	2021-02-26 00:44:51 +03:00
AndreyChurbanov	4932101177	[OpenMP] libomp: fix ittnotify stack stitching for teams construct Stitching id could be overridden, causing a reference to a destroyed object when the number of teams is 1. The patch separates the stitching id store location for teams and for parallel nested in teams. Differential Revision: https://reviews.llvm.org/D96562	2021-02-26 00:23:24 +03:00
Peyton, Jonathan L	d12ae7db99	[OpenMP] Fix accidental addition of use omp_lib_kinds Fortran header accidentally had use omp_lib_kinds added inside a subroutine and function. This patch removes the lines.	2021-02-25 12:49:56 -06:00
Harmen Stoppels	a54f160b3a	Prefer /usr/bin/env xxx over /usr/bin/xxx where xxx = perl, python, awk Allow users to use a non-system version of perl, python and awk, which is useful in certain package managers. Reviewed By: JDevlieghere, MaskRay Differential Revision: https://reviews.llvm.org/D95119	2021-02-25 11:32:27 +01:00
Vyacheslav Zakharin	6baeeb9efa	[libomptarget] Fixed MSVC build fail caused by __attribute__((used)). Differential Revision: https://reviews.llvm.org/D97348	2021-02-24 09:59:39 -08:00
Joachim Protze	2fbce374c8	[OpenMP][Tests][NFC] rename macro to avoid naming clash Rename a macro use missed in `e0f3acc5d3`	2021-02-24 18:46:56 +01:00
Shilei Tian	e5da63d5a9	[OpenMP] Fixed a crash when offloading to x86_64 with target nowait PR#49334 reports a crash when offloading to x86_64 with `target nowait`, which is caused by referencing a nullptr. The root cause of the issue is, when pushing a hidden helper task in `__kmp_push_task`, it also maps the gtid to its shadow gtid, which is wrong. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97329	2021-02-24 12:37:30 -05:00
Joachim Protze	f3a72509a7	[OpenMP][Tests][NFC] lit might also be known as llvm-lit.py	2021-02-24 18:32:24 +01:00
Manoel Roemmer	542d9c2154	[libomptarget] Load images in order of registration This makes sure that images are loaded in the order in which they are registered with libomptarget. If a target can load multiple images and these images depend on each other (for example if one image contains the program's target regions and one image contains library code), then the order in which images are loaded can be important for symbol resolution (for example, in the VE plugin). In this case, because the same code exists in the host binaries, the order in which the host linker loads them (which is also the order in which images are registered with libomptarget) is the order in which the images have to be loaded onto the device. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95530	2021-02-24 18:15:41 +01:00
Joachim Protze	e0f3acc5d3	[OpenMP][Tests][NFC] rename macro to avoid naming clash Rename a macro and macro use missed in `35ab6d6390`	2021-02-24 18:13:28 +01:00
Joachim Protze	35ab6d6390	[OpenMP][Tests][NFC] rename macro to avoid naming clash When including <ostream>, the register_callback macro of the OMPT callback.h clashes with a function defined in ostream. This patch renames the macro and includes ompt into the macro name.	2021-02-24 18:03:54 +01:00
Shilei Tian	f6c2984a09	[OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported feature in release version of LLVM `ptx71` is not supported in release version of LLVM yet. As a result, the support of CUDA 11.2 and CUDA 11.1 caused a compilation error as mentioned in D97004. Since the support in D97004 is just a workaround for release, and we'll not use it in the near future, using `ptx70` for CUDA 11 is feasible. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97195	2021-02-23 13:20:21 -05:00
Peyton, Jonathan L	56223b1e91	[OpenMP] Help static loop code avoid over/underflow This code alleviates some pathological loop parameters (lower, upper, stride) within calculations involved in the static loop code. It bounds the chunk size to the trip count if it is greater than the trip count and also minimizes problematic code for when trip count < nth. Differential Revision: https://reviews.llvm.org/D96426	2021-02-22 13:22:01 -06:00
Peyton, Jonathan L	1b968467c0	[OpenMP] Remove shutdown attempt on Windows process detach Only attempt shutdown if lpReserved is NULL. The Windows documentation states: When handling DLL_PROCESS_DETACH, a DLL should free resources such as heap memory only if the DLL is being unloaded dynamically (the lpReserved parameter is NULL). If the process is terminating (the lpReserved parameter is non-NULL), all threads in the process except the current thread either have exited already or have been explicitly terminated by a call to the ExitProcess function, which might leave some process resources such as heaps in an inconsistent state. In this case, it is not safe for the DLL to clean up the resources. Instead, the DLL should allow the operating system to reclaim the memory. Differential Revision: https://reviews.llvm.org/D96750	2021-02-22 13:15:06 -06:00
Peyton, Jonathan L	8c73be9d86	[OpenMP] Limit number of dispatch buffers This patch limits the number of dispatch buffers (used for loop worksharing construct) to between 1 and 4096. Differential Revision: https://reviews.llvm.org/D96749	2021-02-22 13:14:28 -06:00
Peyton, Jonathan L	55dff8b2e4	[OpenMP] Update HWLOC code for die level detection Differential Revision: https://reviews.llvm.org/D96748	2021-02-22 13:05:55 -06:00
AndreyChurbanov	1611e5473c	[OpenMP] libomp: cleanup some resource leaks Close mutexattr and condattr local objects to eliminate resource leaks. Differential Revision: https://reviews.llvm.org/D96892	2021-02-20 23:27:37 +03:00
Shilei Tian	309b00a42e	[OpenMP][NFC] clang-format the whole openmp project Same script as D95318. Test files are excluded. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D97088	2021-02-20 12:46:32 -05:00
Joel E. Denny	ef8b3b5ffd	[OpenMP] Fix nvptx CUDA_VERSION conversion As mentioned in PR#49250, without this patch, ptxas for CUDA 9.1 fails in the following two tests: - openmp/libomptarget/test/mapping/lambda_mapping.cpp - openmp/libomptarget/test/offloading/bug49021.cpp The error looks like: ``` ptxas /tmp/lambda_mapping-081ea9.s, line 828; error : Not a name of any known instruction: 'activemask' ``` The problem is that our cmake script converts CUDA version strings incorrectly: 9.1 becomes 9100, but it should be 9010, as shown in `getCudaVersion` in `clang/lib/Driver/ToolChains/Cuda.cpp`. Thus, `openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu` inadvertently enables `activemask` because it apparently becomes available in 9.2. This patch fixes the conversion. This patch does not fix the other two tests in PR#49250. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97012	2021-02-19 11:09:26 -05:00
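
The version conversion described in commit ef8b3b5ffd can be illustrated with a small C++ sketch. It assumes the `major * 1000 + minor * 10` encoding that CUDA uses for its `CUDA_VERSION` macro. The function names are hypothetical, and the faulty formula is only a guess that reproduces the 9100 value quoted in the commit message; it is not the actual cmake logic.

```cpp
#include <cassert>

// Encode a CUDA toolkit version the way the CUDA_VERSION macro does:
// major * 1000 + minor * 10, so 9.1 -> 9010 and 9.2 -> 9020.
// (Hypothetical helper name, for illustration only.)
inline int encodeCudaVersion(int major, int minor) {
  assert(major >= 0 && minor >= 0 && minor < 100);
  return major * 1000 + minor * 10;
}

// An assumed faulty encoding that scales the minor version by 100.
// It reproduces the 9.1 -> 9100 value quoted in the commit message.
inline int encodeCudaVersionFaulty(int major, int minor) {
  return major * 1000 + minor * 100;
}

// Feature gate modeled on the commit's description: `activemask` is
// treated as available from CUDA 9.2 (encoded as 9020) onward.
inline bool hasActivemask(int encodedVersion) {
  return encodedVersion >= 9020;
}
```

With the faulty encoding, CUDA 9.1 compares as newer than 9.2, so `hasActivemask` returns true by mistake. With the correct encoding the gate stays closed for 9.1 and opens for 9.2.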

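The capacity arithmetic from commit 2df65f87c1 can also be worked through in a few lines of C++. This is a simplified model under stated assumptions, not the code of `__kmp_expand_threads`: the helper names and the constant name are hypothetical, and only the doubling rule and the 8 reserved slots come from the commit message.

```cpp
#include <cassert>
#include <cstddef>

// Slots reserved for hidden helper threads (8 by default, per the
// commit message). The constant name is hypothetical.
constexpr std::size_t kHiddenHelperSlots = 8;

// Simplified model of the expansion: keep doubling the capacity until
// it is at least `required`.
inline std::size_t expandByDoubling(std::size_t capacity,
                                    std::size_t required) {
  assert(capacity > 0); // doubling zero would never terminate
  while (capacity < required)
    capacity *= 2;
  return capacity;
}

// Slots left for regular threads once the helper slots are set aside.
inline std::size_t usableSlots(std::size_t capacity) {
  assert(capacity >= kHiddenHelperSlots);
  return capacity - kHiddenHelperSlots;
}
```

Starting from 168 and asking for 1344 threads, the doubling stops at exactly 1344, which leaves only 1336 usable slots. Counting the helper slots in the requirement (one possible remedy, not necessarily what the patch does) forces one more doubling to 2688.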

1616 Commits