llvm-project

Commit Graph

Author	SHA1	Message	Date
Nico Weber	d3e7491333	Revert Attributor patch series Broke check-clang, see https://reviews.llvm.org/D102307#2869065 Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`	2021-07-10 16:15:55 -04:00
Johannes Doerfert	e603ca0306	[OpenMP] Remove checkXXXX device runtime functions We had multiple functions to determine the execution mode (SPMD/Generic) and runtime status (initialized/uninitialized) but that just increased complexity without a real benefit. Especially with D102307 in mind it is helpful to reduce the dependence on the `ident_t` flags. Differential Revision: https://reviews.llvm.org/D105586	2021-07-10 12:32:51 -05:00
Johannes Doerfert	1d5711c3ee	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 12:32:50 -05:00
Joel E. Denny	d99f65de2a	[OpenMP] Avoid checking parent reference count in targetDataBegin This patch is an attempt to do for `targetDataBegin` what D104924 does for `targetDataEnd`: * Eliminates a lock/unlock of the data mapping table. * Clarifies the logic that determines whether a struct member's host-to-device transfer occurs. The old logic, which checks the parent struct's reference count, is a leftover from back when we had a different map interface (as pointed out at <https://reviews.llvm.org/D104924#2846972>). Additionally, it eliminates the `DeviceTy::getMapEntryRefCnt`, which is no longer used after this patch. While D104924 does not change the computation of `IsLast`, I found I needed to change the computation of `IsNew` for this patch. As far as I can tell, the change is correct, and this patch does not cause any additional `openmp` tests to fail. However, I'm not sure I've thought of all use cases. Please advise. Reviewed By: jdoerfert, jhuber6, protze.joachim, tianshilei1992, grokos, RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D105121	2021-07-10 12:15:04 -04:00
Joel E. Denny	1d0456361a	[OpenMP] Avoid checking parent reference count in targetDataEnd The patch has the following benefits: * Eliminates a lock/unlock of the data mapping table. * Clarifies the logic that determines whether a struct member's device-to-host transfer occurs. The old logic, which checks the parent struct's reference count, is a leftover from back when we had a different map interface (as pointed out at <https://reviews.llvm.org/D104924#2846972>). Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104924	2021-07-10 12:15:04 -04:00
Alexey Bataev	ab8989ab87	[OPENMP]Fix overlapped mapping for dereferenced pointer members. If the base is used in a map clause and later we have a memberexpr with this base, and the member is a pointer, and this pointer is dereferenced anyhow (subscript, array section, dereference, etc.), such components should be considered as overlapped, otherwise it may lead to incorrect size computations, since we try to map a pointee as a part of the whole struct, which is not true for the pointer members. Differential Revision: https://reviews.llvm.org/D105562	2021-07-09 12:51:26 -07:00
Michał Górny	2b0d95fb58	[openmp] [test] Add missing <limits> include to capacity_nthreads Differential Revision: https://reviews.llvm.org/D105474	2021-07-06 20:39:53 +02:00
Jon Chesterfield	ddfb074a80	[libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global Folds some duplicated logic into a helper function and passes the new environment struct into getLaunchVals, which no longer reads the DeviceInfo global. Implemented on top of D105237 Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D105239	2021-07-06 17:06:38 +01:00
Atmn Patel	21e92612c0	[Libomptarget] Experimental Remote Plugin Fixes D97883 introduced a compile-time error in the experimental remote offloading libomptarget plugin, this patch fixes it and resolves a number of inconsistencies in the plugin as well: 1. Non-functional Asynchronous API 2. Unnecessarily verbose debug printing 3. Misc. code clean ups This is not intended to make any functional changes to the plugin. Differential Revision: https://reviews.llvm.org/D105325	2021-07-02 12:38:34 -04:00
Hansang Bae	f1b9ce2736	[OpenMP] Fix a few issues with hidden helper task This patch includes the following changes to address a few issues when using hidden helper task. - Assertion is triggered when there are inadvertent calls to hidden helper functions on non-Linux OS - Added deinit code in __kmp_internal_end_library function to fix random shutdown crashes - Moved task data access into the lock-guarded region in __kmp_push_task Differential Revision: https://reviews.llvm.org/D105308	2021-07-01 17:10:32 -05:00
Shilei Tian	369216ab31	[OpenMP][Offloading] Refined return value of `DeviceTy::getOrAllocTgtPtr` `DeviceTy::getOrAllocTgtPtr` just returns a target pointer. In addition, two bool values (`IsNew` and `IsHostPtr`) are passed by reference so that changes made in the function are visible to the caller. In this patch, a struct, which contains the target pointer, two flags, and an iterator to the map table entry corresponding to the queried host pointer, will be returned. In addition to making the logic clearer regarding the two bool values, this paves the way for the next patch to fix the data race in `bug49334.cpp` by attaching an event to the map table entry (and that's why we need the iterator). Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104382	2021-07-01 12:32:03 -04:00
Jon Chesterfield	db89414da4	[libomptarget][nfc] Move grid size computation Change getLaunchVals to return the integers used for launch Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D105237	2021-07-01 12:53:04 +01:00
Dhruva Chakrabarti	98c36f0079	Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size" This reverts commit `2240b41ee4`. A value of 0 for KernDescVal WG_Size implies it is unknown, so it should be set to the default. The above change was made without this assumption. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105250	2021-06-30 17:15:00 -07:00
Jon Chesterfield	4b0926b044	[libomptarget][nfc] Replace out arguments with struct return A step towards making this function adequately self contained that it can be tested easily. No functional change intended here, left variable names unchanged. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D105229	2021-06-30 22:40:07 +01:00
Jon Chesterfield	d86b0073cf	[libomptarget][amdgpu][nfc] Fix build warnings, drop some headers Removes the stdarg header, drops uses of iostream, and fixes some format string errors. Also changes a C style struct to C++ style to avoid a warning from clang. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D104923	2021-06-30 22:23:36 +01:00
Shilei Tian	24a36ce58b	[OpenMP][Offloading] Replace all calls to `isSPMDMode` with `__kmpc_is_spmd_exec_mode` In our ongoing work, we are using `AbstractAttributor` to deduce the execution mode of device functions, and potentially remove unnecessary function calls to `__kmpc_is_spmd_exec_mode`. In the current device runtime, we have mixed use of `isSPMDMode` and `__kmpc_is_spmd_exec_mode`, but in fact `__kmpc_is_spmd_exec_mode` simply calls `isSPMDMode`. Since all functions starting with `__kmpc` are C functions, which have no name mangling, they are more optimization friendly. In this patch, we simply replaced all calls to `isSPMDMode` with `__kmpc_is_spmd_exec_mode` to pave the way for the optimization. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105211	2021-06-30 15:39:57 -04:00
Dhruva Chakrabarti	e0b713a035	[libomptarget] [amdgpu] Change default number of teams per computation unit This patch is related to https://reviews.llvm.org/D98832. Based on discussions there, I decided to separate out the teams default as this patch. This change is to increase the number of teams per computation unit so as to provide more wavefronts for hiding latency. This change improves performance for some programs, including 20-50% for some Stream benchmarks. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D99003	2021-06-29 15:34:35 -07:00
Dhruva Chakrabarti	2240b41ee4	[libomptarget] [amdgpu] Fix default setting of max flat workgroup size When max flat workgroup size is not specified, it is set to the default workgroup size. This prevents kernel launch with a workgroup size larger than the default. The fix is to ignore a size of 0 and treat it as unspecified. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105073	2021-06-29 13:47:24 -07:00
Johannes Doerfert	4eb90e893f	Revert "[OpenMP] Add Two-level Distributed Barrier" This reverts commit `25073a4ecf`. This has broken non-x86 OpenMP builds for a while now. Until a solution is ready to be upstreamed we revert the feature and unblock those builds. See: https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821 and https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821 The currently proposed fix (D104788) seems not to be ready yet: https://reviews.llvm.org/D104788#2841928	2021-06-29 09:38:27 -05:00
Johannes Doerfert	bc8bb3df35	Revert "[omp] Fix build without ITT after D103121 changes" This reverts commit `eab1fd389b`. This commit fixed a problem with `25073a4ecf` (D103121) which is the one we actually need to revert to unblock non-X86 builds of OpenMP. Can be reapplied, or merged into, D103121 as it goes in again.	2021-06-29 09:38:27 -05:00
Joseph Huber	2190c48fde	[OpenMP][Documentation] Add FAQ entry for CMake module This patch adds documentation for using the CMake find module for OpenMP target offloading provided by LLVM. It also removes the requirement for AMD's architecture to be set as this isn't necessary for upstream LLVM. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105051	2021-06-28 17:05:07 -04:00
Joseph Huber	c9f3240c9d	[OpenMP][Documentation] Add OpenMPOpt optimization section Add some information about the optimizations currently provided by OpenMPOpt. Every optimization performed should eventually be listed here. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105050	2021-06-28 17:05:03 -04:00
Pushpinder Singh	20df2c7052	[AMDGPU][Libomptarget] Collect allocatable memory pools using HSA The logic is almost similar to that of system.cpp with one change that instead of adding all the memory pools to a device struct it only keeps a single pool. The existing approach also always allocated memory on the first HSA pool found for a GPU. This depends on D104691. The goal of this series of patches is to remove _atl_machine global. The next patch will drop g_atl_machine entirely. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104695	2021-06-28 11:28:04 +00:00
Jon Chesterfield	f66b8fdc0a	[libomptarget][amdgpu] Build openmp for two more targets The 4800U APU is a gfx902 and the MI100 accelerator is a gfx908. Both numbers are listed in ROCT topology.c Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D104922	2021-06-25 19:02:03 +01:00
Jon Chesterfield	96f6873dff	[OpenMP][NFC] Drop unused headers from amdgpu plugin	2021-06-25 12:08:56 +01:00
AndreyChurbanov	b2787945f9	[OpenMP][NFC] libomp: fix wrong debug assertion. Normalized bounds of chunk of iterations to steal from are inclusive, so upper bound should not be decremented in expression to check. Problem was in attempt to steal iterations 0:0, that caused assertion after wrong decrement. Reported in comment to https://reviews.llvm.org/D103648. Differential Revision: https://reviews.llvm.org/D104880	2021-06-25 02:02:14 +03:00
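The off-by-one described in the commit above can be illustrated with a minimal C sketch. It assumes, as the message states, that a chunk to steal is stored as normalized inclusive bounds `[lb, ub]`; `chunk_t`, `chunk_count`, `chunk_ok` and `chunk_ok_buggy` are hypothetical names, not libomp code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical chunk of loop iterations with normalized, inclusive bounds. */
typedef struct {
  long lb; /* first iteration, inclusive */
  long ub; /* last iteration, inclusive */
} chunk_t;

/* With inclusive bounds, the iteration count is ub - lb + 1. */
static long chunk_count(chunk_t c) {
  assert(c.lb <= c.ub); /* a chunk offered for stealing is never empty */
  return c.ub - c.lb + 1;
}

/* Correct check: the inclusive upper bound is used as-is. */
static bool chunk_ok(chunk_t c) { return c.lb <= c.ub; }

/* Buggy check: decrementing ub treats it as exclusive, so the
   one-iteration chunk 0:0 is wrongly rejected. */
static bool chunk_ok_buggy(chunk_t c) { return c.lb <= c.ub - 1; }
```

For the chunk `0:0` from the report, `chunk_ok` accepts it while `chunk_ok_buggy` rejects it, which is the spurious assertion the commit removes.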
Aakanksha Patil	3453f3dd46	[AMDGPU] Add gfx1035 target Differential Revision: https://reviews.llvm.org/D104804	2021-06-24 14:32:41 -04:00
Joel E. Denny	9fa5e3280d	[OpenMP] Fix delete map type in ref count debug messages For example, without this patch:
```
$ cat test.c
int main() {
  int x;
  #pragma omp target enter data map(alloc: x)
  #pragma omp target enter data map(alloc: x)
  #pragma omp target enter data map(alloc: x)
  #pragma omp target exit data map(delete: x)
  ;
  return 0;
}
$ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c
$ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists\|last'
Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=1, Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (incremented), Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=3 (incremented), Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (decremented)
Libomptarget --> There are 4 bytes allocated at target address 0x00000000013bb040 - is not last
```
`RefCount` is reported as decremented to 2, but it ought to be reset because of the `delete` map type, and `is not last` is incorrect. This patch migrates the reset of reference counts from `DeviceTy::deallocTgtPtr` to `DeviceTy::getTgtPtrBegin`, which then correctly reports the reset. Based on the `IsLast` result from `DeviceTy::getTgtPtrBegin`, `targetDataEnd` then correctly reports `is last` for any deletion. `DeviceTy::deallocTgtPtr` is responsible only for the final reference count decrement and mapping removal. An obscure side effect of this patch is that a `delete` map type when the reference count is infinite yields `DelEntry=IsLast=false` in `targetDataEnd` and so no longer results in a `DeviceTy::deallocTgtPtr` call. Without this patch, that call is a no-op anyway besides some unnecessary locking and mapping table lookups.
Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104560	2021-06-23 09:57:19 -04:00
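The reference-count behavior these messages report can be modeled with a small C sketch. It is a simplified model under stated assumptions, not libomptarget's `DeviceTy` code: `map_entry_t`, `REFCOUNT_INF` and `exit_data` are hypothetical, and the model only captures what the commit describes, namely that a `delete` map type makes the entry last while an infinite reference count is never last.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical sentinel standing in for an infinite reference count. */
#define REFCOUNT_INF ULONG_MAX

typedef struct {
  unsigned long ref_count; /* number of outstanding mappings */
} map_entry_t;

/* Models `omp target exit data` on one entry. is_delete selects the
   `delete` map type; otherwise the map type is `release`. Returns true
   when this was the last reference and the device copy should be freed. */
static bool exit_data(map_entry_t *e, bool is_delete) {
  if (e->ref_count == REFCOUNT_INF)
    return false;       /* infinite counts are never decremented or reset */
  assert(e->ref_count > 0);
  if (is_delete)
    e->ref_count = 0;   /* delete drops every reference at once */
  else
    e->ref_count--;     /* release drops a single reference */
  return e->ref_count == 0;
}
```

With three `alloc` mappings followed by `delete`, as in the test program in the commit message, the model reports the entry as last, matching the corrected `is last` message.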
Joel E. Denny	48421ac441	[OpenMP] Improve ref count debug messages For example, without this patch:
```
$ cat test.c
int main() {
  int x;
  #pragma omp target enter data map(alloc: x)
  #pragma omp target exit data map(release: x)
  ;
  return 0;
}
$ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c
$ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists'
Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, updated RefCount=1
```
There are two problems in this example:
* `RefCount` is not reported when a mapping is created, but it might be 1 or infinite. In this case, because it's created by `omp target enter data`, it's 1. Seeing that would make later `RefCount` messages easier to understand.
* `RefCount` is still 1 at the `omp target exit data`, but it's reported as `updated`. The reason it's still 1 is that, upon deletions, the reference count is generally not updated in `DeviceTy::getTgtPtrBegin`, where the report is produced. Instead, it's zeroed later in `DeviceTy::deallocTgtPtr`, where it's actually removed from the mapping table.
This patch makes the following changes:
* Report the reference count when creating a mapping.
* Where an existing mapping is reported, always report a reference count action:
  * `update suppressed` when `UpdateRefCount=false`
  * `incremented`
  * `decremented`
  * `deferred final decrement`, which replaces the misleading `updated` in the above example
* Add comments to `DeviceTy::getTgtPtrBegin` to explain why it does not zero the reference count. (Please advise if these comments miss the point.)
* For unified shared memory, don't report confusing messages like `RefCount=` or `RefCount= updated` given that reference counts are irrelevant in this case. Instead, just report `for unified shared memory`.
* Use `INFO` not `DP` consistently for `Mapping exists` messages.
* Fix device table dumps to print `INF` instead of `-1` for an infinite reference count.
Reviewed By: jhuber6, grokos Differential Revision: https://reviews.llvm.org/D104559	2021-06-23 09:57:19 -04:00
Joseph Huber	72d4cd627c	[OpenMP] Introduce a CMake find module for OpenMP Target support This introduces a CMake find module for detecting target offloading support in a compiler. The goal is to make it easier to incorporate target offloading into a CMake project. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D104710	2021-06-22 23:01:38 -04:00
Joseph Huber	422adaa879	[OpenMP] Add thread limit environment variable support to plugins The OpenMP 5.1 standard defines the environment variable `OMP_TEAMS_THREAD_LIMIT` to limit the number of threads that will be run in a single block. This patch adds support for this into the AMDGPU and CUDA plugins. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103923	2021-06-22 16:25:40 -04:00
Shilei Tian	0029059074	[NFC][OpenMP][Offloading] Unified the construction of mapping table entry This patch unifies construction of mapping table entry to use `emplace`. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104580	2021-06-22 12:38:47 -04:00
Joseph Huber	244e98ff48	[Libomptarget] Improve device runtime implementation for globalized variables. Currently the runtime implementation of `__kmpc_alloc_shared` is extremely slow because it allocates memory for each thread individually. This patch adds a small buffer for the threads to share data and will greatly improve performance for builds where not all globalization could be optimized out. If the shared buffer is full, then memory will only be allocated per-warp rather than per-thread. Depends on D97680 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104666	2021-06-22 11:52:49 -04:00
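The general idea in the commit above, serving small allocations from a shared buffer and falling back to a slower path only when the buffer is full, can be sketched on the host in C. This is an illustration under assumptions, not the device runtime: the buffer size, the 8-byte alignment, the stack-like (LIFO) free order, and the names `shared_stack_t`, `buf_alloc` and `buf_free` are all hypothetical, and `malloc` merely stands in for the per-warp fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SHARED_BUF_SIZE 256 /* hypothetical buffer size in bytes */

/* Hypothetical stack-like allocator over a small shared buffer. */
typedef struct {
  unsigned char data[SHARED_BUF_SIZE];
  size_t used; /* bytes currently handed out from data */
} shared_stack_t;

/* Round up to a multiple of 8 bytes to keep allocations aligned. */
static size_t align8(size_t n) { return (n + 7) & ~(size_t)7; }

/* Serve the request from the buffer if it fits, else fall back to malloc.
   *in_buf records which path was taken so the free can mirror it. */
static void *buf_alloc(shared_stack_t *s, size_t bytes, bool *in_buf) {
  size_t n = align8(bytes);
  if (SHARED_BUF_SIZE - s->used >= n) {
    void *p = s->data + s->used;
    s->used += n;
    *in_buf = true;
    return p;
  }
  *in_buf = false;
  return malloc(bytes);
}

/* Buffer-backed blocks must be freed in reverse allocation order (LIFO). */
static void buf_free(shared_stack_t *s, void *p, size_t bytes, bool in_buf) {
  if (in_buf) {
    assert(s->used >= align8(bytes));
    s->used -= align8(bytes);
  } else {
    free(p);
  }
}
```

A stack discipline keeps the fast path to a bounds check and a pointer bump, which is why the sketch assumes allocations are released in reverse order.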
Joseph Huber	952a0f2385	[Libomptarget] Introduce new globalization runtime calls Summary: This patch introduces the new globalization runtime to be used by D97680. These runtime calls will replace the __kmpc_data_sharing_push_stack and __kmpc_data_sharing_pop_stack functions. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102532	2021-06-22 10:05:42 -04:00
AndreyChurbanov	5dd4d0d46f	[OpenMP] libomp: fix dynamic loop dispatcher Restructured dynamic loop dispatcher code. Fixed use of dispatch buffers for nonmonotonic dynamic (static_steal) schedule: - eliminated possibility of stealing iterations of the wrong loop when victim thread changed its buffer to work on another loop; - fixed race when victim thread changed its buffer to work in a nested parallel region; - eliminated the "static" property of the schedule, so a single thread can now execute the whole loop. Differential Revision: https://reviews.llvm.org/D103648	2021-06-22 16:29:01 +03:00
Pushpinder Singh	9d110f9159	[AMDGPU][Libomptarget] Move allow_access_to_all_gpu_agents to rtl.cpp Moving this method helps eliminate a use of g_atl_machine. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104691	2021-06-22 11:44:52 +00:00
Vladislav Vinogradov	eab1fd389b	[omp] Fix build without ITT after D103121 changes Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D104638	2021-06-21 18:17:52 +03:00
Vyacheslav Zakharin	aad9e48c5f	[NFC][libomptarget] Remove redundant libelf dependency for elf_common. Differential Revision: https://reviews.llvm.org/D104549	2021-06-21 07:19:55 -07:00
Pushpinder Singh	7a97cd9da7	[AMDGPU][Libomptarget] Remove redundant functions There does not seem to be any use of these functions. They just put the value to a local which is never used again. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104512	2021-06-21 06:13:24 +00:00
Shilei Tian	ec97866454	[OpenMP] Make bug49334.cpp more reproducible `bug49334.cpp` cannot detect the data race in `libomptarget` efficiently. It is reported that with `N = 256` and `BS = 16`, the data race can be reproduced more steadily. Upcoming patches will fix it, so the test is expected to fail for now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104552	2021-06-18 18:35:41 -04:00
Asher Mancinelli	5c189d30e6	[OpenMP] Update FAQ for enabling cuda offloading Add an FAQ entry and add a few lines to an existing one. Document the use of `GCC_INSTALL_PREFIX` for pointing clang to correct GCC installation for two-stage build. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D104474	2021-06-18 11:55:45 -06:00
Vyacheslav Zakharin	836992ab9a	[NFC][libomptarget] Build elf_common with PIC. Differential Revision: https://reviews.llvm.org/D104545	2021-06-18 09:20:10 -07:00
Vyacheslav Zakharin	c5b7c7c8f7	[NFC][libomptarget] Fixed -DLLVM_ENABLE_RUNTIMES="openmp" build. Differential Revision: https://reviews.llvm.org/D104535	2021-06-18 09:20:10 -07:00
Terry Wilmarth	25073a4ecf	[OpenMP] Add Two-level Distributed Barrier Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in some cases than the default hyper barrier. This barrier is designed to handle fine granularity parallelism where barriers are used frequently with little compute and memory access between barriers. There is no need to use it for codes with few barriers and large granularity compute, or memory intensive applications, as little difference will be seen between this barrier and the default hyper barrier. This barrier is designed to work optimally with a fixed number of threads, and has a significant setup time, so should NOT be used in situations where the number of threads in a team is varied frequently. The two-level distributed barrier is off by default -- hyper barrier is used by default. To use this barrier, you must set all barrier patterns to use this type, because it will not work with other barrier patterns. Thus, to turn it on, the following settings are required: KMP_FORKJOIN_BARRIER_PATTERN=dist,dist KMP_PLAIN_BARRIER_PATTERN=dist,dist KMP_REDUCTION_BARRIER_PATTERN=dist,dist Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER, and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed barrier. Differential Revision: https://reviews.llvm.org/D103121	2021-06-16 15:34:55 -05:00
Vyacheslav Zakharin	b5c4fc0f23	[NFC][libomptarget] Reduce the dependency on libelf This change-set removes libelf usage from elf_common part of the plugins. libelf is still used in x86_64 generic plugin code and in some plugins (e.g. amdgpu) - these will have to be cleaned up in separate checkins. Differential Revision: https://reviews.llvm.org/D103545	2021-06-16 08:34:23 -07:00
AndreyChurbanov	610fea65e2	[OpenMP] libomp: fixed implementation of OMP 5.1 inoutset task dependence type Refactored code of dependence processing and added new inoutset dependence type. The compiler can set the dependence flag to 0x8 when calling __kmpc_omp_task_with_deps. All dependence flags the library handles so far, with their corresponding dependence types: 1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET. Differential Revision: https://reviews.llvm.org/D97085	2021-06-16 14:47:29 +03:00
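The flag values listed in the commit above can be written down as a C enum. The enum and the `may_run_concurrently` helper are hypothetical illustrations, not libomp declarations. The helper encodes the OpenMP 5.1 rule that two sibling tasks depending on the same variable may run at the same time only if both use `in` or both use `inoutset`; `mutexinoutset` tasks may run in either order but never simultaneously.

```c
#include <assert.h>
#include <stdbool.h>

/* Dependence flag values as listed in the commit message; the enum and
   helper names are hypothetical. */
enum dep_flag {
  DEP_IN = 1,
  DEP_OUT = 2,
  DEP_INOUT = 3,         /* DEP_IN | DEP_OUT */
  DEP_MUTEXINOUTSET = 4,
  DEP_INOUTSET = 8
};

/* Can two sibling tasks with dependences a and b on the same variable
   execute at the same time? Only pairs of `in` or pairs of `inoutset`. */
static bool may_run_concurrently(enum dep_flag a, enum dep_flag b) {
  return (a == DEP_IN && b == DEP_IN) ||
         (a == DEP_INOUTSET && b == DEP_INOUTSET);
}
```

Note that `inoutset` behaves like `in` toward other `inoutset` tasks but like `out` toward every other dependence type, which is why it needs its own flag rather than reusing an existing value.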
Joachim Protze	d2a7871b5e	[OpenMP][NFC] Add back suppression of warning Commit `cff215565e` did not fix all unused variables in different builds, so adding back the suppression for now.	2021-06-16 10:14:59 +02:00
Joachim Protze	cff215565e	[OpenMP] Remove unused variables from libomp code Several variables were left unused as a result of different patches removing their use. Two variables have some use: `poll_count` is used by the KMP_BLOCKING macro only under certain conditions. Adding (void) to tell the compiler to ignore the unused variable. `padding` is a dummy stack allocation with no intent to be used. Also adding (void) to make the compiler ignore the unused variable. Differential Revision: https://reviews.llvm.org/D104303	2021-06-16 09:33:46 +02:00
Peyton, Jonathan L	56da28240f	[OpenMP] Add GOMP 5.0 version symbols to API * Add GOMP versioned pause functions * Add GOMP versioned affinity format functions To do the affinity format functions, only attach versioned symbols to the APPEND Fortran entries (e.g., omp_set_affinity_format_) since GOMP only exports two symbols (one for Fortran, one for C). Our affinity format functions have three symbols. e.g., with omp_set_affinity_format: 1) omp_set_affinity_format (Fortran interface) 2) omp_set_affinity_format_ (Fortran interface) 3) ompc_set_affinity_format (C interface) Have the GOMP version of the C symbol alias the ompc_* 3) version instead of the Fortran unappended version 1). Differential Revision: https://reviews.llvm.org/D103647	2021-06-15 16:25:00 -05:00
Peyton, Jonathan L	92baf414db	[OpenMP] Fix affinity determine capable algorithm on Linux Remove strange checks for syscall() arguments where mask is NULL. Valgrind reports these as error usages for the syscall. Instead, just check if CACHE_LINE bytes is long enough. If not, then search for the size. Also, by limiting the first size detection attempt to CACHE_LINE bytes, instead of 1MB, we don't use more than one cache line for the mask size. Before this patch, sometimes the returned mask size was 640 bytes (10 cache lines) because the initial call to getaffinity() was limited only by the internal kernel mask size which can be very large. Differential Revision: https://reviews.llvm.org/D103637	2021-06-15 16:21:30 -05:00
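The size probe described in the commit above can be sketched in C for Linux. This is a sketch under assumptions, not libomp's actual implementation. It assumes a 64-byte cache line and calls the raw `sched_getaffinity` system call, which, unlike the glibc wrapper, returns the number of bytes the kernel copied. The commit does not say how the fallback search proceeds, so doubling the buffer on `EINVAL` is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

#define CACHE_LINE 64 /* assumed cache-line size in bytes */

/* Returns the affinity mask size (in bytes) reported by the kernel for the
   calling thread, or -1 on failure. The first attempt uses a single cache
   line; the buffer is doubled only while the kernel rejects it as too small. */
static long probe_mask_size(void) {
  for (size_t size = CACHE_LINE; size <= (size_t)1 << 20; size *= 2) {
    void *mask = malloc(size);
    if (mask == NULL)
      return -1;
    long ret = syscall(SYS_sched_getaffinity, 0, size, mask);
    int err = errno;
    free(mask);
    if (ret > 0)
      return ret;  /* bytes the kernel copied into the mask */
    if (ret < 0 && err != EINVAL)
      return -1;   /* unexpected error: give up */
  }
  return -1;
}
```

Because the kernel copies at most the buffer length, starting at one cache line keeps the probe from reporting a mask many cache lines long on systems with few CPUs.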
Peyton, Jonathan L	0ddde4d865	[OpenMP] Lazily assign root affinity Lazily set affinity for root threads. Previously, the root thread executing middle initialization would attempt to assign affinity to other existing root threads. This was not working properly as the set_system_affinity() function wasn't setting the affinity for the target thread. Instead, the middle init thread was resetting its own affinity using the target thread's affinity mask. Differential Revision: https://reviews.llvm.org/D103625	2021-06-15 16:21:06 -05:00
Pushpinder Singh	cadcaf3f46	[AMDGPU][Libomptarget] Drop dead code related to g_atl_machine This patch includes some changes which deletes the code accessing g_atl_machine global. Some accesses related to memory_pools are still remaining. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103813	2021-06-15 05:21:35 +00:00
Ron Lieberman	91f147792e	[libomptarget][amdgpu] Remove stray fprintf in rtl.cpp remove unintended fprintf in rtl.cpp Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D104003	2021-06-10 01:57:30 +00:00
AndreyChurbanov	9ce2e5e700	Revert "[OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type" This reverts commit `a1f550e052`. Revert in order to fix backwards compatibility breakage caused by type size change for task dependence flag.	2021-06-09 17:38:38 +03:00
Joachim Protze	639b397931	[OpenMP][Tools] Fix Archer handling of task dependencies The current handling of dependencies in Archer has two flaws: - annotation of dependency synchronization is not limited to sibling tasks - annotation of in/out dependencies is based on the assumption that dependency variables will rarely be byte-sized variables. This patch introduces a map in the generating task to manage the dependency variables for the child tasks. The map is only accessed from the generating task, so no locking is necessary. This also limits the dependency-based synchronization to sibling tasks. This patch also introduces proper handling for new dependency types such as mutexinoutset and inoutset. Differential Revision: https://reviews.llvm.org/D103608	2021-06-09 13:36:20 +02:00
Joachim Protze	08d8f1a958	[OpenMP][Tools] Cleanup memory pool used in Archer The main motivation for reusing objects is that it helps to avoid creating and leaking synchronization clocks in TSan. The reused object will reuse the synchronization clock in TSan. Before, new and delete operators were overloaded to get and return memory for the object from/to the object pool. This patch replaces the operator overloading with explicit static New/Delete functions. Objects for parallel regions and implicit tasks will always be taken from and returned to the thread-local object pool. Only for explicit tasks is there a chance that another thread completes the task and frees the object. This patch optimizes the thread-local New/Delete calls by avoiding locks and only locking if the pool is empty. Remote threads return the object into a separate queue. The chunk size for allocations is now decided based on page size. The objects will also be aligned to cache lines, avoiding false sharing. This is the first patch in a series to provide better tasking support. Differential Revision: https://reviews.llvm.org/D103606	2021-06-09 13:36:19 +02:00
Joachim Protze	82e4e50531	[OpenMP][Tools] Fix Archer for MACOS Archer uses weak symbol overloads of TSan functions to enable loading the tool even if the application is not built with TSan. For MACOS the tool collects the function pointers at runtime. When adding the function entry/exit markers, we failed to add the functions to the MACOS codepath. This patch also replaces the repeated function lookup by a single initial function lookup and fixes the disabling logic in RunningOnValgrind. Differential Revision: https://reviews.llvm.org/D103607	2021-06-09 13:36:19 +02:00
Brendon Cahoon	294efbbd3e	Reland "[AMDGPU] Add gfx1013 target" This reverts commit `211e584fa2`. Fixed a use-after-free error that caused the sanitizers to fail.	2021-06-08 21:15:35 -04:00
Joseph Huber	df965513a9	[OpenMP] Add an information flag for device data transfers This patch adds an information flag that indicates when data is being copied to and from the device. This will be helpful for finding redundant or unnecessary data transfers in applications. Reviewed By: jdoerfert, grokos Differential Revision: https://reviews.llvm.org/D103927	2021-06-08 20:23:27 -04:00
Brendon Cahoon	211e584fa2	Revert "[AMDGPU] Add gfx1013 target" This reverts commit `ea10a86984`. A sanitizer buildbot reports an error.	2021-06-08 16:29:41 -04:00
Brendon Cahoon	ea10a86984	[AMDGPU] Add gfx1013 target Differential Revision: https://reviews.llvm.org/D103663	2021-06-08 12:49:49 -04:00
Vignesh Balasubramanian	f61602b0d3	[OpenMP][OMPD] Implementation of OMPD debugging library - libompd. This is the first of seven patches that implements OMPD, a debugging interface to support debugging of OpenMP programs. It contains support code required in "openmp/runtime" for OMPD implementation. Reviewed By: @hbae Differential Revision: https://reviews.llvm.org/D100181	2021-06-08 16:44:22 +05:30
Peyton, Jonathan L	d70e1f1276	[OpenMP][runtime] add .clang-tidy file Use same checks as compiler-rt which removes checks for readability-* and llvm-header style. Differential Revision: https://reviews.llvm.org/D103711	2021-06-07 13:56:39 -05:00
AndreyChurbanov	a1f550e052	[OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type Refactored code of dependence processing and added new inoutset dependence type. The compiler can set the dependence flag to 0x8 when calling __kmpc_omp_task_with_deps. The size of the dependence flag type changed from 1 to 4 bytes in clang. All dependence flags the library handles so far, with their corresponding dependence types: 1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET. Differential Revision: https://reviews.llvm.org/D97085	2021-06-07 21:42:51 +03:00
Bryan Chan	54f059c900	[OpenMP] Check loc for NULL before dereferencing it The ident_t * argument in __kmp_get_monotonicity was being used without a customary NULL check, causing the function to crash in a Debug build. Release builds were not affected thanks to dead store elimination.	2021-06-07 10:45:48 -04:00
Pushpinder Singh	4f8bc7caf4	[AMDGPU][Libomptarget] Remove atlc global This global struct used to hold various flags for monitoring the initialization of hsa. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103795	2021-06-07 11:09:01 +00:00
Pushpinder Singh	f5f329a371	[AMDGPU][Libomptarget] Rework logic for locating kernarg pools Previous logic was to always use the first kernarg pool found to allocate kernel args. This patch changes this to use only the kernarg pool which has non-zero size. This logic is also reworked to not use any globals. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103600	2021-06-07 06:41:37 +00:00
Terry Wilmarth	8ec9aa236e	[OpenMP] Add experimental nesting mode feature Nesting mode is a new experimental feature in the OpenMP runtime. It allows a user to set up nesting for an application in a way that corresponds to the hardware topology levels on the machine an application is being run on. For example, if a machine has 2 sockets, each with 12 cores, then use of nesting mode could set up an outer level of nesting that uses 2 threads per parallel region, and an inner level of nesting that uses 12 threads per parallel region. Nesting mode is controlled with the KMP_NESTING_MODE environment variable as follows: 1) KMP_NESTING_MODE = 0: Nesting mode is off (default); max-active-levels-var is set to 1 (the default -- nesting is off, nested parallel regions are serialized). 2) KMP_NESTING_MODE = 1: Nesting mode is on, and a number of threads will be assigned for each level discovered in the machine topology; max-active-levels-var is set to the number of levels discovered. 3) KMP_NESTING_MODE = n, n>1: [Note: this option is experimental and may change or be removed in the future.] Nesting mode is on, and a number of threads will be assigned for each topology level discovered on the machine, up to k<=n levels (since there may be fewer than n levels discovered in the topology), and beyond the kth level, nested parallel regions will be serialized; NOTE: max-active-levels-var is 1 (the default -- nesting is off, and nested parallel regions are serialized) until the user changes max-active-levels-var. If the user sets OMP_NUM_THREADS or OMP_MAX_ACTIVE_LEVELS, they will override KMP_NESTING_MODE settings for the associated environment variables. The detected topology may be limited by an affinity mask setting on the initial thread, or by KMP_HW_SUBSET if the user sets it. See also: KMP_HOT_TEAMS_MAX_LEVEL for controlling use of hot teams for nested parallel regions. Note that this feature only sets numbers of threads used at nesting levels.
The user should make use of OMP_PLACES and OMP_PROC_BIND or KMP_AFFINITY for affinitizing those threads, if desired. Differential Revision: https://reviews.llvm.org/D102188	2021-06-04 16:01:11 -05:00
Peyton, Jonathan L	56dd158c32	[OpenMP] fix spelling error in message-converter.pl	2021-06-04 11:20:32 -05:00
Peyton, Jonathan L	f7655f3df3	[OpenMP] Fix improper printf format specifier	2021-06-02 11:04:48 -05:00
Hansang Bae	7ba4e96ede	[OpenMP] Use new task type/flag for taskwait depend events. Differential Revision: https://reviews.llvm.org/D103464	2021-06-02 10:16:38 -05:00
Pushpinder Singh	b25546a4b4	[AMDGPU][Libomptarget][NFC] Remove bunch of dead structs Dropped structs are atmi_machine_t, atmi_device_t and atmi_memory_t Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103509	2021-06-02 10:40:51 +00:00
Pushpinder Singh	2368170a8d	[AMDGPU][Libomptarget][NFC] Remove atmi_place_t atmi_place_t has been replaced with int DeviceId. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103508	2021-06-02 10:35:28 +00:00
Peyton, Jonathan L	2020c981fa	[OpenMP] Add L2-Tile equivalence for KNL When on KNL and L2 or Tile layer is detected, manually add the corresponding layer which is equivalent. Differential Revision: https://reviews.llvm.org/D102865	2021-06-01 14:17:13 -05:00
Hansang Bae	cf5c94ef08	[OpenMP] Define named constants for interop's foreign runtime ID Also added missing Fortran definitions for interop support. Differential Revision: https://reviews.llvm.org/D102883	2021-06-01 13:06:59 -05:00
Pushpinder Singh	fb113264a8	[AMDGPU][Libomptarget] Remove g_atmi_machine global It turns out the only purpose of this class was to verify whether a device ID was in range, which could easily be done using g_atl_machine. Getting rid of g_atl_machine is still pending and will be done in a later patch. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103443	2021-06-01 12:34:24 +00:00
Pushpinder Singh	4fc3286951	[AMDGPU][Libomptarget][NFC] Split host and device malloc This patch splits the code path for host and device malloc. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103389	2021-05-31 12:09:18 +00:00
Pushpinder Singh	8b79dfb302	[AMDGPU][Libomptarget][NFC] Remove atmi_mem_place_t This struct was used to specify the device on which memory was being allocated/freed in atmi_malloc/free. It has now been replaced with int DeviceId. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103239	2021-05-27 11:53:18 +00:00
Jon Chesterfield	2fdf8bbd19	[libomptarget][nfc][amdgpu] Factor out setting upper bounds Refactor suggested in D103037 to help avoid similar copy-paste errors. Change is mechanical. Some parts of this would be more robust with unsigned. Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D103090	2021-05-26 19:57:49 +01:00
Jon Chesterfield	c5c1ec7945	[libomptarget][nfc][amdgpu] Refactor uses of KernelInfoTable Suggested in D103059. Use a single lookup instead of two, more const, less mutation. Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D103093	2021-05-26 19:25:25 +01:00
Jon Chesterfield	07f59baad6	[libomptarget][nfc][amdgpu] Remove atmi_status_t type ATMI_STATUS_UNKNOWN was unused, deleted references to it. Replaced ATMI_STATUS_{SUCCESS,ERROR} with HSA_STATUS_{SUCCESS,ERROR} Replaced atmi_status_t with hsa_status_t Otherwise no change. In particular, conversions between atmi_status_t and hsa_status_t will now be conversions between hsa_status_t and itself. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D103115	2021-05-26 17:02:19 +01:00
Pushpinder Singh	a2d6ef5876	[AMDGPU][Libomptarget] Inline atmi_init/atmi_finalize After D102847, these functions can be inlined. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103075	2021-05-26 10:50:08 +00:00
Pushpinder Singh	cc8661ac4a	[AMDGPU][Libomptarget] Delete g_atmi_initialized This patch drops g_atmi_initialized and inlines the Initialize & Finalize methods from Runtime class. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102847	2021-05-26 10:46:54 +00:00
Pushpinder Singh	7648b6978e	[AMDGPU][Libomptarget] Move Kernel/Symbol info tables to RTLDeviceInfoTy The two globals KernelInfoTable & SymbolInfoTable are moved into the RTLDeviceInfoTy class. This builds on top of D102691. [2/2] Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102692	2021-05-26 10:02:28 +00:00
Jon Chesterfield	df005fa364	[libomptarget][nfc] Move hostcall required test to rtl Remove a global, fix minor race. First of N patches to bring up hostcall. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103058	2021-05-25 22:43:17 +01:00
Pushpinder Singh	b0d68c7141	[AMDGPU][Libomptarget] Mark lambda_by_value test as XFAIL Reason: Missing printf definition Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103078	2021-05-25 12:16:54 +00:00
Jon Chesterfield	75492e20fb	[libomptarget][nfc] Accept callable for hsa iterate_symbols Candidate refactor to simplify D102692 Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D103030	2021-05-25 09:29:11 +01:00
Dhruva Chakrabarti	96d70f4d28	[libomptarget] [amdgpu] Added LDS usage to the kernel trace Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103059	2021-05-24 19:33:48 -07:00
Hansang Bae	95cefacfe1	[OpenMP] Fix crashing critical section with hint clause Runtime was using the default lock type without using the hint. Differential Revision: https://reviews.llvm.org/D102955	2021-05-24 17:25:01 -05:00
Dhruva Chakrabarti	ca17b26d4d	[libomptarget] [amdgpu] Fix copy-paste error setting NumThreads for a corner case. Fix the case where NumTeams was set incorrectly instead of NumThreads Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103037	2021-05-24 15:23:15 -07:00
Pushpinder Singh	486110eb41	[AMDGPU][Libomptarget] Remove global KernelNameMap KernelNameMap contains entries like "key.kd" => key which clearly could be replaced by simple logic of removing suffix from the key. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102691	2021-05-24 08:46:08 +00:00
AndreyChurbanov	aa6e7e8da8	[OpenMP] libomp: move warnings to after library initialization Warnings on deprecated api cannot be suppressed if the library is not initialized. With this change it is possible to set KMP_WARNINGS=false to suppress the warnings. Differential Revision: https://reviews.llvm.org/D102676	2021-05-21 23:47:23 +03:00
George Rokos	d0bc04d6b9	[libomptarget] Fix a bug whereby firstprivates are not copied over to the device The check for the TO flag when processing firstprivates is missing. As a result, sometimes the device copy of a firstprivate never gets initialized. Currently we try to force lambda structs to be allocated immediately by marking them as a non-firstprivate, so that PrivateArgumentManagerTy::addArg allocates memory for them immediately. However, calling addArg with IsFirstPrivate=false makes the function skip initializing the device copy. Whether an argument is firstprivate and whether we need to allocate memory immediately are independent questions, so this patch introduces one more control variable for immediate allocation and sets it apart from initialization. Differential Revision: https://reviews.llvm.org/D102890	2021-05-21 10:52:08 -07:00
Jon Chesterfield	d54712ab4d	[libomptarget][amdgpu] Mark alloc, free weak to facilitate local experimentation There are a lot of different ways we might implement the devicertl local alloc and free functions: via host, local buffers (stack or arena), specialising per kernel etc. It is not yet clear what the right design is. This change makes the alloc and free functions weak, so one can override them from local tests while comparing options. Not strictly necessary, as a comparable patch can be applied locally each time, but would be convenient for out of tree dev. Plan would be to drop the weak attribute at the same time as introducing a working allocator to trunk. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102499	2021-05-21 16:09:22 +01:00
Jon Chesterfield	68b88ae670	[libomptarget] Improve dlwrap compile time error diagnostic The dlwrap interface takes an explicit arity, e.g. DLWRAP(cuAlloc, 2); This probably can't be eliminated as it controls the argument list of an external symbol, not an inline header function. If the arity given is too big, the error from clang referring to the line is in the middle of implementation details. /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1277:7: error: static_assert failed due to requirement '0UL < tuple_size<std::tuple<>>::value' "tuple index is in range" static_assert(__i < tuple_size<tuple<>>::value, ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1260:7: ... /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1260:7: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:93:27 ... /home/amd/llvm-project/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp:34:1: note: in instantiation of template class 'dlwrap::trait<cudaError_enum ()(unsigned long , unsigned long)>::arg<2>' requested here DLWRAP(cuMemAlloc, 3); ^ /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:51:31: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:166:3: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:133:3: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:186:37: ... 
If the arity is too small, the diagnostic is better: cuda/dynamic_cuda/cuda.cpp:34:1: error: too few arguments to function call, expected 2, have 1 DLWRAP(cuMemAlloc, 1); This patch changes the diagnostic to: cuda/dynamic_cuda/cuda.cpp:34:1: error: static_assert failed due to requirement '1 == trait<cudaError_enum ()(unsigned long , unsigned long)>::nargs' "Arity Error" DLWRAP(cuMemAlloc, 1); or cuda/dynamic_cuda/cuda.cpp:34:1: error: static_assert failed due to requirement '3 == trait<cudaError_enum ()(unsigned long , unsigned long)>::nargs' "Arity Error" DLWRAP(cuMemAlloc, 3); Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102858	2021-05-20 20:33:36 +01:00
Jon Chesterfield	d18fb09c69	[libomptarget][amdgpu] Remove majority of fatal errors Replaces most calls to exit() with returning an error to the library entry point. Minor changes to error handling for clear bugs, remove some dead code. Each exit() call site replaced is either in a library entry point or a function that already returns error codes on some paths. The existing handling is not well tested but replacing exit() with a fallback path should be a strict improvement. Remaining two early exit points are an abort() from a callback and exit() from within msgpack. Fixes for those are less obvious and left for a later patch. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D102346	2021-05-20 16:26:43 +01:00
Jon Chesterfield	ea68ad6e26	[libomptarget] Disable test bug49334 on amdgpu The test hangs on amdgpu for an unknown reason. Disable it to unblock the build. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D102017	2021-05-20 15:46:56 +01:00
Pushpinder Singh	d7503c3bce	[AMDGPU][Libomptarget] Rename & move g_executables to private This patch moves g_executables to private member of Runtime class and is renamed to HSAExecutables following LLVM naming convention. This movement required making Runtime::Initialize and Runtime::Finalize non-static. Verified the correctness of this change by running libomptarget tests on gfx906. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102600	2021-05-18 05:43:23 +00:00
Pushpinder Singh	3bc2b97b34	[AMDGPU][libomptarget] Remove unused global variables This initial patch removes some unused variables from the global namespace. There will be more incoming patches for moving global variables to classes or static members. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102598	2021-05-18 05:40:49 +00:00
Shilei Tian	af6511d730	[OpenMP] Fixed Bug 49356 Bug 49356 (https://bugs.llvm.org/show_bug.cgi?id=49356) reports crash in the test case `tasking/bug_taskwait_detach.cpp`, which is caused by the wrong function declaration. `gtid` in `__kmpc_omp_task` should be `kmp_int32`. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D102584	2021-05-17 12:14:54 -04:00
Aakanksha Patil	464e4dc50f	[AMDGPU] Add gfx1034 target Differential Revision: https://reviews.llvm.org/D102306	2021-05-13 14:25:18 -04:00
Jon Chesterfield	10de217209	[libomptarget][amdgpu] Fix truncation error for partial wavefront The partial barrier implementation involves one wavefront resetting and N-1 waiting. This change future-proofs against launching with a number of threads that is not a multiple of the wavefront size. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102407	2021-05-13 17:31:57 +01:00
Jon Chesterfield	b049870d3b	[libomptarget][amdgpu] Convert an assert to print and offload_fail The kernel launched is supposed to be present in the binary, but a not yet diagnosed bug means it is missing for some of the qmcpack test cases. Changing from assert to print and offload_fail should help diagnose that and similar bugs. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102378	2021-05-13 17:31:36 +01:00
Michael Kruse	34ed3e6337	[OpenMP] Test unified shared memory tests only on systems that support it. Add a `REQUIRES: unified_shared_memory` option to tests that use `#pragma omp requires unified_shared_memory`. For CUDA, the feature tag is derived from LIBOMPTARGET_DEP_CUDA_ARCH which itself is derived using [[ https://cmake.org/cmake/help/latest/module/FindCUDA.html#commands \| cuda_select_nvcc_arch_flags ]]. The latter determines which compute capability the GPU in the system supports. To ensure that this is the CUDA arch being used, we could also set the `-Xopenmp-target -march=` flag. In the absence of an NVIDIA GPU, LIBOMPTARGET_DEP_CUDA_ARCH will be 35. That is, in that case we are assuming unified_shared_memory is not available. CUDA plugin testing could be disabled entirely in this case, but this currently depends on `LIBOMPTARGET_CAN_LINK_LIBCUDA OR LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA`, not on whether the hardware is actually available. For all other targets, nothing changes and we are assuming unified shared memory is available. This might need refinement if not the case. This tries to fix the [[ http://meinersbur.de:8011/#/builders/143 \| OpenMP Offloading Buildbot ]] that, although brand-new, only has a Pascal-generation (sm_61) GPU installed. Hence, tests that require unified shared memory are currently failing. I wish I had known in advance. Reviewed By: protze.joachim, tianshilei1992 Differential Revision: https://reviews.llvm.org/D101498	2021-05-13 11:08:04 -05:00
Jon Chesterfield	9934571eab	[libomptarget][amdgpu][nfc] Expand errorcheck macros These macros expand to continue, which is confusing, or exit, which is incompatible with continuing execution on offloading fail. Expanding the macros in place makes the code look untidy but the control flow obvious and amenable to improving. In particular, exit becomes easier to eliminate. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D102230	2021-05-12 17:30:41 +01:00
Christopher Pulido	4fb0aaf033	[OpenMP] Changes to enable MSVC ARM64 build of libomp This is the first in a series of changes to the OpenMP runtime that have been done internally by Microsoft. This patch makes the necessary changes to enable libomp.dll to build with the MSVC compiler targeting ARM64. Differential Revision: https://reviews.llvm.org/D101173	2021-05-11 23:03:12 +03:00
Jon Chesterfield	72995a4bdf	[libomptarget][nfc] Add hook to easily disable building amdgcn bclib This is useful when building LLVM with a toolchain that can't emit code for amdgcn, e.g. because it overrides the include search path with headers from another architecture, or the clang compiler is missing builtins. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102229	2021-05-11 17:23:09 +01:00
Peyton, Jonathan L	c765d140fe	[OpenMP] Fix hidden helper + affinity When KMP_AFFINITY is set, each worker thread's gtid value is used as an index into the place list to determine the thread's placement. With hidden helpers enabled, this gtid value is shifted down leading to unexpected shifted thread placement. This patch restores the previous behavior by adjusting the mask index to take the number of hidden helper threads into account. Hidden helper threads are given the full initial mask and do not participate in any of the other affinity mechanisms (place partitioning, balanced affinity). Their affinity is only printed for debug builds. Differential Revision: https://reviews.llvm.org/D101882	2021-05-11 08:54:22 -05:00
Jon Chesterfield	dedca78d48	[libomptarget][nfc] Drop stringify in macro A step towards deleting the macros entirely. Differential Revision: https://reviews.llvm.org/D102228	2021-05-11 12:19:55 +01:00
Jon Chesterfield	6da348569c	[libomptarget] Add support for target allocators to dynamic cuda RTL Follow on to D102000 which introduced new calls into libcuda. This patch adds the corresponding entry points to dynamic_cuda, fixing the build for systems that do not have the cuda toolkit installed. Function types and enum from https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D102169	2021-05-10 15:27:50 +01:00
Pushpinder Singh	9586937ef5	[AMDGPU][OpenMP] Disable tests when amdgpu-arch fails This patch prevents runtime tests running on systems without amdgpu. Reviewed By: protze.joachim, tianshilei1992 Differential Revision: https://reviews.llvm.org/D102054	2021-05-10 07:37:27 +00:00
Vyacheslav Zakharin	f2f88f3e7a	An attempt to abandon omptarget out-of-tree builds. I want to start using LLVM component libraries in libomptarget to stop duplicating implementations already available in LLVM (e.g. LLVMObject, LLVMSupport, etc.). Without relying on LLVM in all libomptarget builds, one has to provide a fallback implementation for each used LLVM feature. This is an attempt to stop supporting out-of-llvm-tree builds of libomptarget. I understand that I may need to revert this if it affects downstream projects in a bad way. Differential Revision: https://reviews.llvm.org/D101509	2021-05-07 12:43:50 -07:00
Joseph Huber	a15f8589f4	[libomptarget] Add support for target memory allocators to cuda RTL Summary: The allocator interface added in D97883 allows the RTL to allocate shared and host-pinned memory from the cuda plugin. This patch adds support for these to the runtime. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D102000	2021-05-07 10:27:02 -04:00
Jon Chesterfield	44ee974e2f	[libomptarget][nfc] Refactor amdgpu partial barrier to simplify adding a second one D101976 would require a second barrier instance. This NFC to amdgpu makes it simpler to add one (an extra global, one more line in init). Also renames the current barrier to L0. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102016	2021-05-06 23:52:19 +01:00
Jon Chesterfield	7e9351b9de	[libomptarget][amdgpu][nfc] Remove dead code from amdgpu plugin Drops an enum that was identical to a HSA one, localises some functions where they were only called from one TU. Covers everything internalize + adce can identify as dead, except for msgpack::dump which is useful when debugging. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102014	2021-05-06 23:16:32 +01:00
Jon Chesterfield	25fe17d3c1	[libomptarget] Initial documentation on amdgpu offload Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D101927	2021-05-05 19:58:52 +01:00
Peyton, Jonathan L	9982f33e2c	[OpenMP] Refactor/Rework topology discovery code This patch does the following: 1) Introduce kmp_topology_t as the runtime-friendly structure (the corresponding global variable is __kmp_topology) to determine the exact machine topology which can vary widely among current and future architectures. The current design is not easy to expand beyond the assumed three layer topology: sockets, cores, and threads so a rework capable of using the existing KMP_AFFINITY mechanisms is required. This new topology structure has: * The depth and types of the topology * Ratio count for each consecutive level (e.g., number of cores per socket, number of threads per core) * Absolute count for each level (e.g., 2 sockets, 16 cores, 32 threads) * Equivalent topology layer map (e.g., Numa domain is equivalent to socket, L1/L2 cache equivalent to core) * Whether it is uniform or not The hardware threads are represented with the kmp_hw_thread_t structure. This structure contains the ids (e.g., socket 0, core 1, thread 0) and other information grabbed from the previous Address structure. The kmp_topology_t structure contains an array of these. 2) Generalize the KMP_HW_SUBSET envirable for the new kmp_topology_t structure. The algorithm doesn't assume any order with tiles,numa domains,sockets,cores,threads. Instead it just parses the envirable, makes sure it is consistent with the detected topology (including taking into account equivalent layers) and then trims away the unneeded subset of hardware threads. To enable this, a new kmp_hw_subset_t structure is introduced which contains a vector of items (hardware type, number user wants, offset). Any keyword within __kmp_hw_get_keyword() can be used as a name and can be shortened as well. e.g., KMP_HW_SUBSET=1s,2numa,4tile,2c,3t can be used on the KNL SNC-4 machine. 3) Simplify topology detection functions so they only do the singular task of detecting the machine's topology. 
Printing and all canonicalizing functionality is now done afterwards, so many lines of duplicated code are eliminated. 4) Add new ll_caches and numa_domains to OMP_PLACES, and consequently, KMP_AFFINITY's granularity setting. All the names within __kmp_hw_get_keyword() are available for use in OMP_PLACES or KMP_AFFINITY's granularity setting. 5) Simplify and future-proof code that used explicit lists of allowed affinity setting keywords inside if() conditions. 6) Add x86 CPUID leaf 4 cache detection to existing x2apic id method so equivalent caches could be detected (in particular for the ll_caches place). Differential Revision: https://reviews.llvm.org/D100997	2021-05-03 18:00:24 -05:00
Pushpinder Singh	ae845d6426	[AMDGPU][OpenMP] Enable Libomptarget runtime tests This enables the runtime tests on amdgpu targets. 10 tests have been marked as XFAIL on amdgcn currently mostly due to missing printf. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D99656	2021-05-03 05:56:42 +00:00
Martin Storsjö	01d27fc408	[OpenMP] Fix warnings due to redundant semicolons. NFC.	2021-05-02 21:51:06 +03:00
Kevin Athey	bc9120047b	Correct tiny misspelling (readlef -> readelf). Getting my feet wet here as a new committer. Correct misspelling in check-depends.pl. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D101552	2021-04-30 17:20:35 -07:00
Michael Kruse	7308862ff5	[OpenMP][CMake] Use in-project clang as CUDA->IR compiler. If available, use the clang that is already built in the same project as CUDA compiler unless another executable is explicitly defined. This also ensures the generated deviceRTL IR will be consistent with the version of Clang. This patch is required to reliably test OpenMP offloading in a buildbot without either a two-stage build (e.g. with LLVM_ENABLE_RUNTIMES) or a separately installed clang on the worker that will eventually become outdated. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101265	2021-04-30 12:45:52 -05:00
Michael Kruse	3244a8b536	[OpenMP][CMake] Pass --cuda-path to regression tests. The OpenMP runtime can be compiled using a CUDA installation at a non-default location with the -DCUDA_TOOLKIT_ROOT_DIR setting. However, check-openmp will fail afterwards because Clang needs to know where to find the CUDA headers. Fix by passing --cuda-path to Clang using the value of CUDA_TOOLKIT_ROOT_DIR, which has been determined by CMake. Also set LD_LIBRARY_PATH so that the CUDA runtime can be found when executing. This will ensure that the regression tests do not depend on the current environment, but use the environment they were configured for. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101266	2021-04-27 16:27:40 -05:00
Joachim Protze	24f836e8fd	[OpenMP][libomptarget] Separate lit tests for different offloading targets (2/2) This patch fuses the RUN lines for most libomptarget tests. The previous patch D101315 created separate test targets for each supported offloading triple. This patch updates the RUN lines in libomptarget tests to use a generic run line independent of the offloading target selected for the lit instance. In cases where no RUN line was defined for a specific offloading target, the corresponding target is declared as XFAIL. If it turns out that a test actually supports the target, the XFAIL line can be removed. Differential Revision: https://reviews.llvm.org/D101326	2021-04-27 15:54:32 +02:00
Joachim Protze	b845217b1d	[OpenMP][libomptarget] Separate lit tests for different offloading targets (1/2) This patch creates a separate test directory for each offloading target to be tested. This allows testing multiple architectures in one configuration while still seeing all failing tests separately. The lit test names include the target triple, so that it will be easier to spot the failing target. This patch also allows marking expected-failing tests based on the target triple, as the currently used triple is added to the lit "features": ``` // XFAIL: nvptx64-nvidia-cuda ``` Differential Revision: https://reviews.llvm.org/D101315	2021-04-27 12:30:01 +02:00
Joseph Huber	077fe0f739	[OpenMP][Documentation] Add FAQ entry for dynamically linked libraries Summary: Add an FAQ entry detailing the support for using dynamically linked libraries with OpenMP Offloading	2021-04-26 14:21:17 -04:00
Jon Chesterfield	58f125493d	[libomptarget] Enable AMDGPU devicertl The amdgpu devicertl is written in freestanding openmp and compiles to a bitcode library (per listed gfx arch) with no unresolved symbols. It requires a recent clang, preferably the one from the same monorepo checkout. This is D98658, with printf explicitly stubbed out, after patching clang to no longer require an llvm with the amdgpu target enabled. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101213	2021-04-24 02:24:44 +01:00
Johannes Doerfert	17330a3cb1	[OpenMP] Avoid reading uninitialized parallel level values In a last minute change request for `a2dbfb6b72` we introduced a read of the uninitialized parallel level value in SPMD-mode. We go back to initializing the array early and checking for an adjusted level. Found by the miniqmc unit tests: https://cdash.qmcpack.org/CDash/viewTest.php?onlyfailed&buildid=203434 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D101123	2021-04-23 11:21:58 -05:00
Joseph Huber	59b6849012	[OpenMP] Replace global InfoLevel with a reference to an internal one. Summary: This patch improves the implementation of D100774 by replacing the global variable introduced with a function that returns a reference to an internal one. This removes the need to define the variable in every plugin that uses it. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D101102	2021-04-23 09:43:46 -04:00
Joseph Huber	2b6f20082e	[OpenMP] Add function for setting LIBOMPTARGET_INFO at runtime Summary: This patch adds a new runtime function __tgt_set_info_flag that allows the user to set the information level at runtime without using the environment variable. Using this will require an extern function, but it will eventually be added into an auxiliary library for OpenMP support functions. This patch required moving the current InfoLevel to a global variable which must be instantiated by each plugin. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D100774	2021-04-22 12:48:11 -04:00
Alexey Bataev	ca70512099	[OPENMP]Mark test as unsupported to avoid possible unexpected passes, NFC.	2021-04-22 08:06:25 -07:00
Giorgis Georgakoudis	a2dbfb6b72	[OpenMP] Simplify offloading parallel call codegen This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target and host-side parallel regions), and data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions. Reviewed By: jdoerfert, Meinersbur Differential Revision: https://reviews.llvm.org/D95976	2021-04-21 18:46:07 -07:00
Alexey Bataev	079884225a	[OPENMP]Fix PR49698: OpenMP declare mapper causes segmentation fault. The implicitly generated mappings for allocation/deallocation in the mapper runtime should be mapped as implicit; also, there is no need to clear the member_of flag, to avoid a ref counter increment. Also, the ref counter should not be incremented for the very first element that comes from the mapper function. Differential Revision: https://reviews.llvm.org/D100673	2021-04-21 10:38:31 -07:00
Peyton, Jonathan L	4457565757	[OpenMP] Implement GOMP task reductions Implement the remaining GOMP_* functions to support task reductions in taskgroup, parallel, loop, and taskloop constructs. The unused mem argument to many of the work-sharing constructs has to do with the scan() directive/inscan() modifier. If mem is set, each function will call KMP_FATAL() and tell the user scan/inscan is unsupported. The GOMP reduction implementation is kept separate from our implementation because of how GOMP presents reduction data and computes the reductions. GOMP expects the privatized copies to be present even after a #pragma omp parallel reduction(task:...) region has ended, so the data is stored inside GOMP's uintptr_t* data pseudo-structure. This style is tightly coupled with GCC compiler codegen. There also aren't any init(), combiner(), or fini() functions in GOMP's codegen, so the two implementations were too disparate to try to wrap GOMP's around our own. Differential Revision: https://reviews.llvm.org/D98806	2021-04-16 16:36:31 -05:00
Peyton, Jonathan L	5ebbb366c4	[OpenMP] Allow affinity to re-detect for child processes The current atfork() handler for child processes does not reset the affinity masks array, which prevents users from setting their own affinity in child processes. Differential Revision: https://reviews.llvm.org/D99218	2021-04-16 16:34:02 -05:00
Hansang Bae	9b98497b44	[OpenMP] Add omp_target_is_accessible() to header files -- Added omp_target_is_accessible to the header files -- Added missing const qualifier to device memory routines Differential Revision: https://reviews.llvm.org/D100420	2021-04-16 07:54:15 -05:00
Joseph Huber	83d4b2e2e0	[OpenMP] Add info for device table changes Summary: This patch adds a feature to print information whenever the host-device pointer mapping table is changed by inserting or removing an entry. This introduces a new bit field for LIBOMPTARGET_INFO at position 0x8. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D100600	2021-04-15 18:39:48 -04:00
Hansang Bae	77dc7b4653	[OpenMP] Fix printing routine for OMP_TOOL_VERBOSE_INIT Also fixed typo in the verbose message. Differential Revision: https://reviews.llvm.org/D100414	2021-04-14 07:55:26 -05:00
Hansang Bae	3da61ddae7	[OpenMP] Define omp_is_initial_device() variants in omp.h omp_is_initial_device() is marked as a built-in function in the current compiler, and user code guarded by this call may be optimized away, resulting in undesired behavior in some cases. This patch provides a possible fix for such cases by defining the routine as a variant function and removing it from the builtin list. Differential Revision: https://reviews.llvm.org/D99447	2021-04-06 16:58:01 -05:00
Peyton, Jonathan L	2aebb7cb3c	[OpenMP] Fix incorrect KMP_STRLEN() macro The second argument to the strnlen_s(str, size) function should be sizeof(str) when str is a true array of characters with known size (instead of just a char*). Use type traits to determine if first parameter is a character array and use the correct size based on that trait. Differential Revision: https://reviews.llvm.org/D98209	2021-04-05 09:03:09 -05:00
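A sketch of the KMP_STRLEN idea, assuming C++17: a type trait picks `sizeof(str)` as the bound only when the argument is a true character array. `bounded_strlen` stands in for `strnlen_s` (which is not universally available), and `kmp_strlen_sketch` and `kPointerMaxLen` are hypothetical; the real macro may choose a different bound for plain pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Portable stand-in for strnlen_s (C11 Annex K is not universally available).
inline std::size_t bounded_strlen(const char *s, std::size_t max_len) {
  assert(s != nullptr);
  std::size_t n = 0;
  while (n < max_len && s[n] != '\0')
    ++n;
  return n;
}

// Hypothetical bound for plain pointers, whose buffer size is unknown.
constexpr std::size_t kPointerMaxLen = 4096;

// For a true char array, sizeof(str) is the buffer size; for a char*, sizeof
// would only be the pointer size (e.g. 8), silently truncating the result.
template <typename T>
std::size_t kmp_strlen_sketch(const T &str) {
  if constexpr (std::is_array_v<T>)
    return bounded_strlen(str, sizeof(str));
  else
    return bounded_strlen(str, kPointerMaxLen);
}
```

An array that fills its buffer without a terminator is still measured safely, because the bound is the array's own size.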
Joseph Huber	0af4e74aef	[OpenMP][NFC] Fix typo in libomptarget error message Summary: There was a typo suggesting the user to use `LIBOMPTARGET_DEBUG` instead of `LIBOMPTARGET_INFO`	2021-04-01 12:45:28 -04:00
Joseph Huber	29338459fb	[OpenMP] Trim error messages in CUDA plugin Summary: Remove some of the error messages printed when the CUDA plugin fails. The current error messages can be confusing because they are the first error messages printed after the async stream finds an error. This means that the printed values aren't related to what caused the issue, but are simply the last asynchronous operation that succeeded on the device. Remove these as they can be misleading. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D99510	2021-03-29 12:20:19 -04:00
Alexey Bataev	0411b23319	[OPENMP]Map data field with l-value reference types. Added initial support for the mapping of the data members with l-value reference types. Differential Revision: https://reviews.llvm.org/D98812	2021-03-29 07:07:09 -07:00
Joseph Huber	16064e71e9	[OpenMP] Reset async stream properly upon failure Summary: If the call to `synchronize` fails, it will currently block the stream indefinitely if execution is continued from this point. Additionally, if the program exits it will trigger an assertion on the non-null value of the async queue and prevent the runtime from printing debugging information. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D99443	2021-03-26 19:05:06 -04:00
Hansang Bae	467f39249d	[OpenMP] Misc. changes that add or remove pointer/bound checks -- Added or moved checks to appropriate places. -- Removed ineffective null checks on pointers that are already dereferenced in the surrounding code. -- Initialized variables that could otherwise be used uninitialized. -- Added call to dlclose/FreeLibrary in OMPT tool activation. -- Added a new build compiler definition. Differential Revision: https://reviews.llvm.org/D98584	2021-03-23 18:55:08 -05:00
Shilei Tian	2df65f87c1	[OpenMP] Fixed a crash in hidden helper thread It is reported that after enabling hidden helper thread, the program can sometimes hit the assertion `new_gtid < __kmp_threads_capacity`. The root cause is explained as follows. Let's say the default `__kmp_threads_capacity` is `N`. If hidden helper thread is enabled, `__kmp_threads_capacity` will be offset to `N+8` by default. If the number of threads we need exceeds `N+8`, e.g. via `num_threads` clause, we need to expand `__kmp_threads`. In `__kmp_expand_threads`, the expansion starts from `__kmp_threads_capacity` and repeatedly doubles it until the new capacity meets the requirement. Let's assume the new requirement is `Y`. If `Y` happens to meet the constraint `(N+8)*2^X=Y` where `X` is the number of iterations, the new capacity is not enough because 8 of its slots are reserved for hidden helper threads. Here is an example. ``` #include <vector> int main(int argc, char *argv[]) { constexpr const size_t N = 1344; std::vector<int> data(N); #pragma omp parallel for for (unsigned i = 0; i < N; ++i) { data[i] = i; } #pragma omp parallel for num_threads(N) for (unsigned i = 0; i < N; ++i) { data[i] += i; } return 0; } ``` My CPU is 20C40T, so `__kmp_threads_capacity` is 160. After the offset, `__kmp_threads_capacity` becomes 168. Since `1344 = (160+8)*2^3`, the assertion fires. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D98838	2021-03-18 18:25:36 -04:00
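The arithmetic in the hidden helper commit can be modeled as below. This is a hypothetical sketch, not the `__kmp_expand_threads` source: it doubles a capacity until a target is met and shows that targeting only the regular threads falls short exactly when the target equals `(N+8)*2^X`. Counting the hidden helper slots in the target is one way to avoid it; the actual fix may differ.

```cpp
#include <cassert>

// Hypothetical model of the capacity growth described above, not libomp code.
constexpr int kHiddenHelperSlots = 8;

// Doubles `capacity` until it reaches `needed` slots.
inline int double_until(int capacity, int needed) {
  assert(capacity > 0);
  while (capacity < needed)
    capacity *= 2;
  return capacity;
}

// Buggy: asks only for the regular threads, although the hidden helper
// threads still occupy kHiddenHelperSlots of the expanded array.
inline int expand_buggy(int capacity, int regular_threads) {
  return double_until(capacity, regular_threads);
}

// Adjusted: the target includes the hidden helper slots.
inline int expand_fixed(int capacity, int regular_threads) {
  return double_until(capacity, regular_threads + kHiddenHelperSlots);
}

// Slots left for regular threads once the hidden helpers are accounted for.
inline int regular_slots(int capacity) { return capacity - kHiddenHelperSlots; }
```

With the numbers from the commit (capacity 168, 1344 threads), the buggy path stops after three doublings at 1344, leaving 1336 regular slots, while the adjusted target grows the array to 2688.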
Jon Chesterfield	626a31de15	[libomptarget] Add register usage info to kernel metadata Add register usage information to the runtime metadata so that it can be used during kernel launch (that change will be in a different commit). Add this information to the kernel trace. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D98829	2021-03-18 17:00:42 +00:00
Jon Chesterfield	dbf8f2b089	Revert "[libomptarget] Build amdgcn devicertl by default" This reverts commit `e23f3502d9`. It broke the build of openmp for clang built without amdgcn support. D98746, under review, would allow this to reland.	2021-03-17 11:34:44 +00:00
Hansang Bae	a6f9cb6adc	[OpenMP] Add runtime interface for OpenMP 5.1 error directive The proposed new interface is for supporting `at(execution)` clause in the error directive. Differential Revision: https://reviews.llvm.org/D98448	2021-03-16 08:55:25 -05:00
Johannes Doerfert	0a954a528b	[OpenMP][FIX] Repair accidental replacement of _shfl_sync with _shfl This was broken accidentally in D95752. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D98677	2021-03-15 22:46:00 -05:00
Jon Chesterfield	e23f3502d9	[libomptarget] Build amdgcn devicertl by default The cmake for this looks for an llvm install and does the right thing when building as part of enable_runtimes. It will probably do the right thing in other settings - at least, it won't try to build this with gcc. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98658	2021-03-15 23:17:50 +00:00
Peyton, Jonathan L	7085f04573	[OpenMP] Remove unused cpu_stackoffset member	2021-03-15 16:52:04 -05:00
Jon Chesterfield	bb38d7ff05	[libomptarget][nfc][amdgcn] Use precise triple for devicertl build	2021-03-15 20:24:13 +00:00
Jon Chesterfield	d0bc85f04a	[libomptarget][nfc] Drop unused DEVICE macro Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98655	2021-03-15 20:12:50 +00:00
Jon Chesterfield	7da76aaaf4	[libomptarget] Build amdgpu plugin by default This will build the amdgpu plugin if cmake is able to find the hsa runtime library, which will be the case if rocm is installed or if the hsa library has been installed somewhere cmake looks. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D98654	2021-03-15 20:12:01 +00:00
Jon Chesterfield	bcb3f0f867	[libomptarget] Fix devicertl build The target-specific functions in target_interface are extern "C", but the nvptx implementations mostly had C++ (mangled) linkage. That worked out as a quirk of the DEVICE macro expanding to nothing, except for shuffle.h, which only forward-declared the functions with C++ linkage. Also implements GetWarpSize, as used by shuffle, and includes target_interface in nvptx target_impl.cu to help catch future divergence between interface and implementation. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98651	2021-03-15 19:50:22 +00:00
Jon Chesterfield	f675b3df48	[libomptarget] Drop assert.h, use freestanding for amdgcn devicertl Promotes the runtime assert to a link time error for the unimplemented fallback functions. Enables amdgcn to build with only clang provided headers, which makes it less likely to break other builds when enabled. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98649	2021-03-15 18:50:09 +00:00
Jon Chesterfield	156842937f	[libomptarget][amdgcn] Drop use of inttypes.h, moving closer to freestanding The glibc headers are a periodic source of problems compiling the devicertl. This patch resolves the following error, encountered while building llvm on a slightly different linux system. ``` In file included from .../lib/clang/13.0.0/include/inttypes.h:21: In file included from /usr/include/inttypes.h:25: /usr/include/features.h:461:12: fatal error: 'sys/cdefs.h' file not found # include <sys/cdefs.h> ^~~~~~~~~~~~~ ``` As a second patch, removing assert.h from shuffle will let amdgcn build as -ffreestanding, at which point only the headers that clang itself provides are used and interactions with the host glibc are eliminated. Doing the same for nvptx is complicated by printf handling but also seems worthwhile. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D98565	2021-03-15 16:54:58 +00:00
George Rokos	2468fdd9af	[libomptarget] Add allocator support for target memory This patch adds the infrastructure for allocator support for target memory. Three allocators are introduced for device, host and shared memory. The corresponding API functions have the llvm_ prefix temporarily, until they become part of the OpenMP standard. Differential Revision: https://reviews.llvm.org/D97883	2021-03-13 03:47:07 -08:00
Johannes Doerfert	5449fbb5d4	[OpenMP][NFC] Use `AsyncInfo` as the variable name for a `__tgt_async_info` Reviewed By: grokos, tianshilei1992 Differential Revision: https://reviews.llvm.org/D96444	2021-03-11 23:31:34 -06:00
Johannes Doerfert	66ba494b49	[OpenMP][DeviceRTL] Extract shuffle idiom and port it to declare variant The shuffle idiom is implemented differently on each of our supported targets. To shrink the "target_impl" file, we now move the shuffle idiom into its own self-contained header that provides the implementation for AMDGPU and NVPTX. A fallback can be added later on. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95752	2021-03-11 23:31:30 -06:00
Joseph Huber	807466ef28	[OpenMP] Restore backwards compatibility for libomptarget Summary: The changes introduced in D87946 changed the API for libomptarget functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x but was not given a backward-compatible interface. This change will require people using Clang 13.x or 12.x to recompile their offloading programs. Reviewed By: jdoerfert cchen Differential Revision: https://reviews.llvm.org/D98358	2021-03-11 09:52:11 -05:00
Leonard Chan	baf637dcde	Rename top-level LICENSE.txt files to LICENSE.TXT This makes all the license filenames uniform across subprojects. Differential Revision: https://reviews.llvm.org/D98380	2021-03-10 21:26:24 -08:00
AndreyChurbanov	aaf16b80dd	[OpenMP] libomp: eliminate pause from atomic CAS loops For clang this change is NFC cleanup, because clang never calls atomic functions from runtime library. Basically, pause is good in spin-loops waiting for something. Atomic CAS loops do not wait for anything; each CAS failure means some other thread progressed. Performance experiments show that the pause only causes unnecessary slowdown on CPUs with a slow pause instruction and makes no difference on CPUs with a fast pause instruction, while removing it gives a smaller binary. Differential Revision: https://reviews.llvm.org/D97079	2021-03-09 18:30:08 +03:00
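For illustration, a CAS loop of the kind described, written with `std::atomic` (assuming C++11 or later; `atomic_add` is a hypothetical name, not a libomp entry point). A floating-point add is a typical case because `fetch_add` on `std::atomic<double>` only exists from C++20. Each failed `compare_exchange_weak` refreshes `expected`, so the loop retries at once rather than executing a pause.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: atomically adds `delta` to `target` with a CAS loop
// and returns the value this call stored. There is no pause or backoff: a
// failed compare_exchange_weak means another thread changed `target` (or the
// failure was spurious), and `expected` has already been refreshed.
inline double atomic_add(std::atomic<double> &target, double delta) {
  double expected = target.load(std::memory_order_relaxed);
  while (!target.compare_exchange_weak(expected, expected + delta,
                                       std::memory_order_relaxed)) {
    // Retry immediately with the refreshed `expected`.
  }
  return expected + delta;
}
```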
AndreyChurbanov	e4492b6f31	[OpenMP] NFC: temporarily disable assertion until the bug with dependences is fixed	2021-03-08 22:18:30 +03:00
Shilei Tian	c41ae246ac	[OpenMP][Clang][NVPTX] Only build one bitcode library for each SM In D97003, CUDA 9.2 is the minimum requirement for OpenMP offloading on NVPTX target. We don't need to have macros in source code to select right functions based on CUDA version. We don't need to compile multiple bitcode libraries of different CUDA versions for each SM. We don't need to worry about future compatibility with newer CUDA version. `-target-feature +ptx61` is used in this patch, which corresponds to the highest PTX version that CUDA 9.2 can support. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97198	2021-03-08 12:03:04 -05:00
Peyton, Jonathan L	e2738b3758	[OpenMP] Fix potential integer overflow in dynamic schedule code Restrict the chunk_size * chunk_num to only occur for valid chunk_nums and reimplement calculating the limit to avoid overflow. Differential Revision: https://reviews.llvm.org/D96747	2021-03-08 09:43:05 -06:00
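The overflow-avoidance idea, restricting the product to valid chunk numbers, can be sketched like this. It is a hypothetical model for a stride-1 loop, not the libomp dispatch code; `get_chunk` is an illustrative name, and it assumes the loop bounds fit in `int64_t`. Because `chunk_num` is checked against the chunk count first, `chunk_num * chunk_size` is always below the trip count, and the last chunk is clipped without computing `lower + chunk_size`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: hands out chunk `chunk_num` of a stride-1 loop that
// starts at `lb` and runs `trip_count` iterations (assumes lb + trip_count - 1
// fits in int64_t). Returns false once chunk_num is past the last chunk.
inline bool get_chunk(std::int64_t lb, std::uint64_t trip_count,
                      std::uint64_t chunk_size, std::uint64_t chunk_num,
                      std::int64_t &lower, std::int64_t &upper) {
  assert(chunk_size > 0);
  // Ceiling division written so it cannot overflow near UINT64_MAX.
  std::uint64_t num_chunks =
      trip_count / chunk_size + (trip_count % chunk_size != 0 ? 1 : 0);
  if (chunk_num >= num_chunks)
    return false; // never form chunk_num * chunk_size for invalid chunks
  std::uint64_t offset = chunk_num * chunk_size; // < trip_count, no overflow
  std::uint64_t remaining = trip_count - offset; // >= 1
  std::uint64_t len = remaining < chunk_size ? remaining : chunk_size;
  lower = lb + static_cast<std::int64_t>(offset);
  upper = lower + static_cast<std::int64_t>(len - 1);
  return true;
}
```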
tlwilmar	97d000cfc6	Added API for "masked" construct via two entrypoints: __kmpc_masked, and __kmpc_end_masked. The "master" construct is deprecated. Changed proc-bind keyword from "master" to "primary". Use of both master construct and master as proc-bind keyword is still allowed, but deprecated. Remove references to "master" in comments and strings, and replace with "primary" or "primary thread". Function names and variables were not touched, nor were references to deprecated master construct. These can be updated over time. No new code should refer to master.	2021-03-05 09:29:57 -06:00
Joel E. Denny	d0eb25a643	[OpenMP] Encapsulate more in checkDeviceAndCtors This patch just encapsulates some repeated code. To do so, it relocates some functions from interface.cpp to omptarget.cpp. It also adjusts them to the LLVM coding style. This patch is almost NFC except some `DP` messages are a bit different. For example, messages like "Entering target region" are now emitted even if offload is disabled, but a subsequent "Offload is disabled" is then emitted. Reviewed By: jdoerfert, grokos Differential Revision: https://reviews.llvm.org/D97908	2021-03-04 12:03:42 -05:00
Joel E. Denny	bfe5452b93	[OpenMP] Fix lone target exit data Without this patch, an `omp target exit data` before the runtime is initialized produces a runtime error. This patch fixes that by changing `__tgt_target_data_end_mapper` to call `CheckDeviceAndCtors` like many other runtime routines. Discussed at <https://lists.llvm.org/pipermail/openmp-dev/2021-March/003920.html>. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D97907	2021-03-04 12:03:42 -05:00
Joel E. Denny	10c18c69f2	[OpenMP] Fix support for device as host Without this patch, when the offload device is set to `omp_get_initial_device()`, the runtime fails with an error diagnostic when entering target regions or target data regions. However, OpenMP 5.1, sec. 2.14.5 "target Construct", "Restrictions", p. 203, L3-5 states: > The device clause expression must evaluate to a non-negative integer > value that is less than or equal to the value of > omp_get_num_devices(). Sec. 3.7.7 "omp_get_initial_device", p. 412, L2-3 states: > The value of the device number is the value returned by the > omp_get_num_devices routine. Similarly, OpenMP 5.0, sec. 2.12.5 "target Construct", "Restrictions", p. 174 L30-32 states: > The device clause expression must evaluate to a non-negative integer > value less than the value of omp_get_num_devices() or to the value > of omp_get_initial_device(). This patch fixes this behavior by changing the runtime to behave as if offloading is disabled whenever it finds the offload device (either from a `device` clause or the default device) is set to the host device. In the case of mandatory offloading when `omp_get_num_devices() == 0`, it incorporates the behavior proposed for OpenMP 5.2 in OpenMP spec github issue 2669. Reviewed By: grokos, RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D97616	2021-03-04 12:03:42 -05:00
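The host-device check described above reduces to a small decision. A sketch under OpenMP 5.1 semantics, where `omp_get_initial_device()` equals `omp_get_num_devices()`; `OffloadDecision` and `decide` are hypothetical names, and the mandatory-offload case from the OpenMP 5.2 proposal is deliberately left out.

```cpp
#include <cassert>

// Hypothetical model; names are illustrative, not libomptarget's.
enum class OffloadDecision { OffloadToDevice, RunOnHost, Error };

// Per OpenMP 5.1, omp_get_initial_device() returns omp_get_num_devices(),
// so a device number equal to num_devices denotes the host.
inline OffloadDecision decide(int device_num, int num_devices) {
  assert(num_devices >= 0);
  if (device_num == num_devices) // host device: behave as if offload were disabled
    return OffloadDecision::RunOnHost;
  if (device_num >= 0 && device_num < num_devices)
    return OffloadDecision::OffloadToDevice;
  return OffloadDecision::Error; // negative or out-of-range device number
}
```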
Hansang Bae	b6c2f538b2	[OpenMP] Add allocator support for target memory This is a preview of allocator support for target memory that depends on the offload runtime API which allocates memory as described below. llvm_omp_target_alloc_host(size_t size, int device_num); -- Returns non-migratable memory owned by host. -- Memory is accessible by host and device(s). llvm_omp_target_alloc_shared(size_t size, int device_num); -- Returns migratable memory owned by host and device. -- Memory is accessible by host and device. llvm_omp_target_alloc_device(size_t size, int device_num); -- Returns memory owned by device. -- Memory is only accessible by device. New memory space and predefined allocator names are -- llvm_omp_target_host_mem_space -- llvm_omp_target_shared_mem_space -- llvm_omp_target_device_mem_space -- llvm_omp_target_host_mem_alloc -- llvm_omp_target_shared_mem_alloc -- llvm_omp_target_device_mem_alloc Differential Revision: https://reviews.llvm.org/D96669	2021-03-02 16:45:12 -06:00
Alexey Bataev	0caf736d7e	[OPENMP50]Mapping of the subcomponents with the 'default' mappers. If the mapped structure has data members that have 'default' mappers, these members need to be mapped individually using their 'default' mappers. Differential Revision: https://reviews.llvm.org/D92195	2021-03-02 07:11:06 -08:00
Peyton, Jonathan L	e83380fccc	[OpenMP] Fix clang-cl build error regarding TSX intrinsics Fix for https://bugs.llvm.org/show_bug.cgi?id=49339 The CMake check for the RTM intrinsics needs the -mrtm flag to be set during the test. This way clang-cl correctly detects it has the _xbegin() intrinsic. Otherwise, the CMake check fails. Differential Revision: https://reviews.llvm.org/D97413	2021-03-02 07:47:42 -06:00
AndreyChurbanov	1df6e58e55	[OpenMP] libomp minor cleanup Cleanup changes: - check value read from file; - remove dead code; - use an unsigned variable to read a hexadecimal number into; - add debug assertion to check ref count. Differential Revision: https://reviews.llvm.org/D96893	2021-02-26 00:44:51 +03:00
AndreyChurbanov	4932101177	[OpenMP] libomp: fix ittnotify stack stitching for teams construct The stitching id could be overwritten, causing a reference to a destroyed object when the number of teams is 1. The patch uses separate storage locations for the stitching ids of teams and of parallel regions nested in teams. Differential Revision: https://reviews.llvm.org/D96562	2021-02-26 00:23:24 +03:00
Peyton, Jonathan L	d12ae7db99	[OpenMP] Fix accidental addition of use omp_lib_kinds Fortran header accidentally had use omp_lib_kinds added inside a subroutine and function. This patch removes the lines.	2021-02-25 12:49:56 -06:00
Harmen Stoppels	a54f160b3a	Prefer /usr/bin/env xxx over /usr/bin/xxx where xxx = perl, python, awk Allow users to use a non-system version of perl, python and awk, which is useful in certain package managers. Reviewed By: JDevlieghere, MaskRay Differential Revision: https://reviews.llvm.org/D95119	2021-02-25 11:32:27 +01:00
Vyacheslav Zakharin	6baeeb9efa	[libomptarget] Fixed MSVC build fail caused by __attribute__((used)). Differential Revision: https://reviews.llvm.org/D97348	2021-02-24 09:59:39 -08:00
Joachim Protze	2fbce374c8	[OpenMP][Tests][NFC] rename macro to avoid naming clash Rename a macro use missed in `e0f3acc5d3`	2021-02-24 18:46:56 +01:00
Shilei Tian	e5da63d5a9	[OpenMP] Fixed a crash when offloading to x86_64 with target nowait PR#49334 reports a crash when offloading to x86_64 with `target nowait`, which is caused by referencing a nullptr. The root cause of the issue is, when pushing a hidden helper task in `__kmp_push_task`, it also maps the gtid to its shadow gtid, which is wrong. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97329	2021-02-24 12:37:30 -05:00
Joachim Protze	f3a72509a7	[OpenMP][Tests][NFC] lit might also be known as llvm-lit.py	2021-02-24 18:32:24 +01:00
Manoel Roemmer	542d9c2154	[libomptarget] Load images in order of registration This makes sure that images are loaded in the order in which they are registered with libomptarget. If a target can load multiple images and these images depend on each other (for example if one image contains the program's target regions and one image contains library code), then the order in which images are loaded can be important for symbol resolution (for example, in the VE plugin). In this case: because the same code exists in the host binaries, the order in which the host linker loads them (which is also the order in which images are registered with libomptarget) is the order in which the images have to be loaded onto the device. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95530	2021-02-24 18:15:41 +01:00
Joachim Protze	e0f3acc5d3	[OpenMP][Tests][NFC] rename macro to avoid naming clash Rename a macro and macro use missed in `35ab6d6390`	2021-02-24 18:13:28 +01:00
Joachim Protze	35ab6d6390	[OpenMP][Tests][NFC] rename macro to avoid naming clash When including <ostream>, the register_callback macro of the OMPT callback.h clashes with a function defined in ostream. This patch renames the macro and includes ompt into the macro name.	2021-02-24 18:03:54 +01:00
Shilei Tian	f6c2984a09	[OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported feature in release version of LLVM `ptx71` is not supported in release version of LLVM yet. As a result, the support of CUDA 11.2 and CUDA 11.1 caused a compilation error as mentioned in D97004. Since the support in D97004 is just a workaround for the release, and we'll not use it in the near future, using `ptx70` for CUDA 11 is feasible. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97195	2021-02-23 13:20:21 -05:00
Peyton, Jonathan L	56223b1e91	[OpenMP] Help static loop code avoid over/underflow This code alleviates some pathological loop parameters (lower, upper, stride) within calculations involved in the static loop code. It bounds the chunk size to the trip count if it is greater than the trip count and also minimizes problematic code for when trip count < nth. Differential Revision: https://reviews.llvm.org/D96426	2021-02-22 13:22:01 -06:00
Peyton, Jonathan L	1b968467c0	[OpenMP] Remove shutdown attempt on Windows process detach Only attempt shutdown if lpReserved is NULL. The Windows documentation states: When handling DLL_PROCESS_DETACH, a DLL should free resources such as heap memory only if the DLL is being unloaded dynamically (the lpReserved parameter is NULL). If the process is terminating (the lpReserved parameter is non-NULL), all threads in the process except the current thread either have exited already or have been explicitly terminated by a call to the ExitProcess function, which might leave some process resources such as heaps in an inconsistent state. In this case, it is not safe for the DLL to clean up the resources. Instead, the DLL should allow the operating system to reclaim the memory. Differential Revision: https://reviews.llvm.org/D96750	2021-02-22 13:15:06 -06:00
Peyton, Jonathan L	8c73be9d86	[OpenMP] Limit number of dispatch buffers This patch limits the number of dispatch buffers (used for loop worksharing construct) to between 1 and 4096. Differential Revision: https://reviews.llvm.org/D96749	2021-02-22 13:14:28 -06:00
Peyton, Jonathan L	55dff8b2e4	[OpenMP] Update HWLOC code for die level detection Differential Revision: https://reviews.llvm.org/D96748	2021-02-22 13:05:55 -06:00
AndreyChurbanov	1611e5473c	[OpenMP] libomp: cleanup some resource leaks Close mutexattr and condattr local objects to eliminate resource leaks. Differential Revision: https://reviews.llvm.org/D96892	2021-02-20 23:27:37 +03:00
Shilei Tian	309b00a42e	[OpenMP][NFC] clang-format the whole openmp project Same script as D95318. Test files are excluded. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D97088	2021-02-20 12:46:32 -05:00
Joel E. Denny	ef8b3b5ffd	[OpenMP] Fix nvptx CUDA_VERSION conversion As mentioned in PR#49250, without this patch, ptxas for CUDA 9.1 fails in the following two tests: - openmp/libomptarget/test/mapping/lambda_mapping.cpp - openmp/libomptarget/test/offloading/bug49021.cpp The error looks like: ``` ptxas /tmp/lambda_mapping-081ea9.s, line 828; error : Not a name of any known instruction: 'activemask' ``` The problem is that our cmake script converts CUDA version strings incorrectly: 9.1 becomes 9100, but it should be 9010, as shown in `getCudaVersion` in `clang/lib/Driver/ToolChains/Cuda.cpp`. Thus, `openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu` inadvertently enables `activemask` because it apparently becomes available in 9.2. This patch fixes the conversion. This patch does not fix the other two tests in PR#49250. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97012	2021-02-19 11:09:26 -05:00
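To make the version-encoding mismatch concrete: clang's CUDA version integers follow the `CUDA_VERSION` scheme `major * 1000 + minor * 10`. The `cuda_version_buggy` formula is only an assumption that reproduces the 9100 value from the commit, and `has_activemask` is a hypothetical gate based on the commit's note that `activemask` arrives in 9.2.

```cpp
#include <cassert>

// CUDA_VERSION-style encoding: major * 1000 + minor * 10,
// e.g. 9.1 -> 9010, 10.2 -> 10020, 11.2 -> 11020.
constexpr int cuda_version_fixed(int major, int minor) {
  return major * 1000 + minor * 10;
}

// Assumed formula that reproduces the mistaken 9.1 -> 9100 conversion.
constexpr int cuda_version_buggy(int major, int minor) {
  return major * 1000 + minor * 100;
}

// Hypothetical feature gate: `activemask` requires CUDA 9.2.
constexpr bool has_activemask(int encoded) {
  return encoded >= cuda_version_fixed(9, 2);
}

static_assert(cuda_version_fixed(9, 1) == 9010, "9.1 encodes as 9010");
static_assert(has_activemask(cuda_version_buggy(9, 1)),
              "the bad conversion enables activemask on CUDA 9.1");
static_assert(!has_activemask(cuda_version_fixed(9, 1)),
              "the correct conversion keeps it disabled");
```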
Joel E. Denny	d2147b1a87	[OpenMP] Fix always,from and delete for data absent at exit Without this patch, there's a runtime error for those map types at exit from an "omp target data" or at "omp target exit data", but the spec says the list item should be ignored. This patch tests that fix in data_absent_at_exit.c, and it also improves other testing for data that is not fully present at exit. Reviewed By: grokos, RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D96999	2021-02-19 11:09:26 -05:00
Ron Lieberman	30c0d5b4c3	[OPENMP][AMDGCN] Improvements to print_kernel_trace (bit mask) Allow bit masking to select various trace features. bit 0 => Launch tracing (stderr) bit 1 => timing of runtime (stdout) bit 2 => detailed launch tracing (stderr) bit 3 => timing goes to stdout instead of stderr example: LIBOMPTARGET_KERNEL_TRACE=7 does it all LIBOMPTARGET_KERNEL_TRACE=5 Launch + details LIBOMPTARGET_KERNEL_TRACE=2 timings + launch to stderr LIBOMPTARGET_KERNEL_TRACE=10 timings + launch to stdout Differential Revision: https://reviews.llvm.org/D96998	2021-02-19 06:47:22 -05:00
Shilei Tian	89827fd404	[OpenMP][NVPTX] Add the support for CUDA 11.2 and CUDA 11.1 CUDA 11.2 and CUDA 11.1 are all available now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97004	2021-02-18 21:04:39 -05:00
AndreyChurbanov	dab5d6c2eb	[OpenMP] fix race condition in test	2021-02-18 02:27:49 +03:00
Jon Chesterfield	53d7fd3762	[libomptarget][amdgcn] Remove lookup of .language msgpack field	2021-02-17 23:02:16 +00:00
AndreyChurbanov	cf1ddae7e3	[OpenMP][NFC] replaced 'dependencies' with 'dependences' in comments and debug prints	2021-02-18 00:38:18 +03:00
Alexey Bataev	60d71a286b	[OPENMP50]Allow overlapping mapping in target constructs. OpenMP 5.0 removed many restrictions on overlapping mapped items compared to OpenMP 4.5. This patch restricts the checks for overlapping data mappings to OpenMP 4.5 and earlier, and reorders the mapping of the arguments so that present and alloc mappings are processed first, followed by all others. Differential Revision: https://reviews.llvm.org/D86119	2021-02-16 14:42:08 -08:00
Johannes Doerfert	2518cc65d2	[OpenMP][FIX] Avoid use of stack allocations in asynchronous calls As reported by Guilherme Valarini [0], we used to pass stack allocations to calls that can nowadays be asynchronous. This is arguably a problem and it will inevitably result in UB. To remedy the situation we allocate the locations as part of the AsyncInfoTy object. The lifetime of that object matches what we need for now. If the synchronization is not tied to the AsyncInfoTy object anymore we might need to have a different buffer construct in global space. This should be back-ported to LLVM 12 but needs slight modifications as it is based on refactoring patches we do not need to backport. [0] https://lists.llvm.org/pipermail/openmp-dev/2021-February/003867.html Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D96667	2021-02-16 15:38:11 -06:00
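One way to give asynchronous calls stable argument storage is sketched below, assuming the buffer lives in a `std::deque` owned by the async-info object. `AsyncInfoSketch`, `getVoidPtrLocation` and `numLocations` are illustrative names for this sketch. A deque is a natural fit because `push_back` never invalidates references to existing elements, so earlier slots keep their addresses while the object is alive.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical, simplified stand-in for an async-info object.
class AsyncInfoSketch {
public:
  // Returns a slot that stays valid for the lifetime of this object, so its
  // address can be handed to an asynchronous copy instead of a stack local.
  void *&getVoidPtrLocation() {
    // std::deque::push_back never invalidates references to existing
    // elements (unlike std::vector), so earlier slots keep their addresses.
    BufferLocations.push_back(nullptr);
    return BufferLocations.back();
  }

  std::size_t numLocations() const { return BufferLocations.size(); }

private:
  std::deque<void *> BufferLocations;
};
```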
Johannes Doerfert	758b849931	[OpenMP] Unify omptarget API and usage wrt. `__tgt_async_info` This patch unifies our libomptarget API in two ways: - always pass a `__tgt_async_info` object, the Queue member decides if it is in use or not. - (almost) always synchronize in the interface layer and not in the omptarget layer. A side effect is that we now put all constructor and static initializer kernels in a stream too, if the device utilizes `__tgt_async_info`. The patch contains a TODO which can be addressed as we add support for asynchronous malloc and free in the plugin API. This is the only `synchronizeAsyncInfo` left in the omptarget layer. Side note: On a V100 system the GridMini performance for small sizes more than doubled. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96379	2021-02-16 15:38:06 -06:00
Johannes Doerfert	a2fc0d34db	[OpenMP] Move synchronization into `__tgt_async_info` The AsyncInfo should be passed everywhere and it should offer a way to ensure synchronization, given a libomptarget Device. This replaces D96431. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96438	2021-02-16 15:38:01 -06:00
Johannes Doerfert	942728763b	[OpenMP][NFC] Unify `target` API with other by passing a `__tgt_async_info` pointer Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96430	2021-02-16 15:37:56 -06:00
Johannes Doerfert	44f3022cdf	[OpenMP][NFC] Pass a DeviceTy, not the device number to `target` This unifies the API of `target` relative to `targetUpdateData` and such. Reviewed By: tianshilei1992, grokos Differential Revision: https://reviews.llvm.org/D96429	2021-02-16 15:37:51 -06:00
Johannes Doerfert	ea9395716e	[OpenMP][NFC] Clang format the libomptarget plugins Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96445	2021-02-16 15:37:46 -06:00
Johannes Doerfert	ad94fce845	[OpenMP][NFC] Eliminate sign comparison warning via explicit casts Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96812	2021-02-16 15:37:41 -06:00
Johannes Doerfert	9cd1e2228c	[OpenMP][NFC] Clang format libomptarget code (src & include) The struct and enum alignments are kept by disabling clang-format for that code region. Reviewed By: tianshilei1992, JonChesterfield, grokos Differential Revision: https://reviews.llvm.org/D96428	2021-02-16 15:37:35 -06:00
AndreyChurbanov	5631842d18	[OpenMP] NFC: fix test removing the target construct	2021-02-13 04:49:52 +03:00
AndreyChurbanov	091e8daa24	[OpenMP] fix test adding mapping of shared variables	2021-02-13 04:13:54 +03:00
Martin Storsjö	496ca4127e	[OpenMP] Silence more warning flags This silences warnings like these, in mingw builds with clang: runtime/src/kmp_atomic.h:1021:13: warning: '__kmpc_atomic_cmplx8_rd' has C-linkage specified, but returns user-defined type 'kmp_cmplx64' (aka '__kmp_cmplx64_t') which is incompatible with C [-Wreturn-type-c-linkage] runtime/src/z_Windows_NT_util.cpp:479:17: warning: cast from 'volatile void *' to 'type-parameter-0-0 *' drops volatile qualifier [-Wcast-qual] flag = (C *)th->th.th_sleep_loc; runtime/src/z_Windows_NT_util.cpp:1321:14: warning: cast to 'void *' from smaller integer type 'DWORD' (aka 'unsigned long') [-Wint-to-void-pointer-cast] } else if ((void *)exit_val != (void *)th) { Differential Revision: https://reviews.llvm.org/D96585	2021-02-12 21:55:32 +02:00
Martin Storsjö	16428a8d91	[OpenMP] Avoid warnings about unused static functions on windows Add ifdefs around one function that is only used in unix build configurations. Add a void cast for a Windows-specific function that is currently unused but may be intended to be used at some point. Differential Revision: https://reviews.llvm.org/D96584	2021-02-12 21:55:31 +02:00
Martin Storsjö	b388c84c09	[OpenMP] Remove two entirely unused variables Differential Revision: https://reviews.llvm.org/D96583	2021-02-12 21:55:31 +02:00
Martin Storsjö	b3d84790fa	[OpenMP] Add void casts to silence unused variable warnings These variables are used only in certain build configurations, or marked with a todo comment indicating that they should be used/checked/reported. Differential Revision: https://reviews.llvm.org/D96582	2021-02-12 21:55:31 +02:00
Martin Storsjö	3f9519b768	[OpenMP] Only use #pragma comment(lib, ...) in MSVC build configurations MinGW build configurations don't support this pragma (unless compiling with clang, with -fms-extensions, and linking with lld), and at least clang warns about it. This library does end up linked by the cmake files anyway (as long as the check works properly). Differential Revision: https://reviews.llvm.org/D96581	2021-02-12 21:55:31 +02:00
Martin Storsjö	77632422bc	[OpenMP] Fix the check for libpsapi for i386 check_library_exists fails for stdcall functions, because that check doesn't include the necessary headers (and thus fails with an undefined reference to _EnumProcessModules, when the import library symbol actually is called _EnumProcessModules@16). Merge the two previous checks check_include_files and check_library_exists into one with check_c_source_compiles, and merge the variables that indicate whether it succeeded. Differential Revision: https://reviews.llvm.org/D96580	2021-02-12 21:55:30 +02:00
Jon Chesterfield	6f04addc8b	[libomptarget][amdgcn] Build amdgcn devicertl as openmp Change cmake to build as openmp and fix up some minor errors in the code. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D96533	2021-02-12 09:51:21 +00:00
AndreyChurbanov	838dcdb5fc	[OpenMP] libomp: minor changes to improve library performance Three minor changes in this patch: - added UNLIKELY hint to few rarely executed branches; - replaced couple of run time checks with debug assertions; - moved check of presence of ittnotify tool from inside the function call. Differential Revision: https://reviews.llvm.org/D95816	2021-02-12 00:43:13 +03:00
Hansang Bae	ffb21e7f05	[OpenMP] Enable omp_get_num_devices() on Windows This patch enables omp_get_num_devices() and omp_get_initial_device() on Windows by providing an alternative to dlsym on Windows, and proposes to add a new libomptarget entry, __tgt_get_num_devices(). Differential Revision: https://reviews.llvm.org/D96182	2021-02-11 14:53:48 -06:00
Nawrin Sultana	4692bb4a8a	[OpenMP] Add lower and upper bound in num_teams clause This patch adds a lower bound and an upper bound to the num_teams clause according to the OpenMP 5.1 specification. The initial number of teams created is implementation defined, but it will be greater than or equal to lower-bound and less than or equal to upper-bound. If the num_teams clause is not specified, the number of teams created is implementation defined, but it will be greater than or equal to 1. Differential Revision: https://reviews.llvm.org/D95820	2021-02-10 13:58:50 -06:00
Jon Chesterfield	56c446a878	[libomptarget][amdgcn] Tolerate deadstripped device_state variable The device_state variable may have been deadstripped. Similar to device_environment, leave detection of missing but used symbol to loader. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D96330	2021-02-09 16:29:53 +00:00
Jon Chesterfield	4756f76bce	[libomptarget][amdgcn] Tolerate deadstripped env variable Discovered by Pushpinder. If the device_environment variable is unused it can be deadstripped, in which case we should not abort due to it missing. This change is safe in that a missing symbol which is actually used can be reported by both linker and loader, and a missing unused symbol is better deadstripped than left in the image. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D96329	2021-02-09 11:58:37 +00:00
Jon Chesterfield	2fa4186d4e	[libomptarget][amdgcn] Fix language linkage post D95300, drop use of assert	2021-02-08 20:07:51 +00:00
Shilei Tian	b68a6b09e6	[OpenMP][libomptarget] Fixed an issue that device sync is skipped if the kernel doesn't have any argument Currently if there is no kernel argument, device synchronization will be skipped. This can lead to two issues: 1. If there is any device error, it will not be captured; 2. The target region might end before the kernel is done, which is not spec conformant. The test added in this patch only runs on the NVPTX platform, although it will not be executed by Phab at all. It also requires `not`, which is not available on most systems. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D96067	2021-02-04 20:14:24 -05:00
Shilei Tian	567b3f8841	[OpenMP][deviceRTLs] Drop `assert` in common parts of `deviceRTLs` The header `assert.h` needs to be included in order to use `assert` in the code. When building NVPTX `deviceRTLs` on a CUDA-free system, it requires headers from `gcc-multilib`, which some systems don't have. This patch drops the use of `assert` in common parts of `deviceRTLs`. Following `openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h`, the code block ``` if (!cond) __builtin_trap(); ``` is used instead. The builtin will be translated to `call void @llvm.trap()`, and the corresponding PTX is `trap;`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95986	2021-02-04 12:39:43 -05:00
Shilei Tian	0f0ce3c12e	[OpenMP][NVPTX] Take functions in `deviceRTLs` as `convergent` OpenMP device compiler (similar to other SPMD compilers) assumes that functions are convergent by default to avoid invalid transformations, such as the bug (https://bugs.llvm.org/show_bug.cgi?id=49021). Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95971	2021-02-03 20:58:12 -05:00
Shilei Tian	3c31b78455	[OpenMP] Fixed an issue that taskwait doesn't work on detachable task D77609 mistakenly changed the behavior of taskwait on detachable tasks so that a detachable task is not waited for, as reported in https://lists.llvm.org/pipermail/openmp-dev/2021-February/003836.html. This patch fixes it. Thanks to Raúl for the report. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95798	2021-02-03 13:12:43 -05:00
Peyton, Jonathan L	ffca74b8b8	[OpenMP] Fix sign comparison warnings from GCC New affinity patch introduced legitimate sign-compare warnings that clang doesn't report but GCC-10 does. This removes the warnings by changing the types of two variables to unsigned. Differential Revision: https://reviews.llvm.org/D95818	2021-02-02 10:52:16 -06:00
Joseph Huber	ed8943c087	[OpenMP][NFC] Adding FAQ Entry for errors with static libraries	2021-02-02 10:50:22 -05:00
Atmn Patel	b545667d0a	[OpenMP][Libomptarget] Remove possible harmful copy constructor call for RTLsTy From https://bugs.llvm.org/show_bug.cgi?id=48973, we know that `std::call_once(PM->RTLs.initFlag, &RTLsTy::LoadRTLs, PM->RTLs)` causes compile time problems in libstdc++v3 5.3.1. This is because there was a defect in the standard regarding the `call_once` (LWG 2442). This was fixed in libstdc++ soon thereafter, but there are likely other standard libraries where this will fail. By matching this function call with the other one, we fix this bug. Differential Revision: https://reviews.llvm.org/D95769	2021-02-01 20:13:03 -05:00
AndreyChurbanov	d7b12004bd	[OpenMP] libomp: implement nteams-var and teams-thread-limit-var ICVs The change includes OMP_NUM_TEAMS, OMP_TEAMS_THREAD_LIMIT env variables, omp_set_num_teams, omp_get_max_teams, omp_set_teams_thread_limit, omp_get_teams_thread_limit routines. Differential Revision: https://reviews.llvm.org/D95003	2021-02-01 22:54:11 +03:00
Shilei Tian	f0129cc35e	[OpenMP] Disable tests if FileCheck is not available in in-tree building FileCheck is required for OpenMP tests. The current detection can fail when building OpenMP in-tree if the user sets `LLVM_INSTALL_TOOLCHAIN_ONLY=ON`. As a result, CMake will raise an error and the compilation will be broken. This patch fixes the issue. When `FileCheck` is not a target, tests will just be skipped. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D95689	2021-02-01 13:14:55 -05:00
Joseph Huber	fda4853998	[OpenMP] Fix seg fault in libomptarget when using Info with multiple threads Summary: One option for the LIBOMPTARGET_INFO environment variable is to print the current status of the device's data mappings. These are a shared resource among threads so this needs to be protected when using multiple streams. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95786	2021-02-01 11:21:57 -05:00
xgupta	94fac81fcc	[Branch-Rename] Fix some links According to the status of the branch rename (https://foundation.llvm.org/docs/branch-rename/), the master branch of the LLVM repository was removed on 28 Jan 2021. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D95766	2021-02-01 16:43:21 +05:30
Tobias Hieta	c3c02d0d5a	[OpenMP] Fix python3 compatibility in openmp's lit.cfg Differential Revision: https://reviews.llvm.org/D95669	2021-02-01 08:20:26 +01:00
Shilei Tian	26d38f6d20	[OpenMP][NVPTX] Refined CMake logic to choose compute capabilities This patch refines the logic to choose compute capabilities via the environment variable `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES`. It supports the following values (all case insensitive): - "all": Build `deviceRTLs` for all supported compute capabilities; - "auto": Only build for the compute capability auto detected. Note that this requires CUDA. If CUDA is not found, a CMake fatal error will be raised. - "xx,yy" or "xx;yy": Build for compute capabilities `xx` and `yy`. If `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES` is not set, it is equivalent to setting it to `all`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95687	2021-01-30 15:14:48 -05:00
Jonathan Peyton	67773681c0	[OpenMP] Add environment variable to force monotonic dynamic scheduling This patch introduces a new environment variable to force monotonic behavior for users that absolutely need it. This is in anticipation of the OpenMP 5.0 change that uses non-monotonic behavior for dynamic scheduling by default. Fixes for that and the actual switch are coming soon. Differential Revision: https://reviews.llvm.org/D95263	2021-01-29 12:23:27 -06:00
Shilei Tian	7bc31018f7	[OpenMP][NFC] Added release note for new `deviceRTLs` and hidden helper task Added release note for new `deviceRTLs` and hidden helper task for LLVM 12. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95584	2021-01-29 13:13:03 -05:00
AndreyChurbanov	7f5ad0e071	[OpenMP] libomp: fix build by cl with vs2019 Replace VLA with dynamic allocation using alloca(). This fixes https://bugs.llvm.org/show_bug.cgi?id=48919. Differential Revision: https://reviews.llvm.org/D95627	2021-01-29 13:16:41 +03:00
AndreyChurbanov	ac70a53653	[OpenMP] NFC: disabled two flaky tests because the bug in libomp is not fixed yet	2021-01-29 00:54:13 +03:00
Shilei Tian	1b19c42302	[OpenMP][deviceRTLs] Separate declaration of target dependent functions from `target_impl.h` This patch creates a new header file `target_interface.h` for declarations of all target dependent functions. Future targets can be supported by implementing all functions declared in the header and providing the same macros/data as each `target_impl.h`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95300	2021-01-28 08:14:33 -05:00
Shilei Tian	5a64794bba	[OpenMP][NVPTX] Added the missing -O1 when building NVPTX bitcode libraries In the past `-O1` was used when building NVPTX bitcode libraries. After we switched to OpenMP, `-O1` was missing by mistake, leading to a huge performance regression. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95545	2021-01-28 08:13:38 -05:00
Shilei Tian	19248d30e4	[OpenMP][deviceRTLs] Added `[[clang::loader_uninitialized]]` explicitly `[[clang::loader_uninitialized]]` is in macro `SHARED` but it doesn't work for arrays like `parallelLevel`, so the variable will be zero initialized. There is also a similar issue for `omptarget_nvptx_device_State` which is in global address space. Its c'tor is also generated, which was not in the past when building the `deviceRTLs` with CUDA. In this patch, we added the attribute to the two variables explicitly. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95550	2021-01-28 08:12:49 -05:00
Shilei Tian	c571b16834	[OpenMP] Disabled profiling in `libomp` by default to unblock link errors A link error occurred when time profiling in libomp was enabled by default, because `libomp` is assumed to be a C library but `libLLVMSupport`, which profiling depends on, is a C++ library. Currently the issue blocks all OpenMP tests in Phabricator. This patch adds a new CMake option `OPENMP_ENABLE_LIBOMP_PROFILING` to enable/disable the feature. By default it is disabled. Note that once time profiling is enabled for `libomp`, it becomes a C++ library. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95585	2021-01-28 07:24:32 -05:00
Vyacheslav Zakharin	0fc90873b2	[libomptarget][NFC] Link plugins with threads support library due to std::call_once usage. Differential Revision: https://reviews.llvm.org/D95572	2021-01-27 19:26:18 -08:00
Atmn Patel	8a77056256	[OpenMP][Libomptarget] Fix conditional in CMake for remote plugin The remote offloading plugin's CMakeLists was trying to build if its flag was enabled even if it didn't find gRPC/protobuf. The conditional was wrong, it's fixed by this. Differential Revision: https://reviews.llvm.org/D95574	2021-01-27 21:28:25 -05:00
Shilei Tian	fb12df4a8e	[OpenMP][NVPTX] Disable building NVPTX deviceRTL by default on a non-CUDA system D95466 dropped CUDA to build NVPTX deviceRTL and enabled it by default. However, the build requires some libraries that are not available by default on non-CUDA systems, which could break the compilation. This patch disables the build by default. It can be enabled with `LIBOMPTARGET_BUILD_NVPTX_BCLIB=ON`. Reviewed By: kparzysz Differential Revision: https://reviews.llvm.org/D95556	2021-01-27 17:06:14 -05:00
Peyton, Jonathan L	8e67134364	[OpenMP] Fix misleading warning for OMP_PLACES When OMP_PLACES contains an invalid value, the warning informs the user that the fallback is OMP_PLACES=threads, but the actual internal setting is OMP_PLACES=cores and is detected as such with KMP_SETTINGS=1. This patch informs the user that OMP_PLACES=cores is being used instead of OMP_PLACES=threads. Differential Revision: https://reviews.llvm.org/D95170	2021-01-27 14:27:24 -06:00
Peyton, Jonathan L	598c590b3c	[OpenMP] Add cpuid leaf 1f topology discovery This patch adds the new algorithm for topology discovery using cpuid leaf 1f. Only the new die level is detected and integrated into the current affinity mechanisms including KMP_AFFINITY (granularity level and compact/scatter algorithm), OMP_PLACES=dies, and KMP_HW_SUBSET. Differential Revision: https://reviews.llvm.org/D95157	2021-01-27 14:27:23 -06:00
Peyton, Jonathan L	9f87c6b47d	[OpenMP] Fix HWLOC topology detection for 2.0.x HWLOC 2.0 represents NUMA nodes as separate children that are no longer in the main parent/child topology tree. This change takes this into account. The main topology detection loop in the create_hwloc_map() routine starts at a hardware thread within the initial affinity mask and goes up the topology tree setting the socket/core/thread labels correctly. This change also introduces some of the more generic changes that the future kmp_topology_t structure will take advantage of including a generic ratio & count array (finding all ratios of topology layers like threads/core cores/socket and finding all counts of each topology layer), generic radix1 reduction step, generic uniformity check, and generic printing of topology (en_US.txt) Differential Revision: https://reviews.llvm.org/D95156	2021-01-27 14:27:23 -06:00
Giorgis Georgakoudis	1e59c1a898	[OpenMP][Libomptarget] Fix check-libomptarget check-libomptarget fails when building with LLVM_ENABLE_PROJECTS. This is because the test configuration is missing the path to libomp.so and libLLVMSupport.so when time profiling is enabled (both libraries have the same path when building). This patch adds the path to the configuration. Reviewed By: vzakhari Differential Revision: https://reviews.llvm.org/D95376	2021-01-27 06:46:40 -08:00
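Commit 567b3f8841 replaces `assert` in the common `deviceRTLs` code with the pattern `if (!cond) __builtin_trap();`, so that `assert.h` is no longer needed. A minimal sketch of that pattern wrapped in a macro follows; the macro name `DEVICE_CHECK` and the helper `readLevel` are hypothetical and do not come from the deviceRTL sources. `__builtin_trap` is a GCC/Clang builtin that terminates execution abnormally; per the commit message, on NVPTX it becomes `call void @llvm.trap()` and PTX `trap;`.

```cpp
// Sketch of the assert-free check pattern described in commit 567b3f8841.
// DEVICE_CHECK is a hypothetical name; it needs neither <cassert> nor <assert.h>.
#define DEVICE_CHECK(cond) \
  do {                     \
    if (!(cond))           \
      __builtin_trap();    \
  } while (0)

// Hypothetical helper: bounds-checked read from a small per-level array.
// An out-of-range Level traps instead of reading out of bounds.
static int readLevel(const int *Levels, int NumLevels, int Level) {
  DEVICE_CHECK(Level >= 0 && Level < NumLevels);
  return Levels[Level];
}
```

For example, with `const int Levels[4] = {1, 2, 4, 8};`, `readLevel(Levels, 4, 2)` returns `4`, while `readLevel(Levels, 4, 7)` would trap. The `do { ... } while (0)` wrapper lets the macro behave as a single statement, for instance inside an unbraced `if`.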


1958 Commits
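Commit b545667d0a concerns the call `std::call_once(PM->RTLs.initFlag, &RTLsTy::LoadRTLs, PM->RTLs)`. Because `std::once_flag` is neither copyable nor movable, a class that holds one as a member is not copyable either. A standard library that still decay-copies `call_once`'s arguments (the behavior LWG 2442 removed) therefore cannot compile a call that passes such an object directly. The commit itself fixes this by matching the call to another call site whose exact form the message does not show. The sketch below illustrates one general way to avoid any copy, passing the object's address instead. The `Registry` type and `ensureLoaded` function are hypothetical stand-ins, not libomptarget code.

```cpp
#include <mutex>

// Hypothetical stand-in for a plugin registry. The std::once_flag member
// makes the class non-copyable, as it does for any class holding one.
struct Registry {
  std::once_flag InitFlag;
  int LoadCount = 0;

  // One-time initialization; a real registry would load libraries here.
  void load() { ++LoadCount; }
};

// Passing &R means only a pointer is copied; std::call_once then calls
// R.load() through that pointer at most once per InitFlag.
void ensureLoaded(Registry &R) {
  std::call_once(R.InitFlag, &Registry::load, &R);
}
```

Calling `ensureLoaded` any number of times, from one thread or several, runs `load` once. As commit 0fc90873b2 notes for the plugins, code using `std::call_once` may need to be linked with the threads support library (for example `-pthread` with GCC or Clang) on some platforms.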