llvm-project

Commit Graph

Author	SHA1	Message	Date
Alexey Bataev	80256605f8	[OpenMP] support depend clause for taskwait directive, by Deepak Eachempati. This patch adds clang (parsing, sema, serialization, codegen) support for the 'depend' clause on the 'taskwait' directive. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D113540	2021-11-19 06:30:17 -08:00
Peyton, Jonathan L	a733b18bdb	[OpenMP][libomp] Enable HWLOC topology detection of multiple CPU kinds Teach the HWLOC topology method how to detect Atom and Core types so hybrid CPUs are properly detected and represented when using the HWLOC topology method. Differential Revision: https://reviews.llvm.org/D112270	2021-11-17 16:30:18 -06:00
Peyton, Jonathan L	286094af9b	[OpenMP][libomp] Improve Windows Processor Group handling within topology The current implementation of Windows Processor Groups has a separate topology method to handle them. This patch deprecates that specific method and uses the regular CPUID topology method by default and inserts the Windows Processor Group objects in the topology manually. Notes: * The preference for processor groups is lowered to a value less than socket so that the user will see sockets in the KMP_AFFINITY=verbose output instead of processor groups when sockets=processor groups. * The topology's capacity is modified to handle additional topology layers without the need for reallocation. * If a user asks for a granularity setting that is "above" the processor group layer, then the granularity is adjusted "down" to the processor group since this is the coarsest layer available for threads. Differential Revision: https://reviews.llvm.org/D112273	2021-11-17 16:29:01 -06:00
Peyton, Jonathan L	1dd797168e	[OpenMP][libomp] Add support for offline CPUs in Linux If some CPUs are offline, then make sure they are not included in the fullMask even if norespect is given to KMP_AFFINITY. Differential Revision: https://reviews.llvm.org/D112274	2021-11-17 16:28:01 -06:00
Peyton, Jonathan L	a0afb9d0fc	[OpenMP][libomp] Allow users to specify KMP_HW_SUBSET in any order Remove restriction forcing users to specify the KMP_HW_SUBSET value in topology order. This patch sorts the user KMP_HW_SUBSET value before trying to apply it. For example: 1s,4c,2t is equivalent to 2t,1s,4c Differential Revision: https://reviews.llvm.org/D112027	2021-11-17 15:27:37 -06:00
Jonathan Peyton	c46becf500	[OpenMP][libomp][NFC] Remove non-ASCII apostrophe in comment	2021-11-17 14:46:40 -06:00
Martin Storsjö	9b2b549837	[OpenMP] Silence build warnings when built with MinGW There's an attempt to upstream this change in https://github.com/intel/ittapi/pull/25 too. Differential Revision: https://reviews.llvm.org/D114069	2021-11-17 18:51:18 +02:00
Joseph Huber	374cd0fb61	[OpenMP] Fix initializer not working on AMDGPU The RAII class used for debugging RTL entry used a shared variable to keep track of the current depth. This used a global initializer, which isn't supported on AMDGPU. This patch removes the initializer and instead sets it to zero when the state is initialized in the runtime. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D113963	2021-11-16 08:17:15 -05:00
Shao-Ce SUN	0c660256eb	[NFC] Trim trailing whitespace in *.rst	2021-11-15 09:17:08 +08:00
Nawrin Sultana	7a5680233e	[OpenMP] Set default blocktime to 0 for hybrid cpu Differential Revision:https://reviews.llvm.org/D113012	2021-11-12 12:05:35 -06:00
Joel E. Denny	c9dfe322ee	[OpenMP] Fix main thread barrier for Pascal and amdgpu Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113602	2021-11-12 11:18:45 -05:00
Bran Hagger	9f15cacc2e	[OpenMP] Allow building libomp using Microsoft Visual C++ naming scheme Differential Revision: https://reviews.llvm.org/D110354	2021-11-11 13:11:56 -06:00
Jon Chesterfield	27177b82d4	[OpenMP] Lower printf to __llvm_omp_vprintf Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf` which takes the same const char, void arguments as cuda vprintf and also passes the size of the void* alloca which will be needed by a non-stub implementation of `__llvm_omp_vprintf` for amdgpu. This removes the amdgpu link error on any printf in a target region in favour of silently compiling code that doesn't print anything to stdout. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112680	2021-11-10 15:30:56 +00:00
Joachim Protze	52da6f562e	Revert "[openmp] Add OMPT initialization in libomptarget" Reverting initial OMPT for target implementation in favor of a different implementation. This reverts commit `3bc8ce5dd7`.	2021-11-10 12:44:25 +01:00
Atmn Patel	737c4a2673	[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are just code bloat. By removing them, the codebase gets a bit cleaner. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113421	2021-11-09 15:11:05 -05:00
Jonathan Peyton	48b67dca2c	[OpenMP][libomp][CMake] use uppercase_CMAKE_BUILD_TYPE Have standalone builds define uppercase_CMAKE_BUILD_TYPE and use it. llvm/CMakeLists.txt defines uppercase_CMAKE_BUILD_TYPE for regular LLVM builds with OpenMP enabled. Differential Revision: https://reviews.llvm.org/D112951	2021-11-09 11:03:04 -06:00
Atmn Patel	ef717f3852	Revert "[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files" This reverts commit `81a7cad2ff`.	2021-11-09 02:10:42 -05:00
Atmn Patel	81a7cad2ff	[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are just code bloat. By removing them, the codebase gets a bit cleaner. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113421	2021-11-09 01:52:52 -05:00
Vyacheslav Zakharin	1b409df613	[NFC] Initial documentation for declare target indirect support. Differential Revision: https://reviews.llvm.org/D110193	2021-11-08 15:12:03 -08:00
Jon Chesterfield	0fa45d6d80	Revert "[OpenMP] Lower printf to __llvm_omp_vprintf" This reverts commit `db81d8f6c4`.	2021-11-08 20:28:57 +00:00
Jon Chesterfield	dc9edc6a6d	Revert "[openmp] Fix build, test passes on CI unexpectedly" This reverts commit `c499d690cd`.	2021-11-08 20:28:52 +00:00
Jon Chesterfield	c499d690cd	[openmp] Fix build, test passes on CI unexpectedly	2021-11-08 18:45:27 +00:00
Jon Chesterfield	db81d8f6c4	[OpenMP] Lower printf to __llvm_omp_vprintf Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf` which takes the same const char, void arguments as cuda vprintf and also passes the size of the void* alloca which will be needed by a non-stub implementation of `__llvm_omp_vprintf` for amdgpu. This removes the amdgpu link error on any printf in a target region in favour of silently compiling code that doesn't print anything to stdout. The exact set of changes to check-openmp probably needs revision before commit Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112680	2021-11-08 18:38:00 +00:00
Quinn Pham	c3b15b71ce	[NFC] Inclusive Language: change master to main for .chm files [NFC] As part of using inclusive language within the llvm project, this patch replaces master with main when referring to `.chm` files. Reviewed By: teemperor Differential Revision: https://reviews.llvm.org/D113299	2021-11-08 08:23:04 -06:00
@t-msn	0808d956c4	[OpenMP] libomp: Fix handling of barrier pattern environment variables It is better to set all barrier patterns to use "dist" when at least one environment variable specifies "dist". Otherwise if only one environment is set to "dist" and others left blank inadvertently, it would result in mixing dist barrier with default hyper barrier pattern. Differential Revision: https://reviews.llvm.org/D112597	2021-11-08 15:01:26 +03:00
Jon Chesterfield	4f4c826e75	[libomptarget] Drop remote plugin cmake version requirement to match llvm LLVM docs at https://llvm.org/docs/CMake.html#quick-start state 3.13.4 Reviewed By: atmnpatel Differential Revision: https://reviews.llvm.org/D113271	2021-11-05 17:34:28 +00:00
Johannes Doerfert	d4b1cf8f9c	[OpenMP] Build device runtimes for sm_86 Reviewed By: carlo.bertolli Differential Revision: https://reviews.llvm.org/D113111	2021-11-04 17:54:59 -05:00
Johannes Doerfert	ab9f3f5d25	[OpenMP] Introduce the keepAlive function into the old device RT Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D113110	2021-11-04 17:54:56 -05:00
Johannes Doerfert	93bebdc78f	[OpenMP][NFCI] Cleanup new device RT mapping interface Minimize the `impl` interface and clean up some uses of mapping functions. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D112154	2021-11-04 17:54:53 -05:00
Johannes Doerfert	73720c8059	[OpenMP][FIX] Introduce and use a simple generic-mode barrier Before we had aligned barriers the `__kmpc_barrier_simple_spmd` was OK to be used in the custom state machine. Now that SPMD barriers are assumed to be aligned we need to use a "generic" barrier in places that are not aligned. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112893	2021-11-02 23:22:01 -05:00
Johannes Doerfert	ccb5d2726a	[OpenMP][FIX] Avoid a race between initialization and first state reads When we pick state 0 to initialize state but thread N is going to be the "main thread", in generic mode, we would require extra synchronization. Instead, we should pick the main thread to initialize state in generic mode and any thread in SPMD mode. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112874	2021-11-02 23:21:49 -05:00
Med Ismail Bennani	797b50d4be	Revert "Use `GNUInstallDirs` to support custom installation dirs. -- LLVM" This reverts commit `6fd2db04d0` since it broke GreenDragon LLDB-Incremental bot: https://green.lab.llvm.org/green/job/lldb-cmake/37560/console Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>	2021-11-02 19:11:44 +01:00
John Ericson	6fd2db04d0	Use `GNUInstallDirs` to support custom installation dirs. -- LLVM This is a new draft of D28234. I previously did the unorthodox thing of pushing to it when I wasn't the original author, but since this version - Uses `GNUInstallDirs`, rather than mimics it, as the original author was hesitant to do but others requested. - Is much broader, effecting many more projects than LLVM itself. I figured it was time to make a new revision. I am using this patch (and many back-ports) as the basis of https://github.com/NixOS/nixpkgs/pull/111487 for my distro (NixOS). It looked like people were generally on board in D28234, but I make note of this here in case extra motivation is useful. --- As pointed out in the original issue, a central tension is that LLVM already has some partial support for these sorts of things. For example `LLVM_LIBDIR_SUFFIX`, or `COMPILER_RT_INSTALL_PATH`. Because it's not quite clear yet what to do about those, we are holding off on changing libdirs and `compiler-rt`. for this initial PR. --- On the advice of @lebedev.ri, I am splitting this up a bit per subproject, starting with LLVM. To allow it to be more easily reviewed. This and the subsequent patch must be landed together, as this will not build alone. But the rest can be landed on their own. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D100810	2021-11-02 10:23:30 -04:00
Shilei Tian	025f549240	[OpenMP][DeviceRTL] Fixed an issue that causes hang in SU3 The synchronization at the end of parallel region cannot make sure all threads exit the scope. As a result, the assertions right after it might be hit, and further the `state::assumeInitialState(IsSPMD)` in `__kmpc_target_deinit` may not hold as well. We either add a synchronization right after the parallel region, or remove the assertions and assuptions. Here we choose the first one as those assertions and assumptions can help optimizations. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112861	2021-10-30 14:44:29 -04:00
Kazu Hirata	3cfc1757c5	Ensure newlines at the end of files (NFC)	2021-10-29 20:26:09 -07:00
Joseph Huber	927c74d4da	[OpenMP] Fix assert macro expr Summary: A previous patch changed the check and mistakenly only did `!expr` when this is a macro expansion and could only apply to the left side of an expression.	2021-10-29 17:44:13 -04:00
Joseph Huber	2c6a4e5678	[OpenMP] Use the assertion formatting from assert.h This patch changes the `assert_assume` function used for internal assumptions in the device runtime to use a more standard formatting for the assumption message. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112842	2021-10-29 16:44:01 -04:00
Joseph Huber	35f42340a2	[OpenMP][Docs] Add documentation for device RTL debugging Add documentation for the debugging features in the OpenMP device runtime library. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112010	2021-10-29 14:57:14 -04:00
Joseph Huber	6dd791bca8	[OpenMP] Check output of malloc in the device for debug A common problem is the device running out of global heap memory and crashing due to a nullptr dereference when using the data sharing stack. This explicitly checks that a nullptr was not returned by malloc when debugging field 1 is enabled. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112005	2021-10-29 14:57:12 -04:00
Joseph Huber	74f91741b6	[OpenMP] Use function tracing RAII for runtime functions. This patch adds support for using function tracing features to track the executino of runtime functions in the device runtime library. This is enabled by first compiling the new runtime with `-fopenmp-target-debug=3` and running with `LIBOMPTARGET_DEVICE_RTL_DEBUG=3`. The output only tracks team 0 and thread 0 so there isn't much output when using a generic region. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112002	2021-10-29 14:57:11 -04:00
Jon Chesterfield	4d50803ce4	[libomptarget] Build DeviceRTL for amdgpu Passes same tests as the current deviceRTL. Includes cmake change from D111987. CI is showing a different set of pass/fails to local, committing this without the tests enabled by default while debugging that difference. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227	2021-10-28 12:34:01 +01:00
Jon Chesterfield	22bd75be70	[openmp] Fix a git misfire in `cf37a94c1e`	2021-10-28 01:35:25 +01:00
Jon Chesterfield	6c7b203d1d	Revert "[libomptarget] Build DeviceRTL for amdgpu" - more tests failing on CI than failed locally when writing this patch This reverts commit `33427fdb7b`.	2021-10-28 01:01:53 +01:00
Jon Chesterfield	cf37a94c1e	[openmp] Add amdgpu impl missed from D112153	2021-10-28 00:55:53 +01:00
Jon Chesterfield	33427fdb7b	[libomptarget] Build DeviceRTL for amdgpu Passes same tests as the current deviceRTL. Includes cmake change from D111987. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227	2021-10-28 00:41:45 +01:00
Johannes Doerfert	48877525cf	[OpenMP] Remove obsolete external interface for device RT We do not generate _serialized_parallel calls in device mode, no need for an external API. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D112145	2021-10-27 18:22:35 -05:00
Johannes Doerfert	5102c3c61e	[OpenMP][FIX] Do not adjust the level after the environment was popped Exiting a data environment will reset all values, it is wrong to adjust them afterwards. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112144	2021-10-27 18:22:33 -05:00
Johannes Doerfert	b16aadf0a7	[OpenMP] Introduce aligned synchronization into the new device RT We will later use the fact that a barrier is aligned to reason about thread divergence. For now we introduce the assumption and some more documentation. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112153	2021-10-27 18:22:31 -05:00
Johannes Doerfert	ef922c692f	[OpenMP][FIX] Query proper thread ID information to support nesting The OpenMP thread ID is not the hardware thread ID if we have nesting. We need to ask the runtime properly to ensure correct results. Note that the loop interface is going to change soon so we do not adjust it now but simply ignore the extra argument. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111950	2021-10-27 18:18:44 -05:00
Johannes Doerfert	4c88341d17	[OpenMP][FIX] Do check the level before return team size The team size could/should be an ICV but since we know it is either 1 or a value we can leave it in the team state for now. However, we still need to determine if the current level is nested before we use it. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D111949	2021-10-27 18:18:42 -05:00
Johannes Doerfert	dc72960967	[OpenMP][FIX] Do not dereference a potential nullptr The first thread state in the new GPU runtime doesn't have a previous one and we should not dereference the nullptr placeholder. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111946	2021-10-27 18:18:39 -05:00
AndreyChurbanov	a64797b5b8	[OpenMP][NFC] disable test on power because of -mlong-double-80 option	2021-10-27 16:54:44 +03:00
AndreyChurbanov	c704b25b44	[OpenMP] libomp: Fix possible NULL dereference. According to dlsym description, the value of symbol could be NULL, and there is no error in this case. Thus dlerror will also return NULL in this case. We need to check the value returned by dlerror before printing it. Differential Revision: https://reviews.llvm.org/D112174	2021-10-27 16:54:44 +03:00
Vignesh Balasubramanian	b0277bef97	[OpenMP][OMPD] Implementation of OMPD debugging library - libompd. This is a continuation of the review: https://reviews.llvm.org/D100183 It contains routines that retrieve OpenMP ICV values for OMPD. Reviewed By: @hbae Differential Revision: https://reviews.llvm.org/D100184	2021-10-27 16:31:19 +05:30
Jon Chesterfield	e42e5785ad	[libomptarget][nfc]Generalise DeviceRTL cmake to allow building for amdgpu Essentially moves the foreach over sm integers into a macro and instantiates it for nvptx. NFC in that the macro is not presently instantiated for amdgpu as the corresponding code doesn't compile yet. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111987	2021-10-26 21:18:21 +01:00
Ron Lieberman	be03ef3ed1	[openmp][lit] Add support to OpenMP lit.cfg for ROCR_VISIBLE_DEVICES env-var add support for ROCR_VISIBLE_DEVICES similar to name and purpose as CUDA_VISIBLE_DEVICES Differential Revision: https://reviews.llvm.org/D112503	2021-10-26 13:46:42 +00:00
Georgios Rokos	2feafa2e46	[libomptarget][NFC] Add comment explaining why we pass argument bases and offsets as two separate entities to the plugins.	2021-10-25 14:51:14 -07:00
Shilei Tian	2a30c03c62	[OpenMP][Offloading] Only get trip count if team construct Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D112475	2021-10-25 17:16:14 -04:00
AndreyChurbanov	e38a1deb66	[OpenMP] libomp: disable definitions of 5.1 atomics for non-x86 arch. Declarations of 5.1 atomic entries were added under "#if KMP_ARCH_X86 \|\| KMP_ARCH_X86_64" in kmp_atomic.h, but definitions of the functions missed architecture guard in kmp_atomic.cpp. As a result mangled symbols were available on non-x86 architecture. The patch eliminates these unexpected symbols from the library. Differential Revision: https://reviews.llvm.org/D112261	2021-10-25 21:17:26 +03:00
Vladimir Inđić	f41d08540b	[OpenMP][OMPT] thread_num determination during execution of nested serialized parallel regions __ompt_get_task_info_internal function is adapted to support thread_num determination during the execution of multiple nested serialized parallel regions enclosed by a regular parallel region. Consider the following program that contains parallel region R1 executed by two threads. Let the worker thread T of region R1 executes serialized parallel regions R2 that encloses another serialized parallel region R3. Note that the thread T is the master thread of both R2 and R3 regions. Assume that __ompt_get_task_info_internal function is called with the argument "ancestor_level == 1" during the execution of region R3. The function should determine the "thread_num" of the thread T inside the team of region R2, whose implicit task is at level 1 inside the hierarchy of active tasks. Since the thread T is the master thread of region R2, one should expected that "thread_num" takes a value 0. After the while loop finishes, the following stands: "lwt != NULL", "prev_lwt == NULL", "prev_team" represents the team information about the innermost serialized parallel region R3. This results in executing the assignment "thread_num = prev_team->t.t_master_tid". Note that "prev_team->t.t_master_tid" was initialized at the moment of R2’s creation and represents the "thread_num" of the thread T inside the region R1 which encloses R2. Since the thread T is the worker thread of the region R1, "the thread_num" takes value 1, which is a contradiction. This patch proposes to use "lwt" instead of "prev_lwt" when determining the "thread_num". If "lwt" exists, the task at the requested level belongs to the serialized parallel region. Since the serialized parallel region is executed by one thread only, the "thread_num" takes value 0. Similarly, assume that __ompt_get_task_info_internal function is called with the argument "ancestor_level == 2" during the execution of region R3. The function should determine the "thread_num" of the thread T inside the team of region R1. Since the thread is the worker inside the region R1, one should expected that "thread_num" takes value 1. After the loop finishes, the following stands: "lwt == NULL", "prev_lwt != NULL", "prev_team" represents the team information about the innermost serialized parallel region R3. This leads to execution of the assignment "thread_num = 0", which causes a contradiction. Ignoring the "prev_lwt" leads to executing the assignment "thread_num = prev_team->t.t_master_tid" instead. From the previous explanation, it is obvious that "thread_num" takes value 1. Note that the "prev_lwt" variable is marked as unnecessary and thus removed. This patch introduces the test case which represents the OpenMP program described earlier in the summary. Differential Revision: https://reviews.llvm.org/D110699	2021-10-25 18:21:20 +02:00
Vladimir Inđić	f2410bfb1c	[OpenMP][OMPT][clang] task frame support fixed in __kmpc_fork_call __kmp_fork_call sets the enter_frame of the active task (th_curren_task) before new parallel region begins. After the region is finished, the enter_frame is cleared. The old implementation of __kmpc_fork_call didn’t clear the enter_frame of active task. Also, the way of initializing the enter_frame of the active task was wrong. Consider the following two OpenMP programs. The first program: Let R1 be the serialized parallel region that encloses another serialized parallel region R2. Assume that thread that executes R2 is going to create a new serialized parallel region R3 by executing __kmpc_fork_call. This thread is responsible to set enter_frame of R2's implicit task. Note that the information about R2's implicit task is present inside master_th->th.th_current_task at this moment, while lwt represents the information about R1's implicit task. The old implementation uses lwt and resets enter_frame of R1's implicit task instead of R2's implicit task. The new implementation uses master_th->th.th_current_task instead. The second program: Consider the OpenMP program that contains parallel region R1 which encloses an explicit task T. Assume that thread should create another parallel region R2 during the execution of the T. The __kmpc_fork_call is responsible to create R2 and set enter frame of T whose information is present inside the master_th->th.th_current_task. Old implementation tries to set the frame of parent_team->t.t_implicit_task_taskdata[tid] which corresponds to the implicit task of the R1, instead of T. Differential Revision: https://reviews.llvm.org/D112419	2021-10-25 18:21:19 +02:00
Joachim Protze	7368227965	[OpenMP][Tests] Test omp_get_wtime for invariants As discussed in D108488, testing for invariants of omp_get_wtime would be more reliable than testing for duration of sleep, as return from sleep might be delayed due to system load. Alternatively/in addition, we could compare the time measured by omp_get_wtime to time measured with C++11 chrono (for portability?). Differential Revision: https://reviews.llvm.org/D112458	2021-10-25 18:20:59 +02:00
Joachim Protze	3f229f42b7	[OpenMP][Tests][NFC] Actually check for test outcome The CHECK: line in the test had no effect, because the test does not pipe to FileCheck. Since the test only checks for a single value, encode the result in the return value of the test.	2021-10-25 18:20:12 +02:00
Joachim Protze	047890bc3f	[OpenMP][Tests][NFC] Mark tests trying to link COI as unsupported For some tests with target-related functionality icc 18/19 tries to link libioffload_target.so.5, which fails for missing COI symbols.	2021-10-25 18:20:12 +02:00
Joachim Protze	d7fdd236d5	[OpenMP][Tests][NFC] Replace atomic increment by reduction Also mark the test as unsupported by intel-21, because the test does not terminate	2021-10-25 18:20:12 +02:00
Joachim Protze	38f78dd2e2	[OpenMP][Tools][NFC] Fix C99-style declaration of iteration variables Where possible change to declare the variable before the loop. Where not possible, specifically request -std=c99 (could be limited to specific compilers like icc).	2021-10-25 18:20:12 +02:00
Joachim Protze	d29a7d23ec	[OpenMP][Tools][NFC] Pass intel license ENV to lit	2021-10-25 18:20:11 +02:00
Kazu Hirata	d8e4170b0a	Ensure newlines at the end of files (NFC)	2021-10-23 08:45:29 -07:00
Jon Chesterfield	bf6f955f39	[libomptarget] Run GPU offloading tests on both new and old runtime Implemented by patching python config instead of modifying all the tests so that -generic and XFAIL work as usual. Expectation is for this to be reverted once the old runtime is deleted. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D112225	2021-10-22 23:28:44 +01:00
Vladimir Inđić	ba02586fbe	[OpenMP][OMPT][GOMP] task frame support in KMP_API_NAME_GOMP_PARALLEL_SECTIONS KMP_API_NAME_GOMP_PARALLEL_SECTIONS function was missing the task frame support. This patch introduced a fix responsible to set properly the exit_frame of the innermost implicit task that corresponds to the parallel section construct, as well as the enter_frame of the task that encloses the mentioned implicit task. This patch also introduced a simple test case sections_serialized.c that contains serialized parallel section construct and validates whether the mentioned task frames are set correctly. Differential Revision: https://reviews.llvm.org/D112205	2021-10-22 11:01:10 -05:00
AndreyChurbanov	52f4922ebb	[OpenMP][NFC] skip atomic tests for non-x86 arch	2021-10-21 21:51:33 +03:00
Jon Chesterfield	a602c2b51d	[libomptarget][DeviceRTL] Generalise and simplify cmakelists Step towards building the DeviceRTL for amdgpu. Mostly replaces cuda-specific toolchain finding logic with the generic logic currently found in the amdgpu deviceRTL cmake. Also deletes dead code and changes the default to build on systems without cuda installed, as the library doesn't use cuda and the amdgpu-only systems generally won't have cuda installed. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111983	2021-10-21 16:14:29 +01:00
Nawrin Sultana	99d1ce4a62	[OpenMP] Add GOMP allocator functions This patch adds GOMP_alloc and GOMP_free functions of LIBGOMP. Differential revision: https://reviews.llvm.org/D111673	2021-10-20 11:37:29 -05:00
Joseph Huber	b1ce454930	[OpenMP] Remove macro guards for device debugging The plugin currently uses a macro to check if this is a debug built before assigning the debug kind variable to the device environment struct. This is being deprecated because the new device runtime does not maintain separate debug builds and should always be availible. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112083	2021-10-19 12:21:43 -04:00
Jon Chesterfield	7272982e1d	[libomptarget] Refactor DeviceRTL prior to AMDGPU bringup Subset of D111993. Fix typos, rename read to load. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111999	2021-10-19 08:05:06 +01:00
AndreyChurbanov	63f8099e23	[OpenMP] libomp: add check of task function pointer for NULL. This patch allows to simplify compiler implementation on "taskwait nowait" construct. The "taskwait nowait" is semantically equivalent to the empty task. Instead of creating an empty routine as a task entry, compiler can just send NULL pointer to the runtime. Then the runtime will make all the work with dependences and return because of the absent task routine. Differential Revision: https://reviews.llvm.org/D112015	2021-10-18 19:48:30 +03:00
Jon Chesterfield	251b1e7c25	[libomptarget] Pass OMP_TARGET_OFFLOAD env variable through to tests Useful for OMP_TARGET_OFFLOAD=MANDATORY when testing Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111995	2021-10-18 16:03:03 +01:00
@vladaindjic	59a994e8da	[OpenMP][OMPT] thread_num determination for programs with explicit tasks __ompt_get_task_info_internal is now able to determine the right value of the “thread_num” argument during the execution of an explicit task. During the execution of a while loop that iterates over the ancestor tasks hierarchy, the “prev_team” variable was always set to “team” variable at the beginning of each loop iteration. Assume that the program contains a parallel region which encloses an explicit task executed by the worker thread of the region. Also assume that the tool inquires the “thread_num” of a worker thread for the implicit task that corresponds to the region (task at “ancestor_level == 1”) and expects to receive the value of “thread_num > 0”. After the loop finishes, both “team” and “prev_team” variables are equal and point to the team information of the parallel region. The “thread_num” is set to “prev_team->t.t_master_tid”, that is equal to “team->t.t_master_tid”. In this case, “team->t.t_master_tid” is 0, since the master thread of the region is the initial master thread of the program. This leads to a contradiction. To prevent this, “prev_team” variable is set to “team” variable only at the time when the loop that has already encountered the implicit task (“taskdata” variable contains the information about an implicit task) continues iterating over the implicit task’s ancestors, if any. After the mentioned loop finishes, the “prev_team” variable might be equal to NULL. This means that the task at requested “ancestor_level” belongs to the innermost parallel region, so the “thread_num” will be determined by calling the “__kmp_get_tid”. To prove that this patch works, the test case “explicit_task_thread_num.c” is provided. It contains the example of the program explained earlier in the summary. Differential Revision: https://reviews.llvm.org/D110473	2021-10-18 13:54:22 +02:00
Joachim Protze	c93fb143b9	[OpenMP][Tests][NFC] Work around ICC bug Older intel compilers miss the privatization of nested loop variables for doacross loops. Declaring the variable in the loop makes the test more robust.	2021-10-18 13:54:15 +02:00
Joachim Protze	5918688248	[OpenMP][Tests][NFC] Flagging OMPT tests as XFAIL for Intel compilers With Intel 19 compiler the teams tests fail to link while trying to link liboffload.	2021-10-18 13:50:03 +02:00
Shilei Tian	2c941fa2f9	[OpenMP][deviceRTLs] Fix wrong return value of `__kmpc_is_spmd_exec_mode` D110279 introduced a bug to the device runtime. In `__kmpc_parallel_51`, we detect whether we are already in parallel region by `__kmpc_parallel_level() > __kmpc_is_spmd_exec_mode()`. It is based on the assumption that: - In SPMD mode, parallel level is initialized to 1. - In generic mode, parallel level is initialized to 0. - `__kmpc_is_spmd_exec_mode` returns `1` for SPMD mode, 0 otherwise. Because the return value type of `__kmpc_is_spmd_exec_mode` is `int8_t`, there was an implicit cast from `bool` to `int8_t`. We can make sure it is either 0 or 1 since C++14. In D110279, the return value is the result of an `and` operation, which is 2 in SPMD mode. This breaks the assumption in `__kmpc_parallel_51`. Reviewed By: carlo.bertolli, dpalermo Differential Revision: https://reviews.llvm.org/D111905	2021-10-16 12:58:29 -04:00
Joachim Protze	26b675d65e	[OpenMP][Tools][NFC] Make an Archer test more robust The execution order of the tasks is not fixed, so there is no ordering for the write accesses. Enforce the ordering that is expected in the check.	2021-10-15 17:32:05 +02:00
Peyton, Jonathan L	acb3b187c4	[OpenMP][host runtime] Add initial hybrid CPU support Detect, through CPUID.1A, and show user different core types through KMP_AFFINITY=verbose mechanism. Offer future runtime optimizations __kmp_is_hybrid_cpu() to know whether running on a hybrid system or not. Differential Revision: https://reviews.llvm.org/D110435	2021-10-14 16:49:42 -05:00
Peyton, Jonathan L	b840d3ab0d	[OpenMP][host runtime] small fixup of RTM CPUID bit check	2021-10-14 16:49:42 -05:00
Peyton, Jonathan L	50b68a3d03	[OpenMP][host runtime] Add support for teams affinity This patch implements teams affinity on the host. The default is spread. A user can specify either spread, close, or primary using KMP_TEAMS_PROC_BIND environment variable. Unlike OMP_PROC_BIND, KMP_TEAMS_PROC_BIND is only a single value and is not a list of values. The values follow the same semantics under the OpenMP specification for parallel regions except T is the number of teams in a league instead of the number of threads in a parallel region. Differential Revision: https://reviews.llvm.org/D109921	2021-10-14 16:30:28 -05:00
AndreyChurbanov	621d7a75b1	[OpenMP] libomp: add atomic functions for new OpenMP 5.1 atomics. Added functions those implement "atomic compare". Though clang does not use library interfaces to implement OpenMP atomics, the functions added for consistency. Also added missed functions for 80-bit floating min/max atomics. Differential Revision: https://reviews.llvm.org/D110109	2021-10-13 21:02:18 +03:00
AndreyChurbanov	6e98ec9b20	[OpenMP] libomp: fix ittnotify usage. Replaced storing of ittnotify domain array index into location info structure (which is now read-only) with storing of (location info address + ittnotify domain + team size) into hash map. Replaced __kmp_itt_barrier_domains and __kmp_itt_imbalance_domains arrays with __kmp_itt_barrier_domains hash map; __kmp_itt_region_domains and __kmp_itt_region_team_size arrays with __kmp_itt_region_domains hash map. Basic functionality did not change (at least tried to not change). The patch fixes https://bugs.llvm.org/show_bug.cgi?id=48644. Differential Revision: https://reviews.llvm.org/D111580	2021-10-13 20:49:05 +03:00
AndreyChurbanov	5e58b63b28	[OpenMP] libomp: fix warning on comparison of integer expressions of different signedness Replaced macro with global variable of correspondent type. Differential Revision: https://reviews.llvm.org/D111562	2021-10-13 20:11:47 +03:00
AndreyChurbanov	f5c0c9179f	[OpenMP] libomp: add OpenMP 5.1 memory allocation routines. Aligned allocation routines added. Fortran interfaces added for all allocation routines. Differential Revision: https://reviews.llvm.org/D110923	2021-10-11 19:25:00 +03:00
Ron Lieberman	d022f39d9f	[libomptarget][amdgpu][NFC] tweak a comment	2021-10-09 12:51:53 -04:00
Joseph Huber	bad44d5f39	[OpenMP] Add RTL function for getting number of threads in block. This patch adds support for the `__kmpc_get_hardware_num_threads_in_block` function that returns the number of threads. This was missing in the new runtime and was used by the AMDGPU plugin which prevented it from using the new runtime. This patchs also unified the interface for getting the thread numbers in the frontend. Originally authored by jdoerfert. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D111475	2021-10-08 22:21:59 -04:00
Joseph Huber	85ad566335	[OpenMP] Avoid calling `isSPMDMode` during RT initialization Until we hit the first barrier we should not call `mapping::isSPMDMode` with all threads. Instead, we now have (and use during initialization) a `mapping::isMainThreadInGenericMode` overload that takes the known SPMD-mode state and one that queries it. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111381	2021-10-08 22:00:41 -04:00
Joseph Huber	208f900527	[Libomptarget] Add an external interface to dynamic shared memory This patch adds an external interface to access the dynamic shared memory buffer in the device runtime. The function introduced is ``llvm_omp_get_dynamic_shared``. This includes a host-side definition that only returns a null pointer so that it can be used when host-fallback is enabled without crashing. Support for dynamic shared memory was also ported to the old device runtime. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110957	2021-10-08 15:36:57 -04:00
Shilei Tian	c060c634ef	[OpenMP][NVPTX] Fix an error in configuring #teams and #threads It must be a copy mistake. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D111407	2021-10-08 11:07:43 -04:00
Shilei Tian	af4599b8ab	[OpenMP][DeviceRTL] Add the support for printf in a freestanding way For NVPTX, `printf` can be used just with a function declaration. For AMDGCN, an function definition is added, but it simply returns. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109728	2021-10-07 22:15:37 -04:00
Johannes Doerfert	44710940af	[OpenMP][FIX] Data race in the SPMD execution of the new runtime We need to synchronize the threads before we destroy the RAII objects that hold the old values and not after to avoid threads executing the parallel region but seeing an inconsistent state. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111369	2021-10-07 21:01:24 -04:00
Jon Chesterfield	1bc3a6e41b	[libomptarget] Reapply `2bc4d48a78` which was accidentally reverted	2021-10-07 20:17:48 +01:00
Jon Chesterfield	0c554a4769	[libomptarget] Move device environment to shared header, remove divergence Follow on to D110006, related to D110957 Where implementations have diverged this resolves to match the new DeviceRTL - replaces definitions of this struct in deviceRTL and plugins with include - changes the dynamic_shared_size field from D110006 to 32 bits - handles stdint being unavailable in DeviceRTL - adds a zero initializer for the field to amdgpu - moves the extern declaration for deviceRTL to target_interface (omptarget.h is more natural, but doesn't work due to include order with debug.h) - Renames the fields everywhere to match the LLVM format used in DeviceRTL - Makes debug_level uint32_t everywhere (previously sometimes int32_t) Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D111069	2021-10-07 12:03:48 +01:00
Michał Górny	0873b9bef4	[openmp] [elf_common] Fix linking against LLVM dylib The hand-rolled linking logic in elf_common does not account for the possibility of using LLVM dylib rather than a dozen static libraries. Since it does not seem to be easily convertible to add_llvm_library, just hand-roll support for LLVM_LINK_LLVM_DYLIB. This is necessary to support stand-alone builds against installed LLVM. Differential Revision: https://reviews.llvm.org/D111038	2021-10-04 09:29:06 +02:00
Martin Storsjö	dec2257f35	[openmp] Fix a typo in a test REQUIRES line Differential Revision: https://reviews.llvm.org/D110963	2021-10-03 23:51:11 +03:00

1 2 3 4 5 ...

2101 Commits