llvm-project

Author	SHA1	Message	Date
Med Ismail Bennani	797b50d4be	Revert "Use `GNUInstallDirs` to support custom installation dirs. -- LLVM" This reverts commit `6fd2db04d0` since it broke GreenDragon LLDB-Incremental bot: https://green.lab.llvm.org/green/job/lldb-cmake/37560/console Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>	2021-11-02 19:11:44 +01:00
John Ericson	6fd2db04d0	Use `GNUInstallDirs` to support custom installation dirs. -- LLVM This is a new draft of D28234. I previously did the unorthodox thing of pushing to it when I wasn't the original author, but since this version - Uses `GNUInstallDirs`, rather than mimics it, as the original author was hesitant to do but others requested. - Is much broader, affecting many more projects than LLVM itself. I figured it was time to make a new revision. I am using this patch (and many back-ports) as the basis of https://github.com/NixOS/nixpkgs/pull/111487 for my distro (NixOS). It looked like people were generally on board in D28234, but I make note of this here in case extra motivation is useful. --- As pointed out in the original issue, a central tension is that LLVM already has some partial support for these sorts of things, for example `LLVM_LIBDIR_SUFFIX` or `COMPILER_RT_INSTALL_PATH`. Because it's not quite clear yet what to do about those, we are holding off on changing libdirs and `compiler-rt` for this initial PR. --- On the advice of @lebedev.ri, I am splitting this up a bit per subproject, starting with LLVM, to allow it to be more easily reviewed. This and the subsequent patch must be landed together, as this will not build alone. But the rest can be landed on their own. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D100810	2021-11-02 10:23:30 -04:00
Shilei Tian	025f549240	[OpenMP][DeviceRTL] Fixed an issue that causes a hang in SU3 The synchronization at the end of the parallel region cannot make sure all threads exit the scope. As a result, the assertions right after it might be hit, and further the `state::assumeInitialState(IsSPMD)` in `__kmpc_target_deinit` may not hold either. We can either add a synchronization right after the parallel region, or remove the assertions and assumptions. Here we choose the first one, as those assertions and assumptions can help optimizations. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112861	2021-10-30 14:44:29 -04:00
Kazu Hirata	3cfc1757c5	Ensure newlines at the end of files (NFC)	2021-10-29 20:26:09 -07:00
Joseph Huber	927c74d4da	[OpenMP] Fix assert macro expr Summary: A previous patch changed the check and mistakenly used `!expr`; because `expr` is a macro argument, the negation applied only to the left side of an expression.	2021-10-29 17:44:13 -04:00
Joseph Huber	2c6a4e5678	[OpenMP] Use the assertion formatting from assert.h This patch changes the `assert_assume` function used for internal assumptions in the device runtime to use a more standard formatting for the assumption message. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112842	2021-10-29 16:44:01 -04:00
Joseph Huber	35f42340a2	[OpenMP][Docs] Add documentation for device RTL debugging Add documentation for the debugging features in the OpenMP device runtime library. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112010	2021-10-29 14:57:14 -04:00
Joseph Huber	6dd791bca8	[OpenMP] Check output of malloc in the device for debug A common problem is the device running out of global heap memory and crashing due to a nullptr dereference when using the data sharing stack. This explicitly checks that a nullptr was not returned by malloc when debugging field 1 is enabled. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112005	2021-10-29 14:57:12 -04:00
Joseph Huber	74f91741b6	[OpenMP] Use function tracing RAII for runtime functions. This patch adds support for using function tracing features to track the execution of runtime functions in the device runtime library. This is enabled by first compiling the new runtime with `-fopenmp-target-debug=3` and running with `LIBOMPTARGET_DEVICE_RTL_DEBUG=3`. The output only tracks team 0 and thread 0, so there isn't much output when using a generic region. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112002	2021-10-29 14:57:11 -04:00
Jon Chesterfield	4d50803ce4	[libomptarget] Build DeviceRTL for amdgpu Passes same tests as the current deviceRTL. Includes cmake change from D111987. CI is showing a different set of pass/fails to local, committing this without the tests enabled by default while debugging that difference. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227	2021-10-28 12:34:01 +01:00
Jon Chesterfield	22bd75be70	[openmp] Fix a git misfire in `cf37a94c1e`	2021-10-28 01:35:25 +01:00
Jon Chesterfield	6c7b203d1d	Revert "[libomptarget] Build DeviceRTL for amdgpu" - more tests failing on CI than failed locally when writing this patch This reverts commit `33427fdb7b`.	2021-10-28 01:01:53 +01:00
Jon Chesterfield	cf37a94c1e	[openmp] Add amdgpu impl missed from D112153	2021-10-28 00:55:53 +01:00
Jon Chesterfield	33427fdb7b	[libomptarget] Build DeviceRTL for amdgpu Passes same tests as the current deviceRTL. Includes cmake change from D111987. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227	2021-10-28 00:41:45 +01:00
Johannes Doerfert	48877525cf	[OpenMP] Remove obsolete external interface for device RT We do not generate _serialized_parallel calls in device mode, no need for an external API. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D112145	2021-10-27 18:22:35 -05:00
Johannes Doerfert	5102c3c61e	[OpenMP][FIX] Do not adjust the level after the environment was popped Exiting a data environment will reset all values, it is wrong to adjust them afterwards. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112144	2021-10-27 18:22:33 -05:00
Johannes Doerfert	b16aadf0a7	[OpenMP] Introduce aligned synchronization into the new device RT We will later use the fact that a barrier is aligned to reason about thread divergence. For now we introduce the assumption and some more documentation. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112153	2021-10-27 18:22:31 -05:00
Johannes Doerfert	ef922c692f	[OpenMP][FIX] Query proper thread ID information to support nesting The OpenMP thread ID is not the hardware thread ID if we have nesting. We need to ask the runtime properly to ensure correct results. Note that the loop interface is going to change soon so we do not adjust it now but simply ignore the extra argument. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111950	2021-10-27 18:18:44 -05:00
Johannes Doerfert	4c88341d17	[OpenMP][FIX] Do check the level before returning team size The team size could/should be an ICV, but since we know it is either 1 or a value we can leave it in the team state for now. However, we still need to determine if the current level is nested before we use it. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D111949	2021-10-27 18:18:42 -05:00
Johannes Doerfert	dc72960967	[OpenMP][FIX] Do not dereference a potential nullptr The first thread state in the new GPU runtime doesn't have a previous one and we should not dereference the nullptr placeholder. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111946	2021-10-27 18:18:39 -05:00
AndreyChurbanov	a64797b5b8	[OpenMP][NFC] disable test on power because of -mlong-double-80 option	2021-10-27 16:54:44 +03:00
AndreyChurbanov	c704b25b44	[OpenMP] libomp: Fix possible NULL dereference. According to dlsym description, the value of symbol could be NULL, and there is no error in this case. Thus dlerror will also return NULL in this case. We need to check the value returned by dlerror before printing it. Differential Revision: https://reviews.llvm.org/D112174	2021-10-27 16:54:44 +03:00
Vignesh Balasubramanian	b0277bef97	[OpenMP][OMPD] Implementation of OMPD debugging library - libompd. This is a continuation of the review: https://reviews.llvm.org/D100183 It contains routines that retrieve OpenMP ICV values for OMPD. Reviewed By: @hbae Differential Revision: https://reviews.llvm.org/D100184	2021-10-27 16:31:19 +05:30
Jon Chesterfield	e42e5785ad	[libomptarget][nfc]Generalise DeviceRTL cmake to allow building for amdgpu Essentially moves the foreach over sm integers into a macro and instantiates it for nvptx. NFC in that the macro is not presently instantiated for amdgpu as the corresponding code doesn't compile yet. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111987	2021-10-26 21:18:21 +01:00
Ron Lieberman	be03ef3ed1	[openmp][lit] Add support to OpenMP lit.cfg for ROCR_VISIBLE_DEVICES env-var Add support for ROCR_VISIBLE_DEVICES, similar in name and purpose to CUDA_VISIBLE_DEVICES. Differential Revision: https://reviews.llvm.org/D112503	2021-10-26 13:46:42 +00:00
Georgios Rokos	2feafa2e46	[libomptarget][NFC] Add comment explaining why we pass argument bases and offsets as two separate entities to the plugins.	2021-10-25 14:51:14 -07:00
Shilei Tian	2a30c03c62	[OpenMP][Offloading] Only get trip count if team construct Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D112475	2021-10-25 17:16:14 -04:00
AndreyChurbanov	e38a1deb66	[OpenMP] libomp: disable definitions of 5.1 atomics for non-x86 arch. Declarations of 5.1 atomic entries were added under "#if KMP_ARCH_X86 || KMP_ARCH_X86_64" in kmp_atomic.h, but the definitions of the functions in kmp_atomic.cpp lacked the architecture guard. As a result, mangled symbols were available on non-x86 architectures. The patch eliminates these unexpected symbols from the library. Differential Revision: https://reviews.llvm.org/D112261	2021-10-25 21:17:26 +03:00
Vladimir Inđić	f41d08540b	[OpenMP][OMPT] thread_num determination during execution of nested serialized parallel regions The __ompt_get_task_info_internal function is adapted to support thread_num determination during the execution of multiple nested serialized parallel regions enclosed by a regular parallel region. Consider a program that contains a parallel region R1 executed by two threads. Let the worker thread T of region R1 execute a serialized parallel region R2 that encloses another serialized parallel region R3. Note that the thread T is the master thread of both R2 and R3. Assume that the __ompt_get_task_info_internal function is called with the argument "ancestor_level == 1" during the execution of region R3. The function should determine the "thread_num" of the thread T inside the team of region R2, whose implicit task is at level 1 in the hierarchy of active tasks. Since the thread T is the master thread of region R2, one would expect "thread_num" to be 0. After the while loop finishes, the following holds: "lwt != NULL", "prev_lwt == NULL", and "prev_team" represents the team information about the innermost serialized parallel region R3. This results in executing the assignment "thread_num = prev_team->t.t_master_tid". Note that "prev_team->t.t_master_tid" was initialized at the moment of R2’s creation and represents the "thread_num" of the thread T inside the region R1, which encloses R2. Since the thread T is the worker thread of region R1, "thread_num" takes the value 1, which is a contradiction. This patch proposes to use "lwt" instead of "prev_lwt" when determining the "thread_num". If "lwt" exists, the task at the requested level belongs to the serialized parallel region. Since the serialized parallel region is executed by one thread only, "thread_num" takes the value 0. Similarly, assume that the __ompt_get_task_info_internal function is called with the argument "ancestor_level == 2" during the execution of region R3. The function should determine the "thread_num" of the thread T inside the team of region R1. Since the thread is the worker inside region R1, one would expect "thread_num" to be 1. After the loop finishes, the following holds: "lwt == NULL", "prev_lwt != NULL", and "prev_team" represents the team information about the innermost serialized parallel region R3. This leads to execution of the assignment "thread_num = 0", which causes a contradiction. Ignoring "prev_lwt" leads to executing the assignment "thread_num = prev_team->t.t_master_tid" instead. From the previous explanation, "thread_num" then takes the value 1. Note that the "prev_lwt" variable is therefore unnecessary and is removed. This patch introduces a test case that represents the OpenMP program described earlier in the summary. Differential Revision: https://reviews.llvm.org/D110699	2021-10-25 18:21:20 +02:00
Vladimir Inđić	f2410bfb1c	[OpenMP][OMPT][clang] task frame support fixed in __kmpc_fork_call __kmp_fork_call sets the enter_frame of the active task (th_current_task) before the new parallel region begins. After the region is finished, the enter_frame is cleared. The old implementation of __kmpc_fork_call didn’t clear the enter_frame of the active task. Also, the way of initializing the enter_frame of the active task was wrong. Consider two OpenMP programs. The first program: Let R1 be a serialized parallel region that encloses another serialized parallel region R2. Assume that the thread that executes R2 is going to create a new serialized parallel region R3 by executing __kmpc_fork_call. This thread is responsible for setting the enter_frame of R2's implicit task. Note that the information about R2's implicit task is present inside master_th->th.th_current_task at this moment, while lwt represents the information about R1's implicit task. The old implementation uses lwt and resets the enter_frame of R1's implicit task instead of R2's implicit task. The new implementation uses master_th->th.th_current_task instead. The second program: Consider an OpenMP program that contains a parallel region R1 which encloses an explicit task T. Assume that the thread should create another parallel region R2 during the execution of T. __kmpc_fork_call is responsible for creating R2 and setting the enter_frame of T, whose information is present inside master_th->th.th_current_task. The old implementation tries to set the frame of parent_team->t.t_implicit_task_taskdata[tid], which corresponds to the implicit task of R1, instead of T. Differential Revision: https://reviews.llvm.org/D112419	2021-10-25 18:21:19 +02:00
Joachim Protze	7368227965	[OpenMP][Tests] Test omp_get_wtime for invariants As discussed in D108488, testing for invariants of omp_get_wtime would be more reliable than testing for duration of sleep, as return from sleep might be delayed due to system load. Alternatively/in addition, we could compare the time measured by omp_get_wtime to time measured with C++11 chrono (for portability?). Differential Revision: https://reviews.llvm.org/D112458	2021-10-25 18:20:59 +02:00
Joachim Protze	3f229f42b7	[OpenMP][Tests][NFC] Actually check for test outcome The CHECK: line in the test had no effect, because the test does not pipe to FileCheck. Since the test only checks for a single value, encode the result in the return value of the test.	2021-10-25 18:20:12 +02:00
Joachim Protze	047890bc3f	[OpenMP][Tests][NFC] Mark tests trying to link COI as unsupported For some tests with target-related functionality icc 18/19 tries to link libioffload_target.so.5, which fails for missing COI symbols.	2021-10-25 18:20:12 +02:00
Joachim Protze	d7fdd236d5	[OpenMP][Tests][NFC] Replace atomic increment by reduction Also mark the test as unsupported by intel-21, because the test does not terminate	2021-10-25 18:20:12 +02:00
Joachim Protze	38f78dd2e2	[OpenMP][Tools][NFC] Fix C99-style declaration of iteration variables Where possible change to declare the variable before the loop. Where not possible, specifically request -std=c99 (could be limited to specific compilers like icc).	2021-10-25 18:20:12 +02:00
Joachim Protze	d29a7d23ec	[OpenMP][Tools][NFC] Pass intel license ENV to lit	2021-10-25 18:20:11 +02:00
Kazu Hirata	d8e4170b0a	Ensure newlines at the end of files (NFC)	2021-10-23 08:45:29 -07:00
Jon Chesterfield	bf6f955f39	[libomptarget] Run GPU offloading tests on both new and old runtime Implemented by patching python config instead of modifying all the tests so that -generic and XFAIL work as usual. Expectation is for this to be reverted once the old runtime is deleted. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D112225	2021-10-22 23:28:44 +01:00
Vladimir Inđić	ba02586fbe	[OpenMP][OMPT][GOMP] task frame support in KMP_API_NAME_GOMP_PARALLEL_SECTIONS The KMP_API_NAME_GOMP_PARALLEL_SECTIONS function was missing task frame support. This patch introduces a fix that properly sets the exit_frame of the innermost implicit task that corresponds to the parallel sections construct, as well as the enter_frame of the task that encloses that implicit task. This patch also introduces a simple test case, sections_serialized.c, that contains a serialized parallel sections construct and validates whether the mentioned task frames are set correctly. Differential Revision: https://reviews.llvm.org/D112205	2021-10-22 11:01:10 -05:00
AndreyChurbanov	52f4922ebb	[OpenMP][NFC] skip atomic tests for non-x86 arch	2021-10-21 21:51:33 +03:00
Jon Chesterfield	a602c2b51d	[libomptarget][DeviceRTL] Generalise and simplify cmakelists Step towards building the DeviceRTL for amdgpu. Mostly replaces cuda-specific toolchain finding logic with the generic logic currently found in the amdgpu deviceRTL cmake. Also deletes dead code and changes the default to build on systems without cuda installed, as the library doesn't use cuda and the amdgpu-only systems generally won't have cuda installed. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111983	2021-10-21 16:14:29 +01:00
Nawrin Sultana	99d1ce4a62	[OpenMP] Add GOMP allocator functions This patch adds GOMP_alloc and GOMP_free functions of LIBGOMP. Differential revision: https://reviews.llvm.org/D111673	2021-10-20 11:37:29 -05:00
Joseph Huber	b1ce454930	[OpenMP] Remove macro guards for device debugging The plugin currently uses a macro to check if this is a debug build before assigning the debug kind variable to the device environment struct. This is being deprecated because the new device runtime does not maintain separate debug builds and should always be available. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112083	2021-10-19 12:21:43 -04:00
Jon Chesterfield	7272982e1d	[libomptarget] Refactor DeviceRTL prior to AMDGPU bringup Subset of D111993. Fix typos, rename read to load. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111999	2021-10-19 08:05:06 +01:00
AndreyChurbanov	63f8099e23	[OpenMP] libomp: add check of task function pointer for NULL. This patch allows the compiler implementation of the "taskwait nowait" construct to be simplified. "taskwait nowait" is semantically equivalent to an empty task. Instead of creating an empty routine as a task entry, the compiler can just pass a NULL pointer to the runtime. The runtime then does all the work with dependences and returns, because there is no task routine. Differential Revision: https://reviews.llvm.org/D112015	2021-10-18 19:48:30 +03:00
Jon Chesterfield	251b1e7c25	[libomptarget] Pass OMP_TARGET_OFFLOAD env variable through to tests Useful for OMP_TARGET_OFFLOAD=MANDATORY when testing Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111995	2021-10-18 16:03:03 +01:00
@vladaindjic	59a994e8da	[OpenMP][OMPT] thread_num determination for programs with explicit tasks __ompt_get_task_info_internal is now able to determine the right value of the “thread_num” argument during the execution of an explicit task. During the execution of a while loop that iterates over the ancestor tasks hierarchy, the “prev_team” variable was always set to “team” variable at the beginning of each loop iteration. Assume that the program contains a parallel region which encloses an explicit task executed by the worker thread of the region. Also assume that the tool inquires the “thread_num” of a worker thread for the implicit task that corresponds to the region (task at “ancestor_level == 1”) and expects to receive the value of “thread_num > 0”. After the loop finishes, both “team” and “prev_team” variables are equal and point to the team information of the parallel region. The “thread_num” is set to “prev_team->t.t_master_tid”, that is equal to “team->t.t_master_tid”. In this case, “team->t.t_master_tid” is 0, since the master thread of the region is the initial master thread of the program. This leads to a contradiction. To prevent this, “prev_team” variable is set to “team” variable only at the time when the loop that has already encountered the implicit task (“taskdata” variable contains the information about an implicit task) continues iterating over the implicit task’s ancestors, if any. After the mentioned loop finishes, the “prev_team” variable might be equal to NULL. This means that the task at requested “ancestor_level” belongs to the innermost parallel region, so the “thread_num” will be determined by calling the “__kmp_get_tid”. To prove that this patch works, the test case “explicit_task_thread_num.c” is provided. It contains the example of the program explained earlier in the summary. Differential Revision: https://reviews.llvm.org/D110473	2021-10-18 13:54:22 +02:00
Joachim Protze	c93fb143b9	[OpenMP][Tests][NFC] Work around ICC bug Older intel compilers miss the privatization of nested loop variables for doacross loops. Declaring the variable in the loop makes the test more robust.	2021-10-18 13:54:15 +02:00
Joachim Protze	5918688248	[OpenMP][Tests][NFC] Flagging OMPT tests as XFAIL for Intel compilers With Intel 19 compiler the teams tests fail to link while trying to link liboffload.	2021-10-18 13:50:03 +02:00
Shilei Tian	2c941fa2f9	[OpenMP][deviceRTLs] Fix wrong return value of `__kmpc_is_spmd_exec_mode` D110279 introduced a bug to the device runtime. In `__kmpc_parallel_51`, we detect whether we are already in a parallel region by `__kmpc_parallel_level() > __kmpc_is_spmd_exec_mode()`. It is based on the assumption that: - In SPMD mode, parallel level is initialized to 1. - In generic mode, parallel level is initialized to 0. - `__kmpc_is_spmd_exec_mode` returns `1` for SPMD mode, 0 otherwise. Because the return value type of `__kmpc_is_spmd_exec_mode` is `int8_t`, the previous `bool` result was implicitly converted to `int8_t`, which guarantees a value of either 0 or 1. In D110279, the return value is the result of an `and` operation, which is 2 in SPMD mode. This breaks the assumption in `__kmpc_parallel_51`. Reviewed By: carlo.bertolli, dpalermo Differential Revision: https://reviews.llvm.org/D111905	2021-10-16 12:58:29 -04:00
Joachim Protze	26b675d65e	[OpenMP][Tools][NFC] Make an Archer test more robust The execution order of the tasks is not fixed, so there is no ordering for the write accesses. Enforce the ordering that is expected in the check.	2021-10-15 17:32:05 +02:00
Peyton, Jonathan L	acb3b187c4	[OpenMP][host runtime] Add initial hybrid CPU support Detect the different core types through CPUID.1A and show them to the user through the KMP_AFFINITY=verbose mechanism. Offer __kmp_is_hybrid_cpu() so that future runtime optimizations can tell whether they are running on a hybrid system. Differential Revision: https://reviews.llvm.org/D110435	2021-10-14 16:49:42 -05:00
Peyton, Jonathan L	b840d3ab0d	[OpenMP][host runtime] small fixup of RTM CPUID bit check	2021-10-14 16:49:42 -05:00
Peyton, Jonathan L	50b68a3d03	[OpenMP][host runtime] Add support for teams affinity This patch implements teams affinity on the host. The default is spread. A user can specify either spread, close, or primary using KMP_TEAMS_PROC_BIND environment variable. Unlike OMP_PROC_BIND, KMP_TEAMS_PROC_BIND is only a single value and is not a list of values. The values follow the same semantics under the OpenMP specification for parallel regions except T is the number of teams in a league instead of the number of threads in a parallel region. Differential Revision: https://reviews.llvm.org/D109921	2021-10-14 16:30:28 -05:00
AndreyChurbanov	621d7a75b1	[OpenMP] libomp: add atomic functions for new OpenMP 5.1 atomics. Added functions that implement "atomic compare". Though clang does not use library interfaces to implement OpenMP atomics, the functions are added for consistency. Also added the missing functions for 80-bit floating-point min/max atomics. Differential Revision: https://reviews.llvm.org/D110109	2021-10-13 21:02:18 +03:00
AndreyChurbanov	6e98ec9b20	[OpenMP] libomp: fix ittnotify usage. Replaced storing of ittnotify domain array index into location info structure (which is now read-only) with storing of (location info address + ittnotify domain + team size) into hash map. Replaced __kmp_itt_barrier_domains and __kmp_itt_imbalance_domains arrays with __kmp_itt_barrier_domains hash map; __kmp_itt_region_domains and __kmp_itt_region_team_size arrays with __kmp_itt_region_domains hash map. Basic functionality did not change (at least tried to not change). The patch fixes https://bugs.llvm.org/show_bug.cgi?id=48644. Differential Revision: https://reviews.llvm.org/D111580	2021-10-13 20:49:05 +03:00
AndreyChurbanov	5e58b63b28	[OpenMP] libomp: fix warning on comparison of integer expressions of different signedness Replaced macro with global variable of correspondent type. Differential Revision: https://reviews.llvm.org/D111562	2021-10-13 20:11:47 +03:00
AndreyChurbanov	f5c0c9179f	[OpenMP] libomp: add OpenMP 5.1 memory allocation routines. Aligned allocation routines added. Fortran interfaces added for all allocation routines. Differential Revision: https://reviews.llvm.org/D110923	2021-10-11 19:25:00 +03:00
Ron Lieberman	d022f39d9f	[libomptarget][amdgpu][NFC] tweak a comment	2021-10-09 12:51:53 -04:00
Joseph Huber	bad44d5f39	[OpenMP] Add RTL function for getting number of threads in block. This patch adds support for the `__kmpc_get_hardware_num_threads_in_block` function that returns the number of threads. This was missing in the new runtime and was used by the AMDGPU plugin, which prevented it from using the new runtime. This patch also unifies the interface for getting the thread numbers in the frontend. Originally authored by jdoerfert. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D111475	2021-10-08 22:21:59 -04:00
Joseph Huber	85ad566335	[OpenMP] Avoid calling `isSPMDMode` during RT initialization Until we hit the first barrier we should not call `mapping::isSPMDMode` with all threads. Instead, we now have (and use during initialization) a `mapping::isMainThreadInGenericMode` overload that takes the known SPMD-mode state and one that queries it. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111381	2021-10-08 22:00:41 -04:00
Joseph Huber	208f900527	[Libomptarget] Add an external interface to dynamic shared memory This patch adds an external interface to access the dynamic shared memory buffer in the device runtime. The function introduced is ``llvm_omp_get_dynamic_shared``. This includes a host-side definition that only returns a null pointer so that it can be used when host-fallback is enabled without crashing. Support for dynamic shared memory was also ported to the old device runtime. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110957	2021-10-08 15:36:57 -04:00
Shilei Tian	c060c634ef	[OpenMP][NVPTX] Fix an error in configuring #teams and #threads It must be a copy mistake. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D111407	2021-10-08 11:07:43 -04:00
Shilei Tian	af4599b8ab	[OpenMP][DeviceRTL] Add the support for printf in a freestanding way For NVPTX, `printf` can be used just with a function declaration. For AMDGCN, a function definition is added, but it simply returns. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109728	2021-10-07 22:15:37 -04:00
Johannes Doerfert	44710940af	[OpenMP][FIX] Data race in the SPMD execution of the new runtime We need to synchronize the threads before we destroy the RAII objects that hold the old values and not after to avoid threads executing the parallel region but seeing an inconsistent state. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111369	2021-10-07 21:01:24 -04:00
Jon Chesterfield	1bc3a6e41b	[libomptarget] Reapply `2bc4d48a78` which was accidentally reverted	2021-10-07 20:17:48 +01:00
Jon Chesterfield	0c554a4769	[libomptarget] Move device environment to shared header, remove divergence Follow on to D110006, related to D110957 Where implementations have diverged this resolves to match the new DeviceRTL - replaces definitions of this struct in deviceRTL and plugins with include - changes the dynamic_shared_size field from D110006 to 32 bits - handles stdint being unavailable in DeviceRTL - adds a zero initializer for the field to amdgpu - moves the extern declaration for deviceRTL to target_interface (omptarget.h is more natural, but doesn't work due to include order with debug.h) - Renames the fields everywhere to match the LLVM format used in DeviceRTL - Makes debug_level uint32_t everywhere (previously sometimes int32_t) Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D111069	2021-10-07 12:03:48 +01:00
Michał Górny	0873b9bef4	[openmp] [elf_common] Fix linking against LLVM dylib The hand-rolled linking logic in elf_common does not account for the possibility of using LLVM dylib rather than a dozen static libraries. Since it does not seem to be easily convertible to add_llvm_library, just hand-roll support for LLVM_LINK_LLVM_DYLIB. This is necessary to support stand-alone builds against installed LLVM. Differential Revision: https://reviews.llvm.org/D111038	2021-10-04 09:29:06 +02:00
Martin Storsjö	dec2257f35	[openmp] Fix a typo in a test REQUIRES line Differential Revision: https://reviews.llvm.org/D110963	2021-10-03 23:51:11 +03:00
Peyton, Jonathan L	343b9e8590	[OpenMP][host runtime] Introduce kmp_cpuinfo_flags_t to replace integer flags Store CPUID support flags as bits instead of using entire integers. Differential Revision: https://reviews.llvm.org/D110091	2021-10-01 11:08:39 -05:00
Peyton, Jonathan L	957b4c5750	[OpenMP][testing] increase threshold for omp_get_wtime test	2021-10-01 11:07:41 -05:00
Jon Chesterfield	05ba9ff6a6	[libomptarget][amdgpu] Refactor memory pool collection	2021-10-01 14:58:01 +01:00
Jon Chesterfield	72e8a4c45d	[openmp][docs] Describe how the internal components are found Add a FAQ entry about the names of openmp offloading components and how they are searched for. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D109619	2021-09-30 22:05:12 +01:00
Jon Chesterfield	3247329107	[openmp] Add addrspacecast to getOrCreateIdent Fixes 51982. Adds a missing CreatePointerCast and allocates a global in the correct address space. Test case derived from https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting parts while checking the assertion failure still occurred. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110556	2021-09-30 21:36:31 +01:00
Jon Chesterfield	b75a7481ba	[libomptarget] Apply D110029 to amdgpu Use enum for execution mode. This is partly a port from ROCm and partly a port from D110029. Attempted to make the same choices as ROCm as far as comments etc go to reduce the merge conflicts. There is some cleanup warranted here - in particular I like the cuda patch factoring out the comparisons into named variables - but I'd like to leave that for a follow up patch, keeping this one minimal. Reviewed By: carlo.bertolli Differential Revision: https://reviews.llvm.org/D110845	2021-09-30 21:29:37 +01:00
Dhruva Chakrabarti	6226270253	[libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed. Fixes: SWDEV-275232 (With contributions from Ammar Elwazir, Laurent Morichetti, and Tony Tye) The current code is racy. After the packet is submitted, the GPU will increment the read index. If this wraps around before the memory is read from, it will refer to a signal from an unrelated packet. This change avoids reading from the packet after submission. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110679	2021-09-29 09:22:07 -07:00
Jon Chesterfield	2bc4d48a78	[libomptarget][amdgpu] Follow on to D110513, empty kernarg pools are not fatal	2021-09-27 22:44:35 +01:00
Jon Chesterfield	738734f655	[libomptarget][amdgpu] Report zero devices if plugin construction fails, instead of segv	2021-09-27 22:13:12 +01:00
Jon Chesterfield	80fa43fe9a	Revert "[openmp] Add addrspacecast to getOrCreateIdent" This reverts commit `1a761e5b7b`. Failed CI, albeit with a different failure mode to BZ51982	2021-09-27 19:27:35 +01:00
Jon Chesterfield	1a761e5b7b	[openmp] Add addrspacecast to getOrCreateIdent Fixes 51982. Minor refactor to remove `return x = y` construct. Test case derived from https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting parts while checking the assertion failure still occurred. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110556	2021-09-27 19:23:12 +01:00
@vladaindjic	5357a98c82	[OpenMP] libomp: Usage of TASK_TIED constant inside kmp_gsupport.cpp This minor refactoring introduces the TASK_TIED constant in kmp_gsupport.cpp as a replacement for the literal value 1. The constant is now used in both kmp_tasking.cpp and kmp_gsupport.cpp. Differential Revision: https://reviews.llvm.org/D110441	2021-09-27 19:45:56 +03:00
Joseph Huber	74d622dea4	[OpenMP] Add new worksharing definitions into device RTL This patch defines the newly added `__kmpc_distribute_static_init` functions in the device runtime library. These functions are currently exact copies of the current worksharing method but can be tuned later. Depends on D110429 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110430	2021-09-27 11:36:41 -04:00
Pushpinder Singh	b1695c2eb8	[AMDGPU][OpenMP] Add memory pool size check to isValidMemoryPool Keeping all the checks in one place for future simplification. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110513	2021-09-27 12:29:00 +00:00
Michael Kruse	1b242dccff	[OpenMP][CMake] Use in-project clang as CUDA->IR compiler for new DeviceRTL. Use the in-project clang, llvm-link and opt if available and unless CMake cache variables specify to use a different compiler. This applies D101265 to the new DeviceRTL's CMakeLists.txt which was copied before D101265 was applied. Fixes the openmp-offloading-cuda-runtime builder which was failing since D110006. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110251	2021-09-27 07:14:19 -05:00
Pushpinder Singh	9d0eb440ff	[libomptarget][nfc][amdgpu] Reorder function to clarify review diff	2021-09-27 09:30:55 +00:00
Jon Chesterfield	726a34f063	[libomptarget][amdgpu] Replace dead exit call with returning error	2021-09-27 09:43:37 +01:00
Vignesh Balu	62fddd5ff5	[OpenMP][OMPD] Implementation of OMPD debugging library - libompd. This is a continuation of the review: https://reviews.llvm.org/D100182 This patch implements the OMPD API as specified in the standard doc. Reviewed By: @hbae Differential Revision: https://reviews.llvm.org/D100183	2021-09-27 12:32:31 +05:30
Jon Chesterfield	8cf93a35d4	[libomptarget][amdgpu] Destruct HSA queues Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak which raises an assert in debug builds of hsa. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D109511	2021-09-26 15:34:21 +01:00
Joseph Huber	d83ca624a1	[OpenMP] Fix data-race in new device RTL This patch fixes a data race observed when using the new device runtime library. The internal control variable for the parallel level is read in the `__kmpc_parallel_51` function while it could potentially be written by other threads. This causes data corruption and nondeterministic behaviour in the runtime. This patch fixes this by adding an explicit synchronization before the region starts. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110366	2021-09-23 17:28:07 -04:00
Shilei Tian	423d34f74a	[OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit` This is a follow-up to D110029, which uses a bitset to indicate the execution mode. This patch makes the corresponding changes to these functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110279	2021-09-22 17:16:41 -04:00
Joseph Huber	60a40cf379	[OpenMP] Fix KeepAlive usage Summary: Functions were called the wrong way around, so the symbol was not kept alive.	2021-09-22 14:38:19 -04:00
Joseph Huber	277b681ede	[OpenMP] Add function tracing debugging to device RTL This patch adds support for an RAII struct that will print function traces when placed inside of a function declaration. Each successive call will increase the indentation to make it easier to visually inspect. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110202	2021-09-22 12:25:29 -04:00
Shilei Tian	ca999f7191	[OpenMP][Offloading] Use bitset to indicate execution mode instead of value The execution mode of a kernel is stored in a global variable, whose value means: - 0 - SPMD mode - 1 - generic mode - 2 - SPMD mode execution with generic mode semantics We are going to add support for SIMD execution mode. It will come with other execution modes, such as SIMD-generic mode. As a result, this value-based indicator is not flexible enough. This patch switches to a bitset-based encoding of the execution mode, where each bit position means: [0] - generic mode [1] - SPMD mode [2] - SIMD mode (will be added later) In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode execution with generic mode semantics. In the future, after we add support for SIMD mode, `0b1xx` will indicate SIMD mode. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110029	2021-09-22 11:40:52 -04:00
Joseph Huber	1cf86df883	[OpenMP] Make sure the Thread ID function is not removed Summary: The thread ID function was reintroduced in D110195, but could potentially be removed by the optimizer. Make the function noinline to preserve the call sites and add it to the externalization RAII so its definition is not removed by the attributor.	2021-09-22 10:13:18 -04:00
Joseph Huber	e95731cca7	[OpenMP] Add thread ID function into new RTL The new device runtime library currently lacks the `kmpc_get_hardware_thread_id_in_block` function, which is used by the SPMDization optimization. The optimization would introduce this call and then cause a linking error because the function was not present. This patch adds support for this runtime call. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110195	2021-09-21 17:43:50 -04:00
Giorgis Georgakoudis	ac90dfc43a	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `1d66649adf`. Revert to fix AMD GPU issue.	2021-09-21 13:20:39 -07:00
Usman Nadeem	248342b7c7	[OpenMP][OMPD] Fix compile error when OMPD is not supported Differential Revision: https://reviews.llvm.org/D110120 Change-Id: I9d39dacfab5b7fbab37ee4b4d960d51e0892b24d	2021-09-21 12:45:15 -07:00
Giorgis Georgakoudis	1d66649adf	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) on the number of arguments, and (3) forwarded arguments must be cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert, jhuber6 Differential Revision: https://reviews.llvm.org/D102107	2021-09-21 10:50:04 -07:00
Shilei Tian	49e976c934	[OpenMP][NVPTX] Fix a warning that data argument not used by format string Reviewed By: jhuber6, grokos Differential Revision: https://reviews.llvm.org/D110104	2021-09-20 17:22:14 -04:00
Peyton, Jonathan L	1e45cd75df	[OpenMP][host runtime] Fix indirect lock table race condition The indirect lock table can exhibit a race condition during initialization and setting/unsetting of locks. This occurs if the lock table is resized by one thread (during an omp_init_lock) and accessed (during an omp_set_lock/omp_unset_lock) by another thread. The test runtime/test/lock/omp_init_lock.c exposed this issue and will fail if run enough times. This patch restructures the lock table so pointer/iterator validity is always kept. Instead of reallocating a single table to a larger size, the lock table begins preallocated to accommodate 8K locks. Each row of the table is allocated as needed, with each row allowing 1K locks. If the 8K limit is reached for the initial table, then another table, capable of holding double the number of locks, is allocated and linked as the next table. The indices stored in the user's locks take this linked structure into account when finding the lock within the table. Differential Revision: https://reviews.llvm.org/D109725	2021-09-20 13:01:58 -05:00
Joseph Huber	f1c821fa85	[OpenMP] Add support for dynamic shared memory in new RTL This patch adds support for using dynamic shared memory in the new device runtime. The new function `__kmpc_get_dynamic_shared` will return a pointer to the buffer of dynamic shared memory. Currently the amount of memory allocated is set by an environment variable. In the future this amount will be added to the amount used for the smart stack which will be configured in a similar way. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110006	2021-09-17 21:25:36 -04:00
Joseph Huber	ec02c34b6d	[OpenMP] Add additional fields to device environment This patch adds fields for the device number and number of devices into the device environment struct and debugging values. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110004	2021-09-17 21:25:32 -04:00
Joseph Huber	b266bcb135	[OpenMP] Implement __assert_fail in the new device runtime This patch implements the `__assert_fail` function in the new device runtime. This allows users and developers to use the standard assert function on the device. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D109886	2021-09-17 21:25:28 -04:00
Shilei Tian	81a1a91c62	[NFC] clang-format -i /openmp/libomptarget/deviceRTLs/interface.h	2021-09-17 12:55:02 -04:00
AndreyChurbanov	59b877d001	[OpenMP] NFC: add type casts to silence gcc warnings	2021-09-17 19:49:40 +03:00
AndreyChurbanov	7f1a6d891e	[OpenMP] libomp: Update third-party sources of ittnotify client code. The third-party ittnotify sources were updated from https://github.com/intel/ittapi. Changes applied: - LLVM license added to all files; initial BSD license saved in LICENSE.txt; - clang-formatted; - renamed .c to .cpp, similar to what we did with all our sources; - added #include "kmp_config.h" with definition of INTEL_ITTNOTIFY_PREFIX macro into ittnotify_static.cpp. Differential Revision: https://reviews.llvm.org/D109333	2021-09-17 19:38:34 +03:00
Hansang Bae	ae2a5facce	[OpenMP][libomptarget] Minor fix in x86_64 plugin Call to remove() was passing invalid address for the file name. Differential Revision: https://reviews.llvm.org/D109846	2021-09-15 15:57:06 -05:00
Peyton, Jonathan L	258e27aae1	[OpenMP] Add support for GOMP depobj GOMP depobjs are represented as a two-element intptr_t array. The first element is the base address of the dependency and the second element is the flag indicating the type the depobj represents. Differential Revision: https://reviews.llvm.org/D108790	2021-09-15 12:47:08 -05:00
Vignesh Balasubramanian	939154125b	[OpenMP] [OMPD] OPENMP_INSTALL_LIBDIR is set for the install dir OPENMP_INSTALL_LIBDIR is used as the installation path for the shared and static libompd. This should avoid mixing 32-bit and 64-bit libraries in the same path in a multilib setup. Reviewed By: @mceier Differential Revision: https://reviews.llvm.org/D109352	2021-09-13 10:25:50 +05:30
Joseph Huber	7eb899cbcd	[OpenMP] Add more verbose remarks for runtime folding We perform runtime folding, but do not currently emit remarks when it is performed. This is because it comes from the runtime library and is beyond the user's control. However, people may still wish to view this and similar information easily, so this behaviour can be enabled using a special flag for verbose remarks. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109627	2021-09-10 17:36:06 -04:00
Ye Luo	2187cbf56f	[OpenMP][libomptarget] Add __tgt_target_return_t enum for __tgt_target_XXX return int The definitions of OFFLOAD_SUCCESS and OFFLOAD_FAIL used in plugin APIs and in libomptarget public APIs are not consistent. Create __tgt_target_return_t for libomptarget public APIs. Differential Revision: https://reviews.llvm.org/D109304	2021-09-10 16:11:08 -05:00
Jon Chesterfield	f244af5c9f	[openmp][amdgpu] Update SupportAndFAQ docs	2021-09-10 18:35:29 +01:00
Johannes Doerfert	9f844aeeb4	[OpenMP][Docs] Remove old/outdated webpage This should have happened a long time ago, now that openmp.llvm.org redirects to openmp.llvm.org/docs we completely switched over to the sphinx documentation page instead. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D108588	2021-09-10 12:11:05 -05:00
Jon Chesterfield	6760234e8d	[libomptarget][amdgpu] Precisely manage hsa lifetime The hsa library must be initialized before any calls into it and shut down after the last call into it. There have been a number of bugs in this area related to member variables which would like to use raii to manage resources acquired from hsa. This patch moves the init/shutdown of hsa into a class, such that when it is used as the first member variable (or as a base), the lifetimes of the other member variables are reliably scoped within it. This will allow other classes to use raii reliably when used as member variables within the global. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D109512	2021-09-09 17:28:11 +01:00
Jon Chesterfield	2a581710c1	[openmp] No longer use LIBRARY_PATH to find devicertl Given D109057, change test runner to use the libomptarget-x-bc-path argument instead of the LIBRARY_PATH environment variable to find the device library. Also drop the use of LIBRARY_PATH environment variable as it is far too easy to pull in the device library from an unrelated toolchain by accident with the current setup. No loss in flexibility to developers as the clang commandline used here is still available. Reviewed By: jdoerfert, tianshilei1992 Differential Revision: https://reviews.llvm.org/D109061	2021-09-09 17:16:41 +01:00
Jon Chesterfield	d642156f8f	[libomptarget][nfc] Hoist hsa_init into rtl.cpp	2021-09-09 16:09:34 +01:00
Hansang Bae	3976035d68	[OpenMP] Fix line truncation in omp_lib.h Fixed code that exceeded the 72-column limit. Differential Revision: https://reviews.llvm.org/D109469	2021-09-09 09:33:45 -05:00
AndreyChurbanov	d40108e0af	[OpenMP] libomp: runtime part of omp_all_memory task dependence implementation. New omp_all_memory task dependence type is implemented. Library recognizes the new type via either (dependence_address == NULL && dependence_flag == 0x80) or (dependence_address == SIZE_MAX). A task with new dependence type depends on each preceding task with any dependence type (kind of a dependence barrier). Differential Revision: https://reviews.llvm.org/D108574	2021-09-08 16:55:32 +03:00
Ye Luo	2cfe1a09d1	[OpenMP][libomptarget][NFC] Change checkDeviceAndCtors return type to bool. What is exactly needed is only a boolean. Pulling OFFLOAD_SUCCESS/FAIL only adds confusion. Differential Revision: https://reviews.llvm.org/D109303	2021-09-07 13:59:27 -05:00
Hansang Bae	224f51d879	[OpenMP] Add interface for 5.1 scope construct The new interface only marks begin/end of a scope construct for corresponding OMPT events, and we can use existing interfaces for reduction operations. Differential Revision: https://reviews.llvm.org/D108062	2021-09-07 11:22:21 -05:00
Nawrin Sultana	c24da72fa4	[OpenMP] Change monotonicity of dynamic schedule This patch changes the default monotonicity of dynamic schedule from monotonic to non-monotonic when no modifier is specified. Differential Revision: https://reviews.llvm.org/D109026	2021-09-07 08:18:46 -05:00
Ye Luo	c3aecf87d5	[OpenMP][libomptarget] Change device vector elements to unique_ptr type Using std::vector<DeviceTy> requires implementing a copy constructor and a copy assignment operator for DeviceTy. In fact, DeviceTy should never be copied. After changing to std::vector<std::unique_ptr<DeviceTy>>, all the unsafe copy constructor and copy assignment operator implementations can be removed. Compilers mark them as deleted because of the mutex or other underlying objects, and this is the desired behavior. Differential Revision: https://reviews.llvm.org/D109276	2021-09-06 22:28:49 -05:00
Ye Luo	8e5c1b039e	[OpenMP][libomptarget] Change synchronize_ty return type to int32_t Plugins always return int32_t. Stay consistent with other functions which return error status. Differential Revision: https://reviews.llvm.org/D109341	2021-09-06 21:38:54 -05:00
Ron Lieberman	fdac5adee6	[openmp] NFC add bitcode comment	2021-09-02 18:21:39 -05:00
Jon Chesterfield	201e466eba	[libomptarget][amdgpu] Add gfx90a to build list	2021-09-02 18:11:02 +01:00
Jon Chesterfield	3153bdd547	[libomptarget][amdgpu] Drop env variables Use the same debug print as the rest of libomptarget plugins with the same environment control. Also drop the max queue size debugging hook as I don't believe it is still in use, can bring it back near the rest of the env handling in rtl.cpp if someone objects. That makes most of rt.h and all of utils.cpp unused. Clean that up and simplify control flow in a couple of places. Behaviour change is that debug prints that used to use the old environment variable now use the new one and print in slightly different format, and the removal of the max queue size variable. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D108784	2021-09-02 11:02:39 +01:00
Ye Luo	289a1089cd	[libomptarget] Move HostDataToTargetTy states into StatesTy Use unique_ptr to achieve the effect of mutable. Remove mutable keyword of DynRefCount and HoldRefCount Remove std::shared_ptr from UpdateMtx Reviewed By: tianshilei1992, grokos Differential Revision: https://reviews.llvm.org/D109007	2021-09-01 23:36:05 -05:00
Fangrui Song	4d5220faf9	[OpenMP] Fix -Wunused-but-set-parameter in -DLLVM_ENABLE_ASSERTIONS=off builds. NFC	2021-09-01 17:55:13 -07:00
Joel E. Denny	1f9e437065	[OpenMP][AMDGPU] Remove unneeded XFAILs	2021-09-01 18:00:25 -04:00
Joel E. Denny	786a140650	[OpenMP] Use IsHostPtr where needed in rest of omptarget.cpp As started in D107925, this patch replaces the remaining occurrences of `UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin` in `omptarget.cpp` with `IsHostPtr`. The former condition is broken in the rare case that the device and host happen to use the same address for their mapped allocations. I don't know how to write a test that's likely to reveal this case. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107928	2021-09-01 17:31:42 -04:00
Joel E. Denny	d11bab0b73	[OpenMP] Use IsHostPtr where needed for targetDataBegin As discussed in D105990, without this patch, `targetDataBegin` determines whether to transfer data (as opposed to assuming it's in shared memory) using the condition `!UseUSM || HasCloseModifier`. However, this condition is broken if use of discrete memory was forced by `omp_target_associate_ptr`. This patch extends `unified_shared_memory/associate_ptr.c` to reveal this case, and it fixes it using `!IsHostPtr` in `DeviceTy::getTargetPointer` to replace this condition. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107927	2021-09-01 17:31:42 -04:00
Joel E. Denny	fa6c275505	[OpenMP][NFC] Eliminate CopyMember from targetDataEnd This patch is based on comments in D105990. It is NFC according to the following observations: 1. `CopyMember` is computed as `!IsHostPtr && IsLast`. 2. `DelEntry` is true only if `IsLast` is true. We apply those observations in order: ``` if ((DelEntry || Always || CopyMember) && !IsHostPtr) if ((DelEntry || Always || IsLast) && !IsHostPtr) if ((Always || IsLast) && !IsHostPtr) ``` Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107926	2021-09-01 17:31:42 -04:00
Joel E. Denny	8e4836b2a2	[OpenMP] Use IsHostPtr where needed for targetDataEnd As discussed in D105990, without this patch, `targetDataEnd` determines whether to transfer data or delete a device mapping (as opposed to assuming it's in shared memory) using two different conditions, each of which is broken for some cases: 1. `!(UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin)`: The broken case is rare: the device and host might happen to use the same address for their mapped allocations. I don't know how to write a test that's likely to reveal this case, but this patch does fix it, as discussed below. 2. `!UNIFIED_SHARED_MEMORY || HasCloseModifier`: There are at least two broken cases: (a) The `close` modifier might have been specified on an `omp target enter data` but not the corresponding `omp target exit data`, which thus might falsely assume a mapping is in shared memory. The test `unified_shared_memory/close_enter_exit.c` already has a missing deletion as a result, and this patch adds a check for that. This patch also adds the new test `close_member.c` to reveal a missing transfer and deletion. (b) Use of discrete memory might have been forced by `omp_target_associate_ptr`, as in the test `unified_shared_memory/api.c`. In the current `targetDataEnd` implementation, this condition turns out not to be used for this case: because the reference count is infinite, a transfer is possible only with an `always` modifier, and this condition is never used in that case. To ensure it's never used for that case in the future, this patch adds the test `unified_shared_memory/associate_ptr.c`. Fortunately, `DeviceTy::getTgtPtrBegin` already has a solution: it reports whether the allocation was found in shared memory via the variable `IsHostPtr`. After this patch, `HasCloseModifier` is no longer used in `targetDataEnd`, and I wonder if the `close` modifier is ever useful on an `omp target data end`. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107925	2021-09-01 17:31:42 -04:00
Jon Chesterfield	cef1199686	Revert "[openmp] No longer use LIBRARY_PATH to find devicertl" This reverts commit `7a228f872f`. Failing test case under CI	2021-09-01 20:44:12 +01:00
Jon Chesterfield	7a228f872f	[openmp] No longer use LIBRARY_PATH to find devicertl Given D109057, change test runner to use the libomptarget-x-bc-path argument instead of the LIBRARY_PATH environment variable to find the device library. Also drop the use of LIBRARY_PATH environment variable as it is far too easy to pull in the device library from an unrelated toolchain by accident with the current setup. No loss in flexibility to developers as the clang commandline used here is still available. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109061	2021-09-01 20:24:34 +01:00
Jon Chesterfield	718e5a9883	[libomptarget] Set runpath on libomptarget, use that to drop LD_LIBRARY_PATH from test runner Using rpath instead of LD_LIBRARY_PATH to find libomp.so and libomptarget.so lets one rerun the already built test executables without setting environment variables and removes the risk of the test runner picking up different libraries to the developer debugging the failure. rpath usually means runpath, which is not transitive, so set runpath on libomptarget itself so that it can find the plugins located next to it, spelled $ORIGIN. This provides sufficient functionality to drop D102043 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D109071	2021-09-01 18:47:56 +01:00
Jon Chesterfield	f8bcbb82a7	[libomptarget] Normalise a cmake debug string, checking it triggers CI	2021-09-01 14:24:28 +01:00
Vignesh Balasubramanian	b9a27908f9	[OpenMP][OMPD] Implementation of OMPD debugging library - libompd. This is a continuation of the review: https://reviews.llvm.org/D100181 Creates a new directory "libompd" under openmp. "TargetValue" provides operational access to the OpenMP runtime memory for OMPD APIs. With TargetValue, a user can perform multiple operations on a "pointer", from casting and dereferencing to accessing an element of a structure. The member functions are designed to concatenate the operations that are needed to access values from structures. e.g., _a[6]->_b._c would read like: TValue(ctx, "_a").cast("A",2) .getArrayElement(6).access("_b").cast("B").access("_c") For example: If you have a pointer "ThreadHandle" of a running program, then you can access/retrieve "threadID" from the memory using TargetValue as below. TValue(context, thread_handle->th) /*__kmp_threads[t]->th*/ .cast("kmp_base_info_t") .access("th_info") /*__kmp_threads[t]->th.th_info*/ .cast("kmp_desc_t") .access("ds") /*__kmp_threads[t]->th.th_info.ds*/ .cast("kmp_desc_base_t") .access("ds_thread") /*__kmp_threads[t]->th.th_info.ds.ds_thread*/ .cast("kmp_thread_t") .getRawValue(thread_id, 1); Reviewed By: @hbae Differential Revision: https://reviews.llvm.org/D100182	2021-09-01 14:50:16 +05:30
Joel E. Denny	1688b4cf8e	[OpenMP][AMDGPU] XFAIL test where kernels call printf	2021-08-31 22:11:28 -04:00
Joel E. Denny	ec1ebcd302	[OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in runtime (2/2) This patch implements OpenMP runtime support for an original OpenMP extension we have developed to support OpenACC: the `ompx_hold` map type modifier. The previous patch in this series, D106509, implements Clang support and documents the new functionality in detail. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D106510	2021-08-31 16:13:49 -04:00
Joel E. Denny	83ddfa0d22	[OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in Clang (1/2) This patch implements Clang support for an original OpenMP extension we have developed to support OpenACC: the `ompx_hold` map type modifier. The next patch in this series, D106510, implements OpenMP runtime support. Consider the following example: ``` #pragma omp target data map(ompx_hold, tofrom: x) // holds onto mapping of x { foo(); // might have map(delete: x) #pragma omp target map(present, alloc: x) // x is guaranteed to be present printf("%d\n", x); } ``` The `ompx_hold` map type modifier above specifies that the `target data` directive holds onto the mapping for `x` throughout the associated region regardless of any `target exit data` directives executed during the call to `foo`. Thus, the presence assertion for `x` at the enclosed `target` construct cannot fail. (As usual, the standard OpenMP reference count for `x` must also reach zero before the data is unmapped.) Justification for inclusion in Clang and LLVM's OpenMP runtime: * The `ompx_hold` modifier supports OpenACC functionality (structured reference count) that cannot be achieved in standard OpenMP, as of 5.1. * The runtime implementation for `ompx_hold` (next patch) will thus be used by Flang's OpenACC support. * The Clang implementation for `ompx_hold` (this patch) as well as the runtime implementation are required for the Clang OpenACC support being developed as part of the ECP Clacc project, which translates OpenACC to OpenMP at the directive AST level. These patches are the first step in upstreaming OpenACC functionality from Clacc. * The Clang implementation for `ompx_hold` is also used by the tests in the runtime implementation. That syntactic support makes the tests more readable than low-level runtime calls can. Moreover, upstream Flang and Clang do not yet support OpenACC syntax sufficiently for writing the tests. * More generally, the Clang implementation enables a clean separation of concerns between OpenACC and OpenMP development in LLVM. That is, LLVM's OpenMP developers can discuss, modify, and debug LLVM's extended OpenMP implementation and test suite without directly considering OpenACC's language and execution model, which can be handled by LLVM's OpenACC developers. * OpenMP users might find the `ompx_hold` modifier useful, as in the above example. See new documentation introduced by this patch in `openmp/docs` for more detail on the functionality of this extension and its relationship with OpenACC. For example, it explains how the runtime must support two reference counts, as specified by OpenACC. Clang recognizes `ompx_hold` unless `-fno-openmp-extensions`, a new command-line option introduced by this patch, is specified. Reviewed By: ABataev, jdoerfert, protze.joachim, grokos Differential Revision: https://reviews.llvm.org/D106509	2021-08-31 16:13:49 -04:00
Shilei Tian	8442967fe3	[OpenMP] Fix task wait doesn't work as expected in serialized team As discussed in D107121, task wait doesn't work when a regular task T depends on a detached task or a hidden helper task T' in a serialized team. The root cause is that, since the team is serialized, the last task is not tracked by `td_incomplete_child_tasks`. When T' is finished, it first releases its dependences, and then decrements its parent counter. So far so good. If, at that moment, the thread running task wait is still spinning and trying to execute tasks, it is fine because it can detect the new task and execute it. However, if it happens to finish the function `flag.execute_tasks(...)`, it breaks because `td_incomplete_child_tasks` is now 0. In this patch, we slightly change the rule for tracking child tasks: if the task team encounters a proxy task or a hidden helper task, all following tasks will be tracked. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D107496	2021-08-31 12:15:46 -04:00
Joachim Protze	5ea1c37118	[libomptarget][amdgcn] Only add opt/llvm-link dependency if TARGET is available In some build configurations, the target we depend on is not available for declaring the build dependency. We only need to declare the build dependency if the build target is available in the same build. Fixes the issue raised in https://reviews.llvm.org/D107156#2969862 This patch should go into release/13 together with D108404 Differential Revision: https://reviews.llvm.org/D108868	2021-08-30 17:32:11 +02:00
Shilei Tian	e8fdacfd81	[OpenMP][NVPTX] Fixed missing variables for CUDA free compilation in NVPTX plugin `CU_EVENT_DEFAULT` is defined in the CUDA header. It should be added to `openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h` for the CUDA-free build. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108878	2021-08-28 18:08:10 -04:00
Shilei Tian	29df4ab3f3	[OpenMP][Offloading] Add support for event related interfaces This patch adds support for event-related interfaces, which will be used later to fix a data race. See D104418 for more details. Reviewed By: jdoerfert, ye-luo Differential Revision: https://reviews.llvm.org/D108528	2021-08-28 16:24:14 -04:00
George Rokos	a2bd44089e	[libomptarget][NFC] Fixed tests which checked for obsolete string "getOrAllocTgtPtr"	2021-08-28 07:35:42 -07:00
Jon Chesterfield	78f92c3810	[openmp][amdgpu] Initial gfx10 offloading implementation Lets wavefront size be 32 for amdgpu openmp, as well as 64. Fixes up as little as possible to pass that through the libraries. This change is end to end, as opposed to updating clang/devicertl/plugin separately. It can be broken up for review/commit if preferred. Posting as-is so that others with a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are probably bugs remaining as well as the todo: for letting grid values vary more. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108708	2021-08-27 12:34:03 +01:00
George Rokos	3819aae6dd	[libomptarget][NFC] Replaced obsolete name "getOrAllocTgtPtr" with new "getTargetPointer" in debug messages.	2021-08-26 18:01:18 -07:00
Jon Chesterfield	3d85342982	[libomptarget][amdgpu][nfc] Rename variables, delete dead code	2021-08-26 19:58:38 +01:00
Jon Chesterfield	68ab93f4d7	[libomptarget][amdgpu][nfc] Rename source files	2021-08-26 18:29:44 +01:00
Jon Chesterfield	a5f4074d85	[libomptarget][amdgpu] Macro for accessing GPU variables from plugin Lets the amdgpu plugin write to omptarget_device_environment to enable debugging. Intend to use in the near future to record the wavesize that a given deviceRTL was compiled with for running on hardware that supports 32 or 64. Patch sets all the attributes that are useful. Notably .data means the variable is set by writing to host memory before copying to the GPU instead of launching a kernel to update the image. Can simplify the plugin slightly to drop the code for patching after load if this is used consistently. NFC on nvptx, cuda plugin seems to work fine without any annotations. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108698	2021-08-26 17:28:18 +01:00
Jon Chesterfield	ba0af885e7	[libomptarget][amdgpu][nfc] Make grid value access match devicertl	2021-08-25 15:11:19 +01:00
Jon Chesterfield	9b2c6c07b5	[libomptarget][amdgpu] Refactor debug printing Move most debug printing in rtl.cpp behind the DP() macro. Adjust the print output for GPU arch mismatch when the architectures match. Convert an assert into a graceful failure. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108562	2021-08-25 14:57:51 +01:00
Jon Chesterfield	ba8547775b	[libomptarget][amdgpu] Fix debug build from D104696	2021-08-25 01:27:51 +01:00
Michael Kruse	1275ee3041	[OpenMP][amdgcn] Don't use in-tree clang if not available. The use of `$<TARGET_FILE:clang>` was adapted too broadly from D101265. Fixes llvm.org/PR51579 Also see discussion in D108534. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D108640	2021-08-24 12:50:49 -05:00
Pushpinder Singh	9b8b7c1180	[AMDGPU][Libomptarget] Delete g_atl_machine global With uses of g_atl_machine gone, a significant portion of dead code has been removed. This patch depends on D104691 and D104695. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104696	2021-08-24 07:59:40 +00:00
Jon Chesterfield	d26000e4cc	[openmp][devicertl] Freestanding nvptx via stub printf Compiled nvptx devicertl as freestanding, breaking the dependency on host glibc and gcc-multilibs. Thus build it by default. Comes at the cost of #defining out printf. Tried mapping it onto __builtin_printf but that gets transformed back to printf instead of hitting the cuda/openmp lowering transform. Printf could be preserved by one of: - dropping all the standard headers and ffreestanding - providing a header only printf implementation - changing the compiler handling of printf Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D108349	2021-08-23 23:07:47 +01:00
Jon Chesterfield	842f875c8b	[openmp] Use llvm GridValues from devicertl Add include path to the cmakefiles and set the target_impl enums from the llvm constants instead of copying the values. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108391	2021-08-23 20:25:24 +01:00
Peyton, Jonathan L	d39d3a327b	[OpenMP][test] fix omp_get_wtime.c test to be more accommodating The omp_get_wtime.c test fails intermittently if the recorded times are off by too much, which can happen when many tests are run in parallel. Instead of failing if one timing is a little off, take the average of 100 timings, excluding the 10 worst. Differential Revision: https://reviews.llvm.org/D108488	2021-08-23 08:13:42 -05:00
Vignesh Balasubramanian	589519b9ab	[OpenMP][OMPD]Code movement required for OMPD These changes don't come under the OMPD guard, as they only move existing code to capture parallel behavior correctly. "Runtime Entry Points for OMPD" like "ompd_bp_parallel_begin" and "ompd_bp_parallel_end" should be placed at the correct execution point for the debugging tool to access proper handles/data. Without these changes, in certain cases, the debugging tool will pick the wrong parallel and task handles. Reviewed By: @hbae Differential Revision: https://reviews.llvm.org/D100366	2021-08-20 14:36:22 +05:30
Shilei Tian	1d8d43ae61	[OpenMP] Use `__kmpc_give_task` in `__kmp_push_task` when encountering a hidden helper task This patch replaces the current implementation, which overwrites `gtid` and `thread`, with `__kmpc_give_task`. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D106977	2021-08-19 20:49:29 -04:00
Joachim Protze	4bb36df144	[libomptarget][amdcgn] Add build dependency for llvm-link and opt D107156 and D107320 are not sufficient when OpenMP is built as llvm runtime (LLVM_ENABLE_RUNTIMES=openmp) because dependencies only work within the same cmake instance. We could limit the dependency to cases where libomptarget/plugins are really built. But compared to the whole llvm project, building openmp runtime is negligible and postponing the build of OpenMP runtime after the dependencies are ready seems reasonable. The direct dependency introduced in D107156 and D107320 is necessary for the case where OpenMP is built as llvm project (LLVM_ENABLE_PROJECTS=openmp). Differential Revision: https://reviews.llvm.org/D108404	2021-08-20 01:57:58 +02:00
Jennifer Yu	c274b19866	Add implicit map for a list item appears in a reduction clause. A new rule is added in 5.0: If a list item appears in a reduction, lastprivate or linear clause on a combined target construct then it is treated as if it also appears in a map clause with a map-type of tofrom. Currently, map clauses for all captured variables are added implicitly, but they are missing for list items that are array-element or array-section expressions. The change is to add an implicit map clause for array elements used in a reduction clause, and to skip adding the map clause if the expression is not mappable. Note: For linear and lastprivate, since only a variable name is accepted, the map has already been added through captured variables. To do so: during the mappable checking, if there is an error, suppress the diagnostic and skip adding the implicit map clause. The changes: 1> Add code to generate implicit map in ActOnOpenMPExecutableDirective, for omp 5.0 and up. 2> Add extra default parameter NoDiagnose in ActOnOpenMPMapClause: Use that to skip the error as well as skip adding the implicit map during the mappable checking. Note: there are only two places that need to be checked for NoDiagnose. For the rest, either the check is for < omp 5.0 or the error has already been generated for the reduction clause. Differential Revision: https://reviews.llvm.org/D108132	2021-08-19 12:53:47 -07:00
Jon Chesterfield	ad0f6e1d98	[openmp] Disable the tests that block CI for amdgpu and host offloading.	2021-08-19 20:43:30 +01:00
Jon Chesterfield	6c75ce1b8b	[libomptarget][nfc] Move lanemask_t type into target_impl.h	2021-08-19 18:50:03 +01:00
Jon Chesterfield	77579b99e9	[openmp][nfc] Replace OMPGridValues array with struct [nfc] Replaces enum indices into an array with a struct. Named the fields to match the enum, leaves memory layout and initialization unchanged. Motivation is to later safely remove dead fields and replace redundant ones with (compile time) computation. It should also be possible to factor some common fields into a base and introduce a gfx10 amdgpu instance with less duplication than the arrays of integers require. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108339	2021-08-19 13:25:42 +01:00
Jon Chesterfield	f420939b82	[libomptarget] Apply D106710 to amdgcn devicertl	2021-08-19 01:34:33 +01:00
Jon Chesterfield	c480792b6a	[libomptarget][nfc][devicertl] Delete unused enums	2021-08-19 00:14:34 +01:00
Jon Chesterfield	21d91a8ef3	[libomptarget][devicertl] Replace lanemask with uint64 at interface Use uint64_t for lanemask on all GPU architectures at the interface with clang. Updates tests. The deviceRTL is always linked as IR so the zext and trunc introduced for wave32 architectures will fold after inlining. Simplification partly motivated by amdgpu gfx10 which will be wave32 and is awkward to express in the current arch-dependent typedef interface. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108317	2021-08-18 20:47:33 +01:00
Joseph Huber	edb8acdc6e	[Libomptarget] Correctly default to Generic if exec_mode is not present Currently, the runtime returns an error when the `exec_mode` global is not present. The expected behaviour is that the region will default to Generic. This prevents global constructors from being called because they do not contain execution mode globals. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108255	2021-08-18 11:24:28 -04:00
Martin Storsjö	f5616a981c	[OpenMP] Fix the usage of sscanf on MinGW KMP_SSCANF only evaluates to sscanf_s within #if KMP_OS_WINDOWS && KMP_MSVC_COMPAT, so we need to pass the sscanf_s specific parameters within a similar condition. Differential Revision: https://reviews.llvm.org/D108196	2021-08-17 21:36:09 +03:00
Peyton, Jonathan L	b4a1f441d9	[OpenMP] Add a few small fixes * Add comment to help ensure new construct data are added in two places * Check for division by zero in the loop worksharing code * Check for syntax errors in parrange parsing Differential Revision: https://reviews.llvm.org/D105929	2021-08-16 10:02:49 -05:00
Peyton, Jonathan L	6eeb4c1f32	[OpenMP] Fix incorrect parameters to sscanf_s call On Windows, the documentation states that when using sscanf_s, each %c and %s specifier must also have an additional size parameter. This patch adds the size parameter in the one place where %c is used. Differential Revision: https://reviews.llvm.org/D105931	2021-08-16 09:59:21 -05:00
AndreyChurbanov	52cac541d4	[OpenMP] libomp: cleanup: minor fixes to silence static analyzer. Added a couple more checks to silence the KlocWork static code analyzer. Differential Revision: https://reviews.llvm.org/D107348	2021-08-16 13:39:23 +03:00
AndreyChurbanov	f94da67f49	[OpenMP][NFC] libomp: reduced timeouts in the test from 50 to 2 sec.	2021-08-11 17:58:52 +03:00
George Rokos	df06ec3057	[libomptarget][NFC] Fix compilation issue with GCC Removed redundant assignment from condition which causes gcc to emit the following error: error: operation on ‘MoveData’ may be undefined [-Werror=sequence-point]	2021-08-10 09:43:43 -07:00
Joel E. Denny	2ced1f338a	[OpenMP][NFC] Simplify targetDataEnd conditions for CopyMember targetDataEnd and targetDataBegin compute CopyMember/copy differently, and I don't see why they should. This patch eliminates one of those differences by making a simplifying NFC change to targetDataEnd. The change is NFC as follows. The change only affects the case when `!UNIFIED_SHARED_MEMORY || HasCloseModifier`. In that case, the following points are always true: * The value of CopyMember is relevant later only if DelEntry = false. * DelEntry = false only if one of the following is true: * IsLast = false. In this case, it's always true that CopyMember = false = IsLast. * `MEMBER_OF && !PTR_AND_OBJ` is true. In this case, CopyMember = IsLast. * Thus, if CopyMember is relevant, CopyMember = IsLast. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D105990	2021-08-10 12:29:55 -04:00
Pirama Arumuga Nainar	49fabd9d76	[openmp] Do not use shared memory on Android Android provides ashmem/ASharedMemory support on newer releases, which we can use if requested by openmp users on Android. Also refactor the preprocessor check for using shared memory to kmp_config.h.cmake. Differential Revision: https://reviews.llvm.org/D107181	2021-08-09 09:41:32 -07:00
Dimitry Andric	400cd6d2f0	[libomptarget][amdgpu] use --allow-shlib-undefined to link on FreeBSD On FreeBSD, the `environ` symbol is undefined at link time for shared libraries, but resolved by the dynamic linker at runtime. Therefore, allow the symbol to be undefined when creating a shared library, by using the `--allow-shlib-undefined` linker flag, instead of `-z defs` (a.k.a `--no-undefined`). Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D107698	2021-08-08 13:52:44 +02:00
Ye Luo	262289c103	[OpenMP] mark target task untied OpenMP specification Tasking Terminology target task: A mergeable and untied task that ... Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D107686	2021-08-07 12:31:20 -04:00
Dimitry Andric	71ae2e0221	[libomptarget][amdgpu] don't declare Elf_Note on FreeBSD On FreeBSD, the system `<libelf.h>` already declares `struct Elf_Note` indirectly (via `<sys/elf_common.h>`). This results in compile errors when building the libomptarget amdgpu plugin. Avoid redeclaring `struct Elf_Note` on FreeBSD to fix the errors. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D107661	2021-08-06 21:45:26 +02:00
Shilei Tian	28939b6ae5	[NFC] Clean up and clang-format openmp/libomptarget/plugins/cuda/src/rtl.cpp	2021-08-05 22:32:28 -04:00
Shilei Tian	680c71b127	[OpenMP] Clean up for hidden helper task This patch cleans up the hidden helper task code. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D107008	2021-08-04 12:36:44 -04:00
Shilei Tian	9f5d6ea52e	[OpenMP] Fix performance regression reported in bug #51235 This patch fixes the "performance regression" reported in https://bugs.llvm.org/show_bug.cgi?id=51235. In fact it has nothing to do with performance. The root cause is that the stolen task is not allowed to be executed by another thread because it is a tied task by default. Since a hidden helper task will always be executed by hidden helper threads, it should be untied. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D107121	2021-08-04 12:34:49 -04:00
Lechen Yu	3bc8ce5dd7	[openmp] Add OMPT initialization in libomptarget When loading libomptarget, the init function in libomptarget/src/rtl.cpp will search for the libomptarget_start_tool function using libdl. libomptarget_start_tool will pass the OMPT callbacks related to target constructs to libomptarget. Differential Revision: https://reviews.llvm.org/D99803	2021-08-04 18:00:11 +02:00
AndreyChurbanov	8e29b4b323	[OpenMP] libomp: taskwait depend implementation fixed. Fix for https://bugs.llvm.org/show_bug.cgi?id=49723. Eliminated references from the task dependency hash to a node allocated on the stack, thus eliminating accesses to stale memory, so the node is now never freed. Uncommented the assertion that triggered when stale memory was accessed. Removed the unneeded ref count increment for the stack-allocated node. Differential Revision: https://reviews.llvm.org/D106705	2021-08-03 15:45:20 +03:00
Jon Chesterfield	567c8c7bfd	[libomptarget][nfc] Only set cuda-path for nvptx tests Remove --cuda-path=CUDA_TOOLKIT_ROOT_DIR-NOTFOUND from the invocation of non-nvptx test cases. Better signal to noise ratio on other architectures. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D107074	2021-07-30 23:01:09 +01:00
Jose M Monsalve Diaz	5424ceeda0	[OpenMP] Fixing llvm-omp-device-info compilation with runtimes When using `-DLLVM_ENABLE_RUNTIMES` instead of `-DLLVM_ENABLE_PROJECTS`, the `llvm-omp-device-info` tool is not compiled or installed. In general, no llvm tool would be built on runtimes, because the -DLLVM_BUILD_TOOLS flag is removed by the way the runtimes compilation calls cmake again. This patch is simple: it just forwards the value of this flag to the runtime cmake command. I'm also removing an unnecessary comment in the compilation of the tool. Differential Revision: https://reviews.llvm.org/D107177	2021-07-30 13:09:08 -05:00
Shilei Tian	36d53af4a9	[OpenMP][Offloading] Remove task wait in nowait interfaces All `nowait` series of interfaces in `libomptarget` accept four more arguments (`int32_t depNum, void *depList, int32_t noAliasDepNum, void *noAliasDepList`) compared with their counterparts w/o `nowait`. These extra arguments were expected for dependence resolution, potentially lowered to the device side. The current implementation calls the `libomp` function `__kmpc_omp_taskwait`. However, the front end simply ignores them: these four arguments are not emitted at all. As a consequence, `depNum` and `noAliasDepNum` are garbage, which could lead to an unnecessary task wait. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107164	2021-07-30 11:39:46 -04:00
AndreyChurbanov	8b81524c6d	[OpenMP][NFC] libomp: silence warnings on unused variables. Put declarations/definitions of unused variables under corresponding macros to silence clang build warnings. Differential Revision: https://reviews.llvm.org/D106608	2021-07-30 17:04:42 +03:00
Joachim Protze	4ffa1478fd	[libomptarget][amdcgn] Add build dependency for opt This patch should fix the build issue we observe when building LLVM from scratch. Differential Revision: https://reviews.llvm.org/D107156	2021-07-30 15:45:13 +02:00
Terry Wilmarth	d8e4cb9121	[OpenMP] libomp: Add new experimental barrier: two-level distributed barrier Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in some cases than the default hyper barrier. This barrier is designed to handle fine granularity parallelism where barriers are used frequently with little compute and memory access between barriers. There is no need to use it for codes with few barriers and large granularity compute, or memory intensive applications, as little difference will be seen between this barrier and the default hyper barrier. This barrier is designed to work optimally with a fixed number of threads, and has a significant setup time, so should NOT be used in situations where the number of threads in a team is varied frequently. The two-level distributed barrier is off by default -- hyper barrier is used by default. To use this barrier, you must set all barrier patterns to use this type, because it will not work with other barrier patterns. Thus, to turn it on, the following settings are required: KMP_FORKJOIN_BARRIER_PATTERN=dist,dist KMP_PLAIN_BARRIER_PATTERN=dist,dist KMP_REDUCTION_BARRIER_PATTERN=dist,dist Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER, and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed barrier. Patch fixed for ITTNotify disabled builds and non-x86 builds Co-authored-by: Jonathan Peyton <jonathan.l.peyton@intel.com> Co-authored-by: Vladislav Vinogradov <vlad.vinogradov@intel.com> Differential Revision: https://reviews.llvm.org/D103121	2021-07-29 14:09:26 -05:00
Joachim Protze	4acc2f29a2	[OpenMP][Tools][Tests][NFC] Address flaky archer tests Adding more concurrent threads significantly increases the chance that the data race can be observed during testing.	2021-07-29 17:56:44 +02:00
Jon Chesterfield	a90da62adb	[libomptarget][amdgpu] Update printed plugin name	2021-07-29 14:46:42 +01:00
Jose M Monsalve Diaz	88e66fa60a	[OpenMP] Fixing missing variables when CUDA SDK not in system This patch fixes the error reported in D106751. When there is no CUDA SDK installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE` variables. Using @zsrkmyn's suggested fix. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106933	2021-07-27 23:46:15 -05:00
Jose M Monsalve Diaz	313c523995	[OpenMP][Tool] Introducing the `llvm-omp-device-info` tool This patch introduces the `llvm-omp-device-info` tool, which uses the omptarget library and interface to query the device info from all the available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo`. Since omptarget usually requires a description structure with executable kernels, I split the initialization of the RTLs and Devices to be able to initialize all possible devices and query each of them. This revision relies on the patch that introduces the print device info. A limitation is that the order in which the devices are initialized, and the corresponding device ID, is not necessarily the one seen by OpenMP. The changes are as follows: 1. Separate the RTL initialization that was performed in `RegisterLib` into its own `initRTLonce` function 2. Create an `initAllRTLs` method that initializes all available RTLs at runtime 3. Created the `llvm-deviceinfo.cpp` tool that uses `omptarget` to query each device and prints its information. 
Example Output: ``` Device (0): print_device_info not implemented Device (1): print_device_info not implemented Device (2): print_device_info not implemented Device (3): print_device_info not implemented Device (4): CUDA Driver Version: 11000 CUDA Device Number: 0 Device Name: Quadro P1000 Global Memory Size: 4236312576 bytes Number of Multiprocessors: 5 Concurrent Copy and Execution: Yes Total Constant Memory: 65536 bytes Max Shared Memory per Block: 49152 bytes Registers per Block: 65536 Warp Size: 32 Threads Maximum Threads per Block: 1024 Maximum Block Dimensions: 1024, 1024, 64 Maximum Grid Dimensions: 2147483647 x 65535 x 65535 Maximum Memory Pitch: 2147483647 bytes Texture Alignment: 512 bytes Clock Rate: 1480500 kHz Execution Timeout: Yes Integrated Device: No Can Map Host Memory: Yes Compute Mode: DEFAULT Concurrent Kernels: Yes ECC Enabled: No Memory Clock Rate: 2505000 kHz Memory Bus Width: 128 bits L2 Cache Size: 1048576 bytes Max Threads Per SMP: 2048 Async Engines: Yes (2) Unified Addressing: Yes Managed Memory: Yes Concurrent Managed Memory: Yes Preemption Supported: Yes Cooperative Launch: Yes Multi-Device Boars: No Compute Capabilities: 61 ``` Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106752	2021-07-27 22:38:35 -04:00
Jose M Monsalve Diaz	d2f85d0910	[OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget` This patch introduces a function in the device's plugin to print the device information. This patch relates to another patch that introduces a CLI tool to obtain the device information from the omplibrary directly. It is inspired by PGI's pgaccelinfo. The modifications are as follows: 1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL. 2. Introduce the `bool __tgt_print_device_info(devID)` function into `omptarget` interface. Returns false if the RTL is not implemented 3. Added `bool printDeviceInfo(RTLDevID)` to the `DeviceTy` 4. Implement the `__tgt_rtl_print_device_info` for CUDA. Added additional CUDA Runtime calls. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106751	2021-07-27 21:47:57 -04:00
Jose M Monsalve Diaz	5ab6aedda9	[OpenMP] Folding threadLimit and numThreads when single value in kernels The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block` and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant, these calls can be folded to the constant value. In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and `NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions. The code checks all the kernels, and if their attributes match, the functions are folded. In the future we will explore specializing for multiple values of NumThreads and NumTeams. Depends on D106390 Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D106033	2021-07-27 21:47:12 -04:00
Johannes Doerfert	ed7ec860f0	[OpenMP] Improve alignment handling in the new device runtime	2021-07-27 17:50:27 -05:00
Joseph Huber	e3ee76245e	[Libomptarget] Revert new variable sharing to use the old method The new method of sharing variables introduces a `__kmpc_alloc_shared` call that cannot be removed in the middle end because of its non-constant argument and unconnected free. This patch reverts this to the old method that used a static amount of shared memory for sharing variables. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106905	2021-07-27 18:14:01 -04:00

2170 Commits