llvm-project

Commit Graph

Author	SHA1	Message	Date
AndreyChurbanov	e38a1deb66	[OpenMP] libomp: disable definitions of 5.1 atomics for non-x86 arch. Declarations of 5.1 atomic entries were added under "#if KMP_ARCH_X86 || KMP_ARCH_X86_64" in kmp_atomic.h, but the definitions of the functions in kmp_atomic.cpp lacked the architecture guard. As a result mangled symbols were available on non-x86 architectures. The patch eliminates these unexpected symbols from the library. Differential Revision: https://reviews.llvm.org/D112261	2021-10-25 21:17:26 +03:00
Vladimir Inđić	f41d08540b	[OpenMP][OMPT] thread_num determination during execution of nested serialized parallel regions __ompt_get_task_info_internal function is adapted to support thread_num determination during the execution of multiple nested serialized parallel regions enclosed by a regular parallel region. Consider the following program that contains parallel region R1 executed by two threads. Let the worker thread T of region R1 execute a serialized parallel region R2 that encloses another serialized parallel region R3. Note that the thread T is the master thread of both R2 and R3 regions. Assume that __ompt_get_task_info_internal function is called with the argument "ancestor_level == 1" during the execution of region R3. The function should determine the "thread_num" of the thread T inside the team of region R2, whose implicit task is at level 1 inside the hierarchy of active tasks. Since the thread T is the master thread of region R2, one should expect that "thread_num" takes the value 0. After the while loop finishes, the following holds: "lwt != NULL", "prev_lwt == NULL", "prev_team" represents the team information about the innermost serialized parallel region R3. This results in executing the assignment "thread_num = prev_team->t.t_master_tid". Note that "prev_team->t.t_master_tid" was initialized at the moment of R2’s creation and represents the "thread_num" of the thread T inside the region R1 which encloses R2. Since the thread T is the worker thread of the region R1, "thread_num" takes the value 1, which is a contradiction. This patch proposes to use "lwt" instead of "prev_lwt" when determining the "thread_num". If "lwt" exists, the task at the requested level belongs to the serialized parallel region. Since the serialized parallel region is executed by one thread only, the "thread_num" takes the value 0. Similarly, assume that __ompt_get_task_info_internal function is called with the argument "ancestor_level == 2" during the execution of region R3. The function should determine the "thread_num" of the thread T inside the team of region R1. Since the thread is the worker inside the region R1, one should expect that "thread_num" takes the value 1. After the loop finishes, the following holds: "lwt == NULL", "prev_lwt != NULL", "prev_team" represents the team information about the innermost serialized parallel region R3. This leads to execution of the assignment "thread_num = 0", which causes a contradiction. Ignoring the "prev_lwt" leads to executing the assignment "thread_num = prev_team->t.t_master_tid" instead. From the previous explanation, it is obvious that "thread_num" takes the value 1. Note that the "prev_lwt" variable is marked as unnecessary and thus removed. This patch introduces the test case which represents the OpenMP program described earlier in the summary. Differential Revision: https://reviews.llvm.org/D110699	2021-10-25 18:21:20 +02:00
Vladimir Inđić	f2410bfb1c	[OpenMP][OMPT][clang] task frame support fixed in __kmpc_fork_call __kmp_fork_call sets the enter_frame of the active task (th_current_task) before a new parallel region begins. After the region is finished, the enter_frame is cleared. The old implementation of __kmpc_fork_call didn’t clear the enter_frame of the active task. Also, the way of initializing the enter_frame of the active task was wrong. Consider the following two OpenMP programs. The first program: Let R1 be the serialized parallel region that encloses another serialized parallel region R2. Assume that the thread that executes R2 is going to create a new serialized parallel region R3 by executing __kmpc_fork_call. This thread is responsible for setting the enter_frame of R2's implicit task. Note that the information about R2's implicit task is present inside master_th->th.th_current_task at this moment, while lwt represents the information about R1's implicit task. The old implementation uses lwt and resets the enter_frame of R1's implicit task instead of R2's implicit task. The new implementation uses master_th->th.th_current_task instead. The second program: Consider the OpenMP program that contains parallel region R1 which encloses an explicit task T. Assume that the thread should create another parallel region R2 during the execution of T. The __kmpc_fork_call is responsible for creating R2 and setting the enter_frame of T, whose information is present inside master_th->th.th_current_task. The old implementation tries to set the frame of parent_team->t.t_implicit_task_taskdata[tid], which corresponds to the implicit task of R1, instead of T. Differential Revision: https://reviews.llvm.org/D112419	2021-10-25 18:21:19 +02:00
Joachim Protze	7368227965	[OpenMP][Tests] Test omp_get_wtime for invariants As discussed in D108488, testing for invariants of omp_get_wtime would be more reliable than testing for duration of sleep, as return from sleep might be delayed due to system load. Alternatively/in addition, we could compare the time measured by omp_get_wtime to time measured with C++11 chrono (for portability?). Differential Revision: https://reviews.llvm.org/D112458	2021-10-25 18:20:59 +02:00
Joachim Protze	3f229f42b7	[OpenMP][Tests][NFC] Actually check for test outcome The CHECK: line in the test had no effect, because the test does not pipe to FileCheck. Since the test only checks for a single value, encode the result in the return value of the test.	2021-10-25 18:20:12 +02:00
Joachim Protze	047890bc3f	[OpenMP][Tests][NFC] Mark tests trying to link COI as unsupported For some tests with target-related functionality icc 18/19 tries to link libioffload_target.so.5, which fails for missing COI symbols.	2021-10-25 18:20:12 +02:00
Joachim Protze	d7fdd236d5	[OpenMP][Tests][NFC] Replace atomic increment by reduction Also mark the test as unsupported by intel-21, because the test does not terminate	2021-10-25 18:20:12 +02:00
Joachim Protze	38f78dd2e2	[OpenMP][Tools][NFC] Fix C99-style declaration of iteration variables Where possible change to declare the variable before the loop. Where not possible, specifically request -std=c99 (could be limited to specific compilers like icc).	2021-10-25 18:20:12 +02:00
Joachim Protze	d29a7d23ec	[OpenMP][Tools][NFC] Pass intel license ENV to lit	2021-10-25 18:20:11 +02:00
Kazu Hirata	d8e4170b0a	Ensure newlines at the end of files (NFC)	2021-10-23 08:45:29 -07:00
Jon Chesterfield	bf6f955f39	[libomptarget] Run GPU offloading tests on both new and old runtime Implemented by patching python config instead of modifying all the tests so that -generic and XFAIL work as usual. Expectation is for this to be reverted once the old runtime is deleted. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D112225	2021-10-22 23:28:44 +01:00
Vladimir Inđić	ba02586fbe	[OpenMP][OMPT][GOMP] task frame support in KMP_API_NAME_GOMP_PARALLEL_SECTIONS KMP_API_NAME_GOMP_PARALLEL_SECTIONS function was missing the task frame support. This patch introduced a fix responsible to set properly the exit_frame of the innermost implicit task that corresponds to the parallel section construct, as well as the enter_frame of the task that encloses the mentioned implicit task. This patch also introduced a simple test case sections_serialized.c that contains serialized parallel section construct and validates whether the mentioned task frames are set correctly. Differential Revision: https://reviews.llvm.org/D112205	2021-10-22 11:01:10 -05:00
AndreyChurbanov	52f4922ebb	[OpenMP][NFC] skip atomic tests for non-x86 arch	2021-10-21 21:51:33 +03:00
Jon Chesterfield	a602c2b51d	[libomptarget][DeviceRTL] Generalise and simplify cmakelists Step towards building the DeviceRTL for amdgpu. Mostly replaces cuda-specific toolchain finding logic with the generic logic currently found in the amdgpu deviceRTL cmake. Also deletes dead code and changes the default to build on systems without cuda installed, as the library doesn't use cuda and the amdgpu-only systems generally won't have cuda installed. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111983	2021-10-21 16:14:29 +01:00
Nawrin Sultana	99d1ce4a62	[OpenMP] Add GOMP allocator functions This patch adds GOMP_alloc and GOMP_free functions of LIBGOMP. Differential revision: https://reviews.llvm.org/D111673	2021-10-20 11:37:29 -05:00
Joseph Huber	b1ce454930	[OpenMP] Remove macro guards for device debugging The plugin currently uses a macro to check if this is a debug build before assigning the debug kind variable to the device environment struct. This is being deprecated because the new device runtime does not maintain separate debug builds and should always be available. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112083	2021-10-19 12:21:43 -04:00
Jon Chesterfield	7272982e1d	[libomptarget] Refactor DeviceRTL prior to AMDGPU bringup Subset of D111993. Fix typos, rename read to load. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111999	2021-10-19 08:05:06 +01:00
AndreyChurbanov	63f8099e23	[OpenMP] libomp: add check of task function pointer for NULL. This patch simplifies the compiler implementation of the "taskwait nowait" construct. The "taskwait nowait" is semantically equivalent to the empty task. Instead of creating an empty routine as a task entry, the compiler can just pass a NULL pointer to the runtime. The runtime then does all the work with dependences and returns, because there is no task routine to run. Differential Revision: https://reviews.llvm.org/D112015	2021-10-18 19:48:30 +03:00
Jon Chesterfield	251b1e7c25	[libomptarget] Pass OMP_TARGET_OFFLOAD env variable through to tests Useful for OMP_TARGET_OFFLOAD=MANDATORY when testing Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111995	2021-10-18 16:03:03 +01:00
@vladaindjic	59a994e8da	[OpenMP][OMPT] thread_num determination for programs with explicit tasks __ompt_get_task_info_internal is now able to determine the right value of the “thread_num” argument during the execution of an explicit task. During the execution of a while loop that iterates over the ancestor tasks hierarchy, the “prev_team” variable was always set to “team” variable at the beginning of each loop iteration. Assume that the program contains a parallel region which encloses an explicit task executed by the worker thread of the region. Also assume that the tool inquires the “thread_num” of a worker thread for the implicit task that corresponds to the region (task at “ancestor_level == 1”) and expects to receive the value of “thread_num > 0”. After the loop finishes, both “team” and “prev_team” variables are equal and point to the team information of the parallel region. The “thread_num” is set to “prev_team->t.t_master_tid”, that is equal to “team->t.t_master_tid”. In this case, “team->t.t_master_tid” is 0, since the master thread of the region is the initial master thread of the program. This leads to a contradiction. To prevent this, “prev_team” variable is set to “team” variable only at the time when the loop that has already encountered the implicit task (“taskdata” variable contains the information about an implicit task) continues iterating over the implicit task’s ancestors, if any. After the mentioned loop finishes, the “prev_team” variable might be equal to NULL. This means that the task at requested “ancestor_level” belongs to the innermost parallel region, so the “thread_num” will be determined by calling the “__kmp_get_tid”. To prove that this patch works, the test case “explicit_task_thread_num.c” is provided. It contains the example of the program explained earlier in the summary. Differential Revision: https://reviews.llvm.org/D110473	2021-10-18 13:54:22 +02:00
Joachim Protze	c93fb143b9	[OpenMP][Tests][NFC] Work around ICC bug Older intel compilers miss the privatization of nested loop variables for doacross loops. Declaring the variable in the loop makes the test more robust.	2021-10-18 13:54:15 +02:00
Joachim Protze	5918688248	[OpenMP][Tests][NFC] Flagging OMPT tests as XFAIL for Intel compilers With Intel 19 compiler the teams tests fail to link while trying to link liboffload.	2021-10-18 13:50:03 +02:00
Shilei Tian	2c941fa2f9	[OpenMP][deviceRTLs] Fix wrong return value of `__kmpc_is_spmd_exec_mode` D110279 introduced a bug to the device runtime. In `__kmpc_parallel_51`, we detect whether we are already in parallel region by `__kmpc_parallel_level() > __kmpc_is_spmd_exec_mode()`. It is based on the assumption that: - In SPMD mode, parallel level is initialized to 1. - In generic mode, parallel level is initialized to 0. - `__kmpc_is_spmd_exec_mode` returns `1` for SPMD mode, 0 otherwise. Because the return value type of `__kmpc_is_spmd_exec_mode` is `int8_t`, there was an implicit cast from `bool` to `int8_t`. We can make sure it is either 0 or 1 since C++14. In D110279, the return value is the result of an `and` operation, which is 2 in SPMD mode. This breaks the assumption in `__kmpc_parallel_51`. Reviewed By: carlo.bertolli, dpalermo Differential Revision: https://reviews.llvm.org/D111905	2021-10-16 12:58:29 -04:00
Joachim Protze	26b675d65e	[OpenMP][Tools][NFC] Make an Archer test more robust The execution order of the tasks is not fixed, so there is no ordering for the write accesses. Enforce the ordering that is expected in the check.	2021-10-15 17:32:05 +02:00
Peyton, Jonathan L	acb3b187c4	[OpenMP][host runtime] Add initial hybrid CPU support Detect, through CPUID.1A, and show user different core types through KMP_AFFINITY=verbose mechanism. Offer future runtime optimizations __kmp_is_hybrid_cpu() to know whether running on a hybrid system or not. Differential Revision: https://reviews.llvm.org/D110435	2021-10-14 16:49:42 -05:00
Peyton, Jonathan L	b840d3ab0d	[OpenMP][host runtime] small fixup of RTM CPUID bit check	2021-10-14 16:49:42 -05:00
Peyton, Jonathan L	50b68a3d03	[OpenMP][host runtime] Add support for teams affinity This patch implements teams affinity on the host. The default is spread. A user can specify either spread, close, or primary using KMP_TEAMS_PROC_BIND environment variable. Unlike OMP_PROC_BIND, KMP_TEAMS_PROC_BIND is only a single value and is not a list of values. The values follow the same semantics under the OpenMP specification for parallel regions except T is the number of teams in a league instead of the number of threads in a parallel region. Differential Revision: https://reviews.llvm.org/D109921	2021-10-14 16:30:28 -05:00
AndreyChurbanov	621d7a75b1	[OpenMP] libomp: add atomic functions for new OpenMP 5.1 atomics. Added functions that implement "atomic compare". Though clang does not use library interfaces to implement OpenMP atomics, the functions are added for consistency. Also added missing functions for 80-bit floating min/max atomics. Differential Revision: https://reviews.llvm.org/D110109	2021-10-13 21:02:18 +03:00
AndreyChurbanov	6e98ec9b20	[OpenMP] libomp: fix ittnotify usage. Replaced storing of ittnotify domain array index into location info structure (which is now read-only) with storing of (location info address + ittnotify domain + team size) into hash map. Replaced __kmp_itt_barrier_domains and __kmp_itt_imbalance_domains arrays with __kmp_itt_barrier_domains hash map; __kmp_itt_region_domains and __kmp_itt_region_team_size arrays with __kmp_itt_region_domains hash map. Basic functionality did not change (at least tried to not change). The patch fixes https://bugs.llvm.org/show_bug.cgi?id=48644. Differential Revision: https://reviews.llvm.org/D111580	2021-10-13 20:49:05 +03:00
AndreyChurbanov	5e58b63b28	[OpenMP] libomp: fix warning on comparison of integer expressions of different signedness Replaced macro with global variable of correspondent type. Differential Revision: https://reviews.llvm.org/D111562	2021-10-13 20:11:47 +03:00
AndreyChurbanov	f5c0c9179f	[OpenMP] libomp: add OpenMP 5.1 memory allocation routines. Aligned allocation routines added. Fortran interfaces added for all allocation routines. Differential Revision: https://reviews.llvm.org/D110923	2021-10-11 19:25:00 +03:00
Ron Lieberman	d022f39d9f	[libomptarget][amdgpu][NFC] tweak a comment	2021-10-09 12:51:53 -04:00
Joseph Huber	bad44d5f39	[OpenMP] Add RTL function for getting number of threads in block. This patch adds support for the `__kmpc_get_hardware_num_threads_in_block` function that returns the number of threads. This was missing in the new runtime and was used by the AMDGPU plugin, which prevented it from using the new runtime. This patch also unifies the interface for getting the thread numbers in the frontend. Originally authored by jdoerfert. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D111475	2021-10-08 22:21:59 -04:00
Joseph Huber	85ad566335	[OpenMP] Avoid calling `isSPMDMode` during RT initialization Until we hit the first barrier we should not call `mapping::isSPMDMode` with all threads. Instead, we now have (and use during initialization) a `mapping::isMainThreadInGenericMode` overload that takes the known SPMD-mode state and one that queries it. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111381	2021-10-08 22:00:41 -04:00
Joseph Huber	208f900527	[Libomptarget] Add an external interface to dynamic shared memory This patch adds an external interface to access the dynamic shared memory buffer in the device runtime. The function introduced is ``llvm_omp_get_dynamic_shared``. This includes a host-side definition that only returns a null pointer so that it can be used when host-fallback is enabled without crashing. Support for dynamic shared memory was also ported to the old device runtime. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110957	2021-10-08 15:36:57 -04:00
Shilei Tian	c060c634ef	[OpenMP][NVPTX] Fix an error in configuring #teams and #threads It must be a copy mistake. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D111407	2021-10-08 11:07:43 -04:00
Shilei Tian	af4599b8ab	[OpenMP][DeviceRTL] Add the support for printf in a freestanding way For NVPTX, `printf` can be used just with a function declaration. For AMDGCN, a function definition is added, but it simply returns. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109728	2021-10-07 22:15:37 -04:00
Johannes Doerfert	44710940af	[OpenMP][FIX] Data race in the SPMD execution of the new runtime We need to synchronize the threads before we destroy the RAII objects that hold the old values and not after to avoid threads executing the parallel region but seeing an inconsistent state. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111369	2021-10-07 21:01:24 -04:00
Jon Chesterfield	1bc3a6e41b	[libomptarget] Reapply `2bc4d48a78` which was accidentally reverted	2021-10-07 20:17:48 +01:00
Jon Chesterfield	0c554a4769	[libomptarget] Move device environment to shared header, remove divergence Follow on to D110006, related to D110957 Where implementations have diverged this resolves to match the new DeviceRTL - replaces definitions of this struct in deviceRTL and plugins with include - changes the dynamic_shared_size field from D110006 to 32 bits - handles stdint being unavailable in DeviceRTL - adds a zero initializer for the field to amdgpu - moves the extern declaration for deviceRTL to target_interface (omptarget.h is more natural, but doesn't work due to include order with debug.h) - Renames the fields everywhere to match the LLVM format used in DeviceRTL - Makes debug_level uint32_t everywhere (previously sometimes int32_t) Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D111069	2021-10-07 12:03:48 +01:00
Michał Górny	0873b9bef4	[openmp] [elf_common] Fix linking against LLVM dylib The hand-rolled linking logic in elf_common does not account for the possibility of using LLVM dylib rather than a dozen static libraries. Since it does not seem to be easily convertible to add_llvm_library, just hand-roll support for LLVM_LINK_LLVM_DYLIB. This is necessary to support stand-alone builds against installed LLVM. Differential Revision: https://reviews.llvm.org/D111038	2021-10-04 09:29:06 +02:00
Martin Storsjö	dec2257f35	[openmp] Fix a typo in a test REQUIRES line Differential Revision: https://reviews.llvm.org/D110963	2021-10-03 23:51:11 +03:00
Peyton, Jonathan L	343b9e8590	[OpenMP][host runtime] Introduce kmp_cpuinfo_flags_t to replace integer flags Store CPUID support flags as bits instead of using entire integers. Differential Revision: https://reviews.llvm.org/D110091	2021-10-01 11:08:39 -05:00
Peyton, Jonathan L	957b4c5750	[OpenMP][testing] increase threshold for omp_get_wtime test	2021-10-01 11:07:41 -05:00
Jon Chesterfield	05ba9ff6a6	[libomptarget][amdgpu] Refactor memory pool collection	2021-10-01 14:58:01 +01:00
Jon Chesterfield	72e8a4c45d	[openmp][docs] Describe how the internal components are found Add a FAQ entry about the names of openmp offloading components and how they are searched for. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D109619	2021-09-30 22:05:12 +01:00
Jon Chesterfield	3247329107	[openmp] Add addrspacecast to getOrCreateIdent Fixes 51982. Adds a missing CreatePointerCast and allocates a global in the correct address space. Test case derived from https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting parts while checking the assertion failure still occurred. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110556	2021-09-30 21:36:31 +01:00
Jon Chesterfield	b75a7481ba	[libomptarget] Apply D110029 to amdgpu Use enum for execution mode. This is partly a port from ROCm and partly a port from D110029. Attempted to make the same choices as ROCm as far as comments etc go to reduce the merge conflicts. There is some cleanup warranted here - in particular I like the cuda patch factoring out the comparisons into named variables - but I'd like to leave that for a follow up patch, keeping this one minimal. Reviewed By: carlo.bertolli Differential Revision: https://reviews.llvm.org/D110845	2021-09-30 21:29:37 +01:00
Dhruva Chakrabarti	6226270253	[libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed. Fixes: SWDEV-275232 (With contributions from Ammar Elwazir, Laurent Morichetti, and Tony Tye) The current code is racy. After the packet is submitted, the GPU will increment the read index. If this wraps around before the memory is read from it'll refer to a signal from an unrelated packet. Change avoids reading from the packet post-submission. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110679	2021-09-29 09:22:07 -07:00
Jon Chesterfield	2bc4d48a78	[libomptarget][amdgpu] Follow on to D110513, empty kernarg pools are not fatal	2021-09-27 22:44:35 +01:00
Jon Chesterfield	738734f655	[libomptarget][amdgpu] Report zero devices if plugin construction fails, instead of segv	2021-09-27 22:13:12 +01:00
Jon Chesterfield	80fa43fe9a	Revert "[openmp] Add addrspacecast to getOrCreateIdent" This reverts commit `1a761e5b7b`. Failed CI, albeit with a different failure mode to BZ51982	2021-09-27 19:27:35 +01:00
Jon Chesterfield	1a761e5b7b	[openmp] Add addrspacecast to getOrCreateIdent Fixes 51982. Minor refactor to remove `return x = y` construct. Test case derived from https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting parts while checking the assertion failure still occurred. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110556	2021-09-27 19:23:12 +01:00
@vladaindjic	5357a98c82	[OpenMP] libomp: Usage of TASK_TIED constant inside kmp_gsupport.cpp This minor refactoring introduces the TASK_TIED constant inside kmp_gsupport.cpp as a replacement for the literal value 1. The mentioned constant is now used in both kmp_tasking.cpp and kmp_gsupport.cpp files. Differential Revision: https://reviews.llvm.org/D110441	2021-09-27 19:45:56 +03:00
Joseph Huber	74d622dea4	[OpenMP] Add new worksharing definitions into device RTL This patch defines the newly added `__kmpc_distribute_static_init` functions in the device runtime library. These functions are currently exact copies of the current worksharing method but can be tuned later. Depends on D110429 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110430	2021-09-27 11:36:41 -04:00
Pushpinder Singh	b1695c2eb8	[AMDGPU][OpenMP] Add memory pool size check to isValidMemoryPool Keeping all the checks in one place for future simplification. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110513	2021-09-27 12:29:00 +00:00
Michael Kruse	1b242dccff	[OpenMP][CMake] Use in-project clang as CUDA->IR compiler for new DeviceRTL. Use the in-project clang, llvm-link and opt if available and unless CMake cache variables specify to use a different compiler. This applies D101265 to the new DeviceRTL's CMakeLists.txt which was copied before D101265 was applied. Fixes the openmp-offloading-cuda-runtime builder which was failing since D110006. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110251	2021-09-27 07:14:19 -05:00
Pushpinder Singh	9d0eb440ff	[libomptarget][nfc][amdgpu] Reorder function to clarify review diff	2021-09-27 09:30:55 +00:00
Jon Chesterfield	726a34f063	[libomptarget][amdgpu] Replace dead exit call with returning error	2021-09-27 09:43:37 +01:00
Vignesh Balu	62fddd5ff5	[OpenMP][OMPD] Implementation of OMPD debugging library - libompd. This is a continuation of the review: https://reviews.llvm.org/D100182 This patch implements the OMPD API as specified in the standard doc. Reviewed By: @hbae Differential Revision: https://reviews.llvm.org/D100183	2021-09-27 12:32:31 +05:30
Jon Chesterfield	8cf93a35d4	[libomptarget][amdgpu] Destruct HSA queues Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak which raises an assert in debug builds of hsa. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D109511	2021-09-26 15:34:21 +01:00
Joseph Huber	d83ca624a1	[OpenMP] Fix data-race in new device RTL This patch fixes a data-race observed when using the new device runtime library. The internal control variable for the parallel level is read in the `__kmpc_parallel_51` function while it could potentially be written by other threads. This causes data corruption and will cause nondeterministic behaviour in the runtime. This patch fixes this by adding an explicit synchronization before the region starts. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110366	2021-09-23 17:28:07 -04:00
Shilei Tian	423d34f74a	[OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit` This is a follow-up of D110029, which uses bitset to indicate execution mode. This patch makes the changes in the function call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110279	2021-09-22 17:16:41 -04:00
Joseph Huber	60a40cf379	[OpenMP] Fix KeepAlive usage Summary: Functions were called the wrong way around, this didn't keep the symbol alive.	2021-09-22 14:38:19 -04:00
Joseph Huber	277b681ede	[OpenMP] Add function tracing debugging to device RTL This patch adds support for an RAII struct that will print function traces when placed inside of a function declaration. Each successive call will increase the indentation to make it easier to visually inspect. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110202	2021-09-22 12:25:29 -04:00
Shilei Tian	ca999f7191	[OpenMP][Offloading] Use bitset to indicate execution mode instead of value The execution mode of a kernel is stored in a global variable, whose value means: 0 - SPMD mode; 1 - generic mode; 2 - SPMD mode execution with generic mode semantics. We are going to add support for SIMD execution mode. It will come with another execution mode, such as SIMD-generic mode. As a result, this value-based indicator is not flexible. This patch changes to a bitset-based solution to encode execution mode. Each position is: [0] - generic mode; [1] - SPMD mode; [2] - SIMD mode (will be added later). In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode execution with generic mode semantics. In the future after we add the support for SIMD mode, `0b1xx` will be in SIMD mode. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110029	2021-09-22 11:40:52 -04:00
Joseph Huber	1cf86df883	[OpenMP] Make sure the Thread ID function is not removed Summary: The thread ID function was reintroduced in D110195, but could potentially be removed by the optimizer. Make the function noinline to preserve the call sites and add it to the externalization RAII so its definition is not removed by the attributor.	2021-09-22 10:13:18 -04:00
Joseph Huber	e95731cca7	[OpenMP] Add thread ID function into new RTL The new device runtime library currently lacks the `kmpc_get_hardware_thread_id_in_block` function which is currently used when doing the SPMDzation optimization. This call would be introduced through the optimization and then cause a linking error because it was not present. This patch adds support for this runtime call. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110195	2021-09-21 17:43:50 -04:00
Giorgis Georgakoudis	ac90dfc43a	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `1d66649adf`. Revert to fix AMD GPU issue.	2021-09-21 13:20:39 -07:00
Usman Nadeem	248342b7c7	[OpenMP][OMPD] Fix compile error when OMPD is not supported Differential Revision: https://reviews.llvm.org/D110120 Change-Id: I9d39dacfab5b7fbab37ee4b4d960d51e0892b24d	2021-09-21 12:45:15 -07:00
Giorgis Georgakoudis	1d66649adf	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert, jhuber6 Differential Revision: https://reviews.llvm.org/D102107	2021-09-21 10:50:04 -07:00
Shilei Tian	49e976c934	[OpenMP][NVPTX] Fix a warning that data argument not used by format string Reviewed By: jhuber6, grokos Differential Revision: https://reviews.llvm.org/D110104	2021-09-20 17:22:14 -04:00
Peyton, Jonathan L	1e45cd75df	[OpenMP][host runtime] Fix indirect lock table race condition The indirect lock table can exhibit a race condition during initializing and setting/unsetting locks. This occurs if the lock table is resized by one thread (during an omp_init_lock) and accessed (during an omp_set|unset_lock) by another thread. The runtime/test/lock/omp_init_lock.c test exposed this issue and will fail if run enough times. This patch restructures the lock table so pointer/iterator validity is always kept. Instead of reallocating a single table to a larger size, the lock table begins preallocated to accommodate 8K locks. Each row of the table is allocated as needed with each row allowing 1K locks. If the 8K limit is reached for the initial table, then another table, capable of holding double the number of locks, is allocated and linked as the next table. The indices stored in the user's locks take this linked structure into account when finding the lock within the table. Differential Revision: https://reviews.llvm.org/D109725	2021-09-20 13:01:58 -05:00
Joseph Huber	f1c821fa85	[OpenMP] Add support for dynamic shared memory in new RTL This patch adds support for using dynamic shared memory in the new device runtime. The new function `__kmpc_get_dynamic_shared` will return a pointer to the buffer of dynamic shared memory. Currently the amount of memory allocated is set by an environment variable. In the future this amount will be added to the amount used for the smart stack which will be configured in a similar way. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110006	2021-09-17 21:25:36 -04:00
Joseph Huber	ec02c34b6d	[OpenMP] Add additional fields to device environment This patch adds fields for the device number and number of devices into the device environment struct and debugging values. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110004	2021-09-17 21:25:32 -04:00
Joseph Huber	b266bcb135	[OpenMP] Implement __assert_fail in the new device runtime This patch implements the `__assert_fail` function in the new device runtime. This allows users and developers to use the standard assert function inside of the device. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D109886	2021-09-17 21:25:28 -04:00
Shilei Tian	81a1a91c62	[NFC] clang-format -i /openmp/libomptarget/deviceRTLs/interface.h	2021-09-17 12:55:02 -04:00
AndreyChurbanov	59b877d001	[OpenMP] NFC: add type casts to silence gcc warnings	2021-09-17 19:49:40 +03:00
AndreyChurbanov	7f1a6d891e	[OpenMP] libomp: Update third-party sources of ittnotify client code. The third-party ittnotify sources were updated from https://github.com/intel/ittapi. Changes applied: - LLVM license added to all files; initial BSD license saved in LICENSE.txt; - clang-formatted; - renamed .c to .cpp, similar to what we did with all our sources; - added #include "kmp_config.h" with definition of INTEL_ITTNOTIFY_PREFIX macro into ittnotify_static.cpp. Differential Revision: https://reviews.llvm.org/D109333	2021-09-17 19:38:34 +03:00
Hansang Bae	ae2a5facce	[OpenMP][libomptarget] Minor fix in x86_64 plugin Call to remove() was passing invalid address for the file name. Differential Revision: https://reviews.llvm.org/D109846	2021-09-15 15:57:06 -05:00
Peyton, Jonathan L	258e27aae1	[OpenMP] Add support for GOMP depobj GOMP depobjs are represented as a two intptr_t array. The first element is the base address of the dependency and the second element is the flag indicating the type the depobj represents. Differential Revision: https://reviews.llvm.org/D108790	2021-09-15 12:47:08 -05:00
Vignesh Balasubramanian	939154125b	[OpenMP] [OMPD] OPENMP_INSTALL_LIBDIR is set for the install dir OPENMP_INSTALL_LIBDIR is set to the installation path of shared and static libompd. This should avoid mixing 32-bit and 64-bit libraries on the same path in a multi-lib set-up. Reviewed By: @mceier Differential Revision: https://reviews.llvm.org/D109352	2021-09-13 10:25:50 +05:30
Joseph Huber	7eb899cbcd	[OpenMP] Add more verbose remarks for runtime folding We perform runtime folding, but do not currently emit remarks when it is performed. This is because it comes from the runtime library and is beyond the user's control. However, people may still wish to view this and similar information easily, so we can enable this behaviour using a special flag to enable verbose remarks. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109627	2021-09-10 17:36:06 -04:00
Ye Luo	2187cbf56f	[OpenMP][libomptarget] Add __tgt_target_return_t enum for __tgt_target_XXX return int The definitions of OFFLOAD_SUCCESS and OFFLOAD_FAIL used in plugin APIs and libomptarget public APIs are not consistent. Create __tgt_target_return_t for libomptarget public APIs. Differential Revision: https://reviews.llvm.org/D109304	2021-09-10 16:11:08 -05:00
Jon Chesterfield	f244af5c9f	[openmp][amdgpu] Update SupportAndFAQ docs	2021-09-10 18:35:29 +01:00
Johannes Doerfert	9f844aeeb4	[OpenMP][Docs] Remove old/outdated webpage This should have happened a long time ago, now that openmp.llvm.org redirects to openmp.llvm.org/docs we completely switched over to the sphinx documentation page instead. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D108588	2021-09-10 12:11:05 -05:00
Jon Chesterfield	6760234e8d	[libomptarget][amdgpu] Precisely manage hsa lifetime The hsa library must be initialized before any calls into it and destructed after the last call into it. There have been a number of bugs in this area related to member variables which would like to use RAII to manage resources acquired from hsa. This patch moves the init/shutdown of hsa into a class, such that when used as the first member variable (could be a base), the lifetimes of other member variables are reliably scoped within it. This will allow other classes to use RAII reliably when used as member variables within the global. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D109512	2021-09-09 17:28:11 +01:00
Jon Chesterfield	2a581710c1	[openmp] No longer use LIBRARY_PATH to find devicertl Given D109057, change test runner to use the libomptarget-x-bc-path argument instead of the LIBRARY_PATH environment variable to find the device library. Also drop the use of LIBRARY_PATH environment variable as it is far too easy to pull in the device library from an unrelated toolchain by accident with the current setup. No loss in flexibility to developers as the clang commandline used here is still available. Reviewed By: jdoerfert, tianshilei1992 Differential Revision: https://reviews.llvm.org/D109061	2021-09-09 17:16:41 +01:00
Jon Chesterfield	d642156f8f	[libomptarget][nfc] Hoist hsa_init into rtl.cpp	2021-09-09 16:09:34 +01:00
Hansang Bae	3976035d68	[OpenMP] Fix line truncation in omp_lib.h Fixed code that exceeds the 72-column limit. Differential Revision: https://reviews.llvm.org/D109469	2021-09-09 09:33:45 -05:00
AndreyChurbanov	d40108e0af	[OpenMP] libomp: runtime part of omp_all_memory task dependence implementation. New omp_all_memory task dependence type is implemented. Library recognizes the new type via either (dependence_address == NULL && dependence_flag == 0x80) or (dependence_address == SIZE_MAX). A task with new dependence type depends on each preceding task with any dependence type (kind of a dependence barrier). Differential Revision: https://reviews.llvm.org/D108574	2021-09-08 16:55:32 +03:00
Ye Luo	2cfe1a09d1	[OpenMP][libomptarget][NFC] Change checkDeviceAndCtors return type to bool. What is exactly needed is only a boolean. Pulling OFFLOAD_SUCCESS/FAIL only adds confusion. Differential Revision: https://reviews.llvm.org/D109303	2021-09-07 13:59:27 -05:00
Hansang Bae	224f51d879	[OpenMP] Add interface for 5.1 scope construct The new interface only marks begin/end of a scope construct for corresponding OMPT events, and we can use existing interfaces for reduction operations. Differential Revision: https://reviews.llvm.org/D108062	2021-09-07 11:22:21 -05:00
Nawrin Sultana	c24da72fa4	[OpenMP] Change monotonicity of dynamic schedule This patch changes the default monotonicity of dynamic schedule from monotonic to non-monotonic when no modifier is specified. Differential Revision: https://reviews.llvm.org/D109026	2021-09-07 08:18:46 -05:00
Ye Luo	c3aecf87d5	[OpenMP][libomptarget] Change device vector elements to unique_ptr type Using std::vector<DeviceTy> requires implementing a copy constructor and copy assignment operator for DeviceTy. Indeed, DeviceTy should never be copied. After changing to std::vector<std::unique_ptr<DeviceTy>>, all the unsafe copy constructor and copy assignment operator implementations can be removed. Compilers mark them deleted due to mutex or underlying objects and this is the desired behavior. Differential Revision: https://reviews.llvm.org/D109276	2021-09-06 22:28:49 -05:00
Ye Luo	8e5c1b039e	[OpenMP][libomptarget] Change synchronize_ty return type to int32_t Plugins always return int32_t. Stay consistent with other functions which return error status. Differential Revision: https://reviews.llvm.org/D109341	2021-09-06 21:38:54 -05:00
Ron Lieberman	fdac5adee6	[openmp] NFC add bitcode comment	2021-09-02 18:21:39 -05:00
Jon Chesterfield	201e466eba	[libomptarget][amdgpu] Add gfx90a to build list	2021-09-02 18:11:02 +01:00
Jon Chesterfield	3153bdd547	[libomptarget][amdgpu] Drop env variables Use the same debug print as the rest of libomptarget plugins with the same environment control. Also drop the max queue size debugging hook as I don't believe it is still in use, can bring it back near the rest of the env handling in rtl.cpp if someone objects. That makes most of rt.h and all of utils.cpp unused. Clean that up and simplify control flow in a couple of places. Behaviour change is that debug prints that used to use the old environment variable now use the new one and print in slightly different format, and the removal of the max queue size variable. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D108784	2021-09-02 11:02:39 +01:00
Ye Luo	289a1089cd	[libomptarget] Move HostDataToTargetTy states into StatesTy Use unique_ptr to achieve the effect of mutable. Remove mutable keyword of DynRefCount and HoldRefCount. Remove std::shared_ptr from UpdateMtx. Reviewed By: tianshilei1992, grokos Differential Revision: https://reviews.llvm.org/D109007	2021-09-01 23:36:05 -05:00
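
Note on 1e45cd75df (indirect lock table): the layout described in that commit message can be pictured with a small C++ sketch. This is an illustration only, not the libomp code. Every name below (Lock, LockTable, lookup, kLocksPerRow, kInitialRows) is hypothetical, and the synchronization the real runtime needs is left out. It only shows why lock addresses stay stable: the row directory of a table is never resized, rows are allocated on first use, and a full table links to a new table with twice as many rows.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a runtime lock object.
struct Lock {
  int owner = -1;
};

constexpr std::size_t kLocksPerRow = 1024; // each row holds 1K locks
constexpr std::size_t kInitialRows = 8;    // 8 rows x 1K = 8K locks

// One table: a fixed-size directory of lazily allocated rows,
// plus a link to the next (larger) table.
struct LockTable {
  std::vector<std::unique_ptr<Lock[]>> rows; // sized once, never resized
  std::unique_ptr<LockTable> next;

  explicit LockTable(std::size_t nrows) : rows(nrows) {}
  std::size_t capacity() const { return rows.size() * kLocksPerRow; }
};

// Map a global lock index to its Lock, allocating rows and linked
// tables on demand. Single-threaded sketch: no locking is shown.
Lock *lookup(LockTable &head, std::size_t idx) {
  assert(!head.rows.empty());
  LockTable *t = &head;
  // Walk the chain, subtracting each table's capacity from the index.
  while (idx >= t->capacity()) {
    idx -= t->capacity();
    if (!t->next)
      t->next = std::make_unique<LockTable>(2 * t->rows.size());
    t = t->next.get();
  }
  std::unique_ptr<Lock[]> &row = t->rows[idx / kLocksPerRow];
  if (!row)
    row = std::make_unique<Lock[]>(kLocksPerRow);
  return &row[idx % kLocksPerRow];
}
```

Because no existing row or directory is ever moved or freed while the table grows, a pointer obtained for one lock remains valid after later lookups allocate more rows or link another table, which is the property the commit message calls keeping pointer/iterator validity.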


2043 Commits