llvm-project

Commit Graph

Author	SHA1	Message	Date
Vladislav Vinogradov	eab1fd389b	[omp] Fix build without ITT after D103121 changes Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D104638	2021-06-21 18:17:52 +03:00
Vyacheslav Zakharin	aad9e48c5f	[NFC][libomptarget] Remove redundant libelf dependency for elf_common. Differential Revision: https://reviews.llvm.org/D104549	2021-06-21 07:19:55 -07:00
Pushpinder Singh	7a97cd9da7	[AMDGPU][Libomptarget] Remove redundant functions There does not seem to be any use of these functions. They just put the value to a local which is never used again. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104512	2021-06-21 06:13:24 +00:00
Shilei Tian	ec97866454	[OpenMP] Make bug49334.cpp more reproducible `bug49334.cpp` cannot detect data race in `libomptarget` efficiently. It is reported that with `N = 256` and `BS = 16`, the data race can be reproduced more steadily. The upcoming patches will fix it, so this patch is expected to fail for now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104552	2021-06-18 18:35:41 -04:00
Asher Mancinelli	5c189d30e6	[OpenMP] Update FAQ for enabling cuda offloading Add an FAQ entry and add a few lines to an existing one. Document the use of `GCC_INSTALL_PREFIX` for pointing clang to correct GCC installation for two-stage build. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D104474	2021-06-18 11:55:45 -06:00
Vyacheslav Zakharin	836992ab9a	[NFC][libomptarget] Build elf_common with PIC. Differential Revision: https://reviews.llvm.org/D104545	2021-06-18 09:20:10 -07:00
Vyacheslav Zakharin	c5b7c7c8f7	[NFC][libomptarget] Fixed -DLLVM_ENABLE_RUNTIMES="openmp" build. Differential Revision: https://reviews.llvm.org/D104535	2021-06-18 09:20:10 -07:00
Terry Wilmarth	25073a4ecf	[OpenMP] Add Two-level Distributed Barrier Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in some cases than the default hyper barrier. This barrier is designed to handle fine granularity parallelism where barriers are used frequently with little compute and memory access between barriers. There is no need to use it for codes with few barriers and large granularity compute, or memory intensive applications, as little difference will be seen between this barrier and the default hyper barrier. This barrier is designed to work optimally with a fixed number of threads, and has a significant setup time, so should NOT be used in situations where the number of threads in a team is varied frequently. The two-level distributed barrier is off by default -- hyper barrier is used by default. To use this barrier, you must set all barrier patterns to use this type, because it will not work with other barrier patterns. Thus, to turn it on, the following settings are required: KMP_FORKJOIN_BARRIER_PATTERN=dist,dist KMP_PLAIN_BARRIER_PATTERN=dist,dist KMP_REDUCTION_BARRIER_PATTERN=dist,dist Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER, and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed barrier. Differential Revision: https://reviews.llvm.org/D103121	2021-06-16 15:34:55 -05:00
Vyacheslav Zakharin	b5c4fc0f23	[NFC][libomptarget] Reduce the dependency on libelf This change-set removes libelf usage from elf_common part of the plugins. libelf is still used in x86_64 generic plugin code and in some plugins (e.g. amdgpu) - these will have to be cleaned up in separate checkins. Differential Revision: https://reviews.llvm.org/D103545	2021-06-16 08:34:23 -07:00
AndreyChurbanov	610fea65e2	[OpenMP] libomp: fixed implementation of OMP 5.1 inoutset task dependence type Refactored code of dependence processing and added new inoutset dependence type. Compiler can set dependence flag to 0x8 when call __kmpc_omp_task_with_deps. All dependence flags library gets so far and corresponding dependence types: 1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET. Differential Revision: https://reviews.llvm.org/D97085	2021-06-16 14:47:29 +03:00
Joachim Protze	d2a7871b5e	[OpenMP][NFC] Add back suppression of warning Commit `cff215565e` did not fix all unused variables in different builds, so adding back the suppression for now.	2021-06-16 10:14:59 +02:00
Joachim Protze	cff215565e	[OpenMP] Remove unused variables from libomp code Several variables were left unused as a result of different patches removing their use. Two variables have some use: `poll_count` is used by the KMP_BLOCKING macro only under certain conditions. Adding (void) to tell the compiler to ignore the unused variable. `padding` is a dummy stack allocation with no intent to be used. Also adding (void) to make the compiler ignore the unused variable. Differential Revision: https://reviews.llvm.org/D104303	2021-06-16 09:33:46 +02:00
Peyton, Jonathan L	56da28240f	[OpenMP] Add GOMP 5.0 version symbols to API * Add GOMP versioned pause functions * Add GOMP versioned affinity format functions To do the affinity format functions, only attach versioned symbols to the APPEND Fortran entries (e.g., omp_set_affinity_format_) since GOMP only exports two symbols (one for Fortran, one for C). Our affinity format functions have three symbols. e.g., with omp_set_affinity_format: 1) omp_set_affinity_format (Fortran interface) 2) omp_set_affinity_format_ (Fortran interface) 3) ompc_set_affinity_format (C interface) Have the GOMP version of the C symbol alias the ompc_* 3) version instead of the Fortran unappended version 1). Differential Revision: https://reviews.llvm.org/D103647	2021-06-15 16:25:00 -05:00
Peyton, Jonathan L	92baf414db	[OpenMP] Fix affinity determine capable algorithm on Linux Remove strange checks for syscall() arguments where mask is NULL. Valgrind reports these as error usages for the syscall. Instead, just check if CACHE_LINE bytes is long enough. If not, then search for the size. Also, by limiting the first size detection attempt to CACHE_LINE bytes, instead of 1MB, we don't use more than one cache line for the mask size. Before this patch, sometimes the returned mask size was 640 bytes (10 cache lines) because the initial call to getaffinity() was limited only by the internal kernel mask size which can be very large. Differential Revision: https://reviews.llvm.org/D103637	2021-06-15 16:21:30 -05:00
Peyton, Jonathan L	0ddde4d865	[OpenMP] Lazily assign root affinity Lazily set affinity for root threads. Previously, the root thread executing middle initialization would attempt to assign affinity to other existing root threads. This was not working properly as the set_system_affinity() function wasn't setting the affinity for the target thread. Instead, the middle init thread was resetting its own affinity using the target thread's affinity mask. Differential Revision: https://reviews.llvm.org/D103625	2021-06-15 16:21:06 -05:00
Pushpinder Singh	cadcaf3f46	[AMDGPU][Libomptarget] Drop dead code related to g_atl_machine This patch includes some changes which delete the code accessing the g_atl_machine global. Some accesses related to memory_pools still remain. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103813	2021-06-15 05:21:35 +00:00
Ron Lieberman	91f147792e	[libomptarget][amdgpu] Remove stray fprintf in rtl.cpp remove unintended fprintf in rtl.cpp Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D104003	2021-06-10 01:57:30 +00:00
AndreyChurbanov	9ce2e5e700	Revert "[OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type" This reverts commit `a1f550e052`. Revert in order to fix backwards compatibility breakage caused by type size change for task dependence flag.	2021-06-09 17:38:38 +03:00
Joachim Protze	639b397931	[OpenMP][Tools] Fix Archer handling of task dependencies The current handling of dependencies in Archer has two flaws: - annotation of dependency synchronization is not limited to sibling tasks - annotation of in/out dependencies is based on the assumption that dependency variables will rarely be byte-sized variables. This patch introduces a map in the generating task to manage the dependency variables for the child tasks. The map is only accessed from the generating task, so no locking is necessary. This also limits the dependency-based synchronization to sibling tasks. This patch also introduces proper handling for new dependency types such as mutexinoutset and inoutset. Differential Revision: https://reviews.llvm.org/D103608	2021-06-09 13:36:20 +02:00
Joachim Protze	08d8f1a958	[OpenMP][Tools] Cleanup memory pool used in Archer The main motivation for reusing objects is that it helps to avoid creating and leaking synchronization clocks in TSan. The reused object will reuse the synchronization clock in TSan. Before, new and delete operators were overloaded to get and return memory for the object from/to the object pool. This patch replaces the operator overloading with explicit static New/Delete functions. Objects for parallel regions and implicit tasks will always be recruited from and returned to the thread-local object pool. Only for explicit tasks is there a chance that another thread completes the task and frees the object. This patch optimizes the thread-local New/Delete calls by avoiding locks and only locking if the pool is empty. Remote threads return the object into a separate queue. The chunk size for allocations is now decided based on page size. The objects will also be aligned to cache lines, avoiding false sharing. This is the first patch in a series to provide better tasking support. Differential Revision: https://reviews.llvm.org/D103606	2021-06-09 13:36:19 +02:00
Joachim Protze	82e4e50531	[OpenMP][Tools] Fix Archer for MACOS Archer uses weak symbol overloads of TSan functions to enable loading the tool even if the application is not built with TSan. For MACOS the tool collects the function pointer at runtime. When adding the function entry/exit markers, we missed to add the functions in the MACOS codepath. This patch also replaces the repeated function lookup by a single initial function lookup and fixes the disabling logic in RunningOnValgrind. Differential Revision: https://reviews.llvm.org/D103607	2021-06-09 13:36:19 +02:00
Brendon Cahoon	294efbbd3e	Reland "[AMDGPU] Add gfx1013 target" This reverts commit `211e584fa2`. Fixed a use-after-free error that caused the sanitizers to fail.	2021-06-08 21:15:35 -04:00
Joseph Huber	df965513a9	[OpenMP] Add an information flag for device data transfers This patch adds an information flag that indicates when data is being copied to and from the device. This will be helpful for finding redundant or unnecessary data transfers in applications. Reviewed By: jdoerfert, grokos Differential Revision: https://reviews.llvm.org/D103927	2021-06-08 20:23:27 -04:00
Brendon Cahoon	211e584fa2	Revert "[AMDGPU] Add gfx1013 target" This reverts commit `ea10a86984`. A sanitizer buildbot reports an error.	2021-06-08 16:29:41 -04:00
Brendon Cahoon	ea10a86984	[AMDGPU] Add gfx1013 target Differential Revision: https://reviews.llvm.org/D103663	2021-06-08 12:49:49 -04:00
Vignesh Balasubramanian	f61602b0d3	[OpenMP][OMPD] Implementation of OMPD debugging library - libompd. This is the first of seven patches that implements OMPD, a debugging interface to support debugging of OpenMP programs. It contains support code required in "openmp/runtime" for OMPD implementation. Reviewed By: @hbae Differential Revision: https://reviews.llvm.org/D100181	2021-06-08 16:44:22 +05:30
Peyton, Jonathan L	d70e1f1276	[OpenMP][runtime] add .clang-tidy file Use same checks as compiler-rt which removes checks for readability-* and llvm-header style. Differential Revision: https://reviews.llvm.org/D103711	2021-06-07 13:56:39 -05:00
AndreyChurbanov	a1f550e052	[OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type Refactored code of dependence processing and added new inoutset dependence type. Compiler can set dependence flag to 0x8 when call __kmpc_omp_task_with_deps. Size of type of the dependence flag changed from 1 to 4 bytes in clang. All dependence flags library gets so far and corresponding dependence types: 1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET. Differential Revision: https://reviews.llvm.org/D97085	2021-06-07 21:42:51 +03:00
Bryan Chan	54f059c900	[OpenMP] Check loc for NULL before dereferencing it The ident_t * argument in __kmp_get_monotonicity was being used without a customary NULL check, causing the function to crash in a Debug build. Release builds were not affected thanks to dead store elimination.	2021-06-07 10:45:48 -04:00
Pushpinder Singh	4f8bc7caf4	[AMDGPU][Libomptarget] Remove atlc global This global struct used to hold various flags for monitoring the initialization of hsa. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103795	2021-06-07 11:09:01 +00:00
Pushpinder Singh	f5f329a371	[AMDGPU][Libomptarget] Rework logic for locating kernarg pools Previous logic was to always use the first kernarg pool found to allocate kernel args. This patch changes this to use only the kernarg pool which has non-zero size. This logic is also reworked to not use any globals. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103600	2021-06-07 06:41:37 +00:00
Terry Wilmarth	8ec9aa236e	[OpenMP] Add experimental nesting mode feature Nesting mode is a new experimental feature in the OpenMP runtime. It allows a user to set up nesting for an application in a way that corresponds to the hardware topology levels on the machine an application is being run on. For example, if a machine has 2 sockets, each with 12 cores, then use of nesting mode could set up an outer level of nesting that uses 2 threads per parallel region, and an inner level of nesting that uses 12 threads per parallel region. Nesting mode is controlled with the KMP_NESTING_MODE environment variable as follows: 1) KMP_NESTING_MODE = 0: Nesting mode is off (default); max-active-levels-var is set to 1 (the default -- nesting is off, nested parallel regions are serialized). 2) KMP_NESTING_MODE = 1: Nesting mode is on, and a number of threads will be assigned for each level discovered in the machine topology; max-active-levels-var is set to the number of levels discovered. 3) KMP_NESTING_MODE = n, n>1: [Note: this option is experimental and may change or be removed in the future.] Nesting mode is on, and a number of threads will be assigned for each topology level discovered on the machine, up to k<=n levels (since there may be fewer than n levels discovered in the topology), and beyond the kth level, nested parallel regions will be serialized; NOTE: max-active-levels-var is 1 (the default -- nesting is off, and nested parallel regions are serialized) until the user changes max-active-levels-var. If the user sets OMP_NUM_THREADS or OMP_MAX_ACTIVE_LEVELS, they will override KMP_NESTING_MODE settings for the associated environment variables. The detected topology may be limited by an affinity mask setting on the initial thread, or if the user sets KMP_HW_SUBSET. See also: KMP_HOT_TEAMS_MAX_LEVEL for controlling use of hot teams for nested parallel regions. Note that this feature only sets numbers of threads used at nesting levels. The user should make use of OMP_PLACES and OMP_PROC_BIND or KMP_AFFINITY for affinitizing those threads, if desired. Differential Revision: https://reviews.llvm.org/D102188	2021-06-04 16:01:11 -05:00
Peyton, Jonathan L	56dd158c32	[OpenMP] fix spelling error in message-converter.pl	2021-06-04 11:20:32 -05:00
Peyton, Jonathan L	f7655f3df3	[OpenMP] Fix improper printf format specifier	2021-06-02 11:04:48 -05:00
Hansang Bae	7ba4e96ede	[OpenMP] Use new task type/flag for taskwait depend events. Differential Revision: https://reviews.llvm.org/D103464	2021-06-02 10:16:38 -05:00
Pushpinder Singh	b25546a4b4	[AMDGPU][Libomptarget][NFC] Remove bunch of dead structs Dropped structs are atmi_machine_t, atmi_device_t and atmi_memory_t Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103509	2021-06-02 10:40:51 +00:00
Pushpinder Singh	2368170a8d	[AMDGPU][Libomptarget][NFC] Remove atmi_place_t atmi_place_t has been replaced with int DeviceId. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103508	2021-06-02 10:35:28 +00:00
Peyton, Jonathan L	2020c981fa	[OpenMP] Add L2-Tile equivalence for KNL When on KNL and L2 or Tile layer is detected, manually add the corresponding layer which is equivalent. Differential Revision: https://reviews.llvm.org/D102865	2021-06-01 14:17:13 -05:00
Hansang Bae	cf5c94ef08	[OpenMP] Define named constants for interop's foreign runtime ID Also added missing Fortran definitions for interop support. Differential Revision: https://reviews.llvm.org/D102883	2021-06-01 13:06:59 -05:00
Pushpinder Singh	fb113264a8	[AMDGPU][Libomptarget] Remove g_atmi_machine global Turns out the only purpose of this class was to verify whether the device ID was in range, which can be done easily by using g_atl_machine. Getting rid of g_atl_machine is still pending and will be done in a later patch. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103443	2021-06-01 12:34:24 +00:00
Pushpinder Singh	4fc3286951	[AMDGPU][Libomptarget][NFC] Split host and device malloc This patch splits the code path for host and device malloc. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103389	2021-05-31 12:09:18 +00:00
Pushpinder Singh	8b79dfb302	[AMDGPU][Libomptarget][NFC] Remove atmi_mem_place_t This struct was used to specify the device on which memory was being allocated/free in atmi_malloc/free. It has now been replaced with int DeviceId. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103239	2021-05-27 11:53:18 +00:00
Jon Chesterfield	2fdf8bbd19	[libomptarget][nfc][amdgpu] Factor out setting upper bounds Refactor suggested in D103037 to help avoid similar copy-paste errors. Change is mechanical. Some parts of this would be more robust with unsigned. Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D103090	2021-05-26 19:57:49 +01:00
Jon Chesterfield	c5c1ec7945	[libomptarget][nfc][amdgpu] Refactor uses of KernelInfoTable Suggested in D103059. Use a single lookup instead of two, more const, less mutation. Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D103093	2021-05-26 19:25:25 +01:00
Jon Chesterfield	07f59baad6	[libomptarget][nfc][amdgpu] Remove atmi_status_t type ATMI_STATUS_UNKNOWN was unused, deleted references to it. Replaced ATMI_STATUS_{SUCCESS,ERROR} with HSA_STATUS_{SUCCESS,ERROR} Replaced atmi_status_t with hsa_status_t Otherwise no change. In particular, conversions between atmi_status_t and hsa_status_t will now be conversions between hsa_status_t and itself. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D103115	2021-05-26 17:02:19 +01:00
Pushpinder Singh	a2d6ef5876	[AMDGPU][Libomptarget] Inline atmi_init/atmi_finalize After D102847, these functions can be inlined. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103075	2021-05-26 10:50:08 +00:00
Pushpinder Singh	cc8661ac4a	[AMDGPU][Libomptarget] Delete g_atmi_initialized This patch drops g_atmi_initialized and inlines the Initialize & Finalize methods from Runtime class. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102847	2021-05-26 10:46:54 +00:00
Pushpinder Singh	7648b6978e	[AMDGPU][Libomptarget] Move Kernel/Symbol info tables to RTLDeviceInfoTy Two globals KernelInfoTable & SymbolInfoTable are moved into RTLDeviceInfoTy class. This builds on the top of D102691. [2/2] Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102692	2021-05-26 10:02:28 +00:00
Jon Chesterfield	df005fa364	[libomptarget][nfc] Move hostcall required test to rtl Remove a global, fix minor race. First of N patches to bring up hostcall. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103058	2021-05-25 22:43:17 +01:00
Pushpinder Singh	b0d68c7141	[AMDGPU][Libomptarget] Mark lambda_by_value test as XFAIL Reason: Missing printf definition Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103078	2021-05-25 12:16:54 +00:00
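
The inoutset commit listed above (610fea65e2) enumerates the dependence flag values the library receives: 1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET. As a rough illustration only, that mapping could be written as a small lookup. The names `DEP_FLAG_NAMES` and `dep_type_name` are hypothetical and are not part of libomp; the runtime's actual handling is in C/C++ and is not shown here.

```python
# Flag values copied from the commit message of 610fea65e2.
# The table and helper names below are illustrative, not libomp identifiers.
DEP_FLAG_NAMES = {
    1: "IN",
    2: "OUT",
    3: "INOUT",
    4: "MUTEXINOUTSET",
    8: "INOUTSET",
}


def dep_type_name(flag: int) -> str:
    """Return the dependence type name for a flag value listed in the commit."""
    try:
        return DEP_FLAG_NAMES[flag]
    except KeyError:
        # Values not listed in the commit message are rejected rather than guessed.
        raise ValueError(f"unknown dependence flag: {flag:#x}") from None


if __name__ == "__main__":
    for value in sorted(DEP_FLAG_NAMES):
        print(f"{value:#x} -> {dep_type_name(value)}")
```

The sketch treats each listed value as a discrete code and makes no assumption about how the runtime combines or stores the flag internally.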

1722 Commits