llvm-project

Commit Graph

Author	SHA1	Message	Date
Michał Górny	2b0d95fb58	[openmp] [test] Add missing <limits> include to capacity_nthreads Differential Revision: https://reviews.llvm.org/D105474	2021-07-06 20:39:53 +02:00
Jon Chesterfield	ddfb074a80	[libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global Folds some duplicate logic into a helper function, passes the new environment struct into getLaunchVals which no longer reads the DeviceInfo global. Implemented on top of D105237 Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D105239	2021-07-06 17:06:38 +01:00
Atmn Patel	21e92612c0	[Libomptarget] Experimental Remote Plugin Fixes D97883 introduced a compile-time error in the experimental remote offloading libomptarget plugin, this patch fixes it and resolves a number of inconsistencies in the plugin as well: 1. Non-functional Asynchronous API 2. Unnecessarily verbose debug printing 3. Misc. code clean ups This is not intended to make any functional changes to the plugin. Differential Revision: https://reviews.llvm.org/D105325	2021-07-02 12:38:34 -04:00
Hansang Bae	f1b9ce2736	[OpenMP] Fix a few issues with hidden helper task This patch includes the following changes to address a few issues when using hidden helper task. - Assertion is triggered when there are inadvertent calls to hidden helper functions on non-Linux OS - Added deinit code in __kmp_internal_end_library function to fix random shutdown crashes - Moved task data access into the lock-guarded region in __kmp_push_task Differential Revision: https://reviews.llvm.org/D105308	2021-07-01 17:10:32 -05:00
Shilei Tian	369216ab31	[OpenMP][Offloading] Refined return value of `DeviceTy::getOrAllocTgtPtr` `DeviceTy::getOrAllocTgtPtr` just returns a target pointer. In addition, two bool values (`IsNew` and `IsHostPtr`) are passed by reference to make the changes made in the function available to the caller. In this patch, a struct, which contains the target pointer, two flags, and an iterator to the map table entry corresponding to the queried host pointer, will be returned. In addition to making the logic around the two bool values clearer, this paves the way for the next patch to fix the data race in `bug49334.cpp` by attaching an event to the map table entry (and that's why we need the iterator). Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104382	2021-07-01 12:32:03 -04:00
Jon Chesterfield	db89414da4	[libomptarget][nfc] Move grid size computation Change getLaunchVals to return the integers used for launch Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D105237	2021-07-01 12:53:04 +01:00
Dhruva Chakrabarti	98c36f0079	Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size" This reverts commit `2240b41ee4`. A value of 0 for KernDescVal WG_Size implies it is unknown, so it should be set to the default. The above change was made without this assumption. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105250	2021-06-30 17:15:00 -07:00
Jon Chesterfield	4b0926b044	[libomptarget][nfc] Replace out arguments with struct return A step towards making this function adequately self contained that it can be tested easily. No functional change intended here, left variable names unchanged. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D105229	2021-06-30 22:40:07 +01:00
Jon Chesterfield	d86b0073cf	[libomptarget][amdgpu][nfc] Fix build warnings, drop some headers Removes stdarg header, drops uses of iostream, fixes some format string errors. Also changes a C style struct to C++ style to avoid a warning from clang. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D104923	2021-06-30 22:23:36 +01:00
Shilei Tian	24a36ce58b	[OpenMP][Offloading] Replace all calls to `isSPMDMode` with `__kmpc_is_spmd_exec_mode` In our ongoing work, we are using `AbstractAttributor` to deduce the execution model of device functions and potentially remove unnecessary function calls to `__kmpc_is_spmd_exec_mode`. In the current device runtime, we have mixed use of `isSPMDMode` and `__kmpc_is_spmd_exec_mode`, but in fact `__kmpc_is_spmd_exec_mode` simply calls `isSPMDMode`. Since all functions starting with `__kmpc` are C functions, which are not subject to name mangling, they are more optimization friendly. In this patch, we simply replaced all calls to `isSPMDMode` with `__kmpc_is_spmd_exec_mode` to pave the way for the optimization. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105211	2021-06-30 15:39:57 -04:00
Dhruva Chakrabarti	e0b713a035	[libomptarget] [amdgpu] Change default number of teams per computation unit This patch is related to https://reviews.llvm.org/D98832. Based on discussions there, I decided to separate out the teams default as this patch. This change is to increase the number of teams per computation unit so as to provide more wavefronts for hiding latency. This change improves performance for some programs, including 20-50% for some Stream benchmarks. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D99003	2021-06-29 15:34:35 -07:00
Dhruva Chakrabarti	2240b41ee4	[libomptarget] [amdgpu] Fix default setting of max flat workgroup size When max flat workgroup size is not specified, it is set to the default workgroup size. This prevents kernel launch with a workgroup size larger than the default. The fix is to ignore a size of 0 and treat it as unspecified. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105073	2021-06-29 13:47:24 -07:00
Johannes Doerfert	4eb90e893f	Revert "[OpenMP] Add Two-level Distributed Barrier" This reverts commit `25073a4ecf`. This breaks non-x86 OpenMP builds for a while now. Until a solution is ready to be upstreamed we revert the feature and unblock those builds. See: https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821 and https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821 The currently proposed fix (D104788) seems not to be ready yet: https://reviews.llvm.org/D104788#2841928	2021-06-29 09:38:27 -05:00
Johannes Doerfert	bc8bb3df35	Revert "[omp] Fix build without ITT after D103121 changes" This reverts commit `eab1fd389b`. This commit fixed a problem with `25073a4ecf` (D103121) which is the one we actually need to revert to unblock non-X86 builds of OpenMP. Can be reapplied, or merged into, D103121 as it goes in again.	2021-06-29 09:38:27 -05:00
Joseph Huber	2190c48fde	[OpenMP][Documentation] Add FAQ entry for CMake module This patch adds documentation for using the CMake find module for OpenMP target offloading provided by LLVM. It also removes the requirement for AMD's architecture to be set as this isn't necessary for upstream LLVM. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105051	2021-06-28 17:05:07 -04:00
Joseph Huber	c9f3240c9d	[OpenMP][Documentation] Add OpenMPOpt optimization section Add some information about the optimizations currently provided by OpenMPOpt. Every optimization performed should eventually be listed here. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105050	2021-06-28 17:05:03 -04:00
Pushpinder Singh	20df2c7052	[AMDGPU][Libomptarget] Collect allocatable memory pools using HSA The logic is almost similar to that of system.cpp with one change that instead of adding all the memory pools to a device struct it only keeps a single pool. The existing approach also always allocated memory on the first HSA pool found for a GPU. This depends on D104691. The goal of this series of patches is to remove _atl_machine global. The next patch will drop g_atl_machine entirely. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104695	2021-06-28 11:28:04 +00:00
Jon Chesterfield	f66b8fdc0a	[libomptarget][amdgpu] Build openmp for two more targets The 4800U APU is a gfx902 and the MI100 accelerator is a gfx908. Both numbers are listed in ROCT topology.c Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D104922	2021-06-25 19:02:03 +01:00
Jon Chesterfield	96f6873dff	[OpenMP][NFC] Drop unused headers from amdgpu plugin	2021-06-25 12:08:56 +01:00
AndreyChurbanov	b2787945f9	[OpenMP][NFC] libomp: fix wrong debug assertion. Normalized bounds of chunk of iterations to steal from are inclusive, so upper bound should not be decremented in expression to check. Problem was in attempt to steal iterations 0:0, that caused assertion after wrong decrement. Reported in comment to https://reviews.llvm.org/D103648. Differential Revision: https://reviews.llvm.org/D104880	2021-06-25 02:02:14 +03:00
Aakanksha Patil	3453f3dd46	[AMDGPU] Add gfx1035 target Differential Revision: https://reviews.llvm.org/D104804	2021-06-24 14:32:41 -04:00
Joel E. Denny	9fa5e3280d	[OpenMP] Fix delete map type in ref count debug messages For example, without this patch: ``` $ cat test.c int main() { int x; #pragma omp target enter data map(alloc: x) #pragma omp target enter data map(alloc: x) #pragma omp target enter data map(alloc: x) #pragma omp target exit data map(delete: x) ; return 0; } $ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c $ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists\|last' Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=1, Name=unknown Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (incremented), Name=unknown Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=3 (incremented), Name=unknown Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (decremented) Libomptarget --> There are 4 bytes allocated at target address 0x00000000013bb040 - is not last ``` `RefCount` is reported as decremented to 2, but it ought to be reset because of the `delete` map type, and `is not last` is incorrect. This patch migrates the reset of reference counts from `DeviceTy::deallocTgtPtr` to `DeviceTy::getTgtPtrBegin`, which then correctly reports the reset. Based on the `IsLast` result from `DeviceTy::getTgtPtrBegin`, `targetDataEnd` then correctly reports `is last` for any deletion. `DeviceTy::deallocTgtPtr` is responsible only for the final reference count decrement and mapping removal. An obscure side effect of this patch is that a `delete` map type when the reference count is infinite yields `DelEntry=IsLast=false` in `targetDataEnd` and so no longer results in a `DeviceTy::deallocTgtPtr` call. Without this patch, that call is a no-op anyway besides some unnecessary locking and mapping table lookups. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104560	2021-06-23 09:57:19 -04:00
Joel E. Denny	48421ac441	[OpenMP] Improve ref count debug messages For example, without this patch: ``` $ cat test.c int main() { int x; #pragma omp target enter data map(alloc: x) #pragma omp target exit data map(release: x) ; return 0; } $ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c $ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists' Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, Name=unknown Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, updated RefCount=1 ``` There are two problems in this example: * `RefCount` is not reported when a mapping is created, but it might be 1 or infinite. In this case, because it's created by `omp target enter data`, it's 1. Seeing that would make later `RefCount` messages easier to understand. * `RefCount` is still 1 at the `omp target exit data`, but it's reported as `updated`. The reason it's still 1 is that, upon deletions, the reference count is generally not updated in `DeviceTy::getTgtPtrBegin`, where the report is produced. Instead, it's zeroed later in `DeviceTy::deallocTgtPtr`, where it's actually removed from the mapping table. This patch makes the following changes: * Report the reference count when creating a mapping. * Where an existing mapping is reported, always report a reference count action: * `update suppressed` when `UpdateRefCount=false` * `incremented` * `decremented` * `deferred final decrement`, which replaces the misleading `updated` in the above example * Add comments to `DeviceTy::getTgtPtrBegin` to explain why it does not zero the reference count. (Please advise if these comments miss the point.) * For unified shared memory, don't report confusing messages like `RefCount=` or `RefCount= updated` given that reference counts are irrelevant in this case. Instead, just report `for unified shared memory`. * Use `INFO` not `DP` consistently for `Mapping exists` messages. * Fix device table dumps to print `INF` instead of `-1` for an infinite reference count. Reviewed By: jhuber6, grokos Differential Revision: https://reviews.llvm.org/D104559	2021-06-23 09:57:19 -04:00
Joseph Huber	72d4cd627c	[OpenMP] Introduce a CMake find module for OpenMP Target support This introduces a CMake find module for detecting target offloading support in a compiler. The goal is to make it easier to incorporate target offloading into a CMake project. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D104710	2021-06-22 23:01:38 -04:00
Joseph Huber	422adaa879	[OpenMP] Add thread limit environment variable support to plugins The OpenMP 5.1 standard defines the environment variable `OMP_TEAMS_THREAD_LIMIT` to limit the number of threads that will be run in a single block. This patch adds support for this into the AMDGPU and CUDA plugins. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103923	2021-06-22 16:25:40 -04:00
Shilei Tian	0029059074	[NFC][OpenMP][Offloading] Unified the construction of mapping table entry This patch unifies construction of mapping table entry to use `emplace`. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104580	2021-06-22 12:38:47 -04:00
Joseph Huber	244e98ff48	[Libomptarget] Improve device runtime implementation for globalized variables. Currently the runtime implementation of `__kmpc_alloc_shared` is extremely slow because it allocates memory for each thread individually. This patch adds a small buffer for the threads to share data and will greatly improve performance for builds where all globalization could not be optimized out. If the shared buffer is full, then memory will be allocated per-warp rather than per-thread. Depends on D97680 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104666	2021-06-22 11:52:49 -04:00
Joseph Huber	952a0f2385	[Libomptarget] Introduce new globalization runtime calls Summary: This patch introduces the new globalization runtime to be used by D97680. These runtime calls will replace the __kmpc_data_sharing_push_stack and __kmpc_data_sharing_pop_stack functions. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102532	2021-06-22 10:05:42 -04:00
AndreyChurbanov	5dd4d0d46f	[OpenMP] libomp: fix dynamic loop dispatcher Restructured dynamic loop dispatcher code. Fixed use of dispatch buffers for nonmonotonic dynamic (static_steal) schedule: - eliminated the possibility of stealing iterations of the wrong loop when the victim thread changed its buffer to work on another loop; - fixed a race when the victim thread changed its buffer to work in nested parallel; - eliminated the "static" property of the schedule, that is, a single thread can now execute the whole loop. Differential Revision: https://reviews.llvm.org/D103648	2021-06-22 16:29:01 +03:00
Pushpinder Singh	9d110f9159	[AMDGPU][Libomptarget] Move allow_access_to_all_gpu_agents to rtl.cpp Moving this method helps eliminate a use of g_atl_machine. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104691	2021-06-22 11:44:52 +00:00
Vladislav Vinogradov	eab1fd389b	[omp] Fix build without ITT after D103121 changes Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D104638	2021-06-21 18:17:52 +03:00
Vyacheslav Zakharin	aad9e48c5f	[NFC][libomptarget] Remove redundant libelf dependency for elf_common. Differential Revision: https://reviews.llvm.org/D104549	2021-06-21 07:19:55 -07:00
Pushpinder Singh	7a97cd9da7	[AMDGPU][Libomptarget] Remove redundant functions There does not seem to be any use of these functions. They just put the value to a local which is never used again. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104512	2021-06-21 06:13:24 +00:00
Shilei Tian	ec97866454	[OpenMP] Make bug49334.cpp more reproducible `bug49334.cpp` cannot detect the data race in `libomptarget` efficiently. It is reported that with `N = 256` and `BS = 16`, the data race can be reproduced more steadily. The upcoming patches will fix it, so this test is expected to fail for now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104552	2021-06-18 18:35:41 -04:00
Asher Mancinelli	5c189d30e6	[OpenMP] Update FAQ for enabling cuda offloading Add an FAQ entry and add a few lines to an existing one. Document the use of `GCC_INSTALL_PREFIX` for pointing clang to correct GCC installation for two-stage build. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D104474	2021-06-18 11:55:45 -06:00
Vyacheslav Zakharin	836992ab9a	[NFC][libomptarget] Build elf_common with PIC. Differential Revision: https://reviews.llvm.org/D104545	2021-06-18 09:20:10 -07:00
Vyacheslav Zakharin	c5b7c7c8f7	[NFC][libomptarget] Fixed -DLLVM_ENABLE_RUNTIMES="openmp" build. Differential Revision: https://reviews.llvm.org/D104535	2021-06-18 09:20:10 -07:00
Terry Wilmarth	25073a4ecf	[OpenMP] Add Two-level Distributed Barrier Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in some cases than the default hyper barrier. This barrier is designed to handle fine granularity parallelism where barriers are used frequently with little compute and memory access between barriers. There is no need to use it for codes with few barriers and large granularity compute, or memory intensive applications, as little difference will be seen between this barrier and the default hyper barrier. This barrier is designed to work optimally with a fixed number of threads, and has a significant setup time, so should NOT be used in situations where the number of threads in a team is varied frequently. The two-level distributed barrier is off by default -- hyper barrier is used by default. To use this barrier, you must set all barrier patterns to use this type, because it will not work with other barrier patterns. Thus, to turn it on, the following settings are required: KMP_FORKJOIN_BARRIER_PATTERN=dist,dist KMP_PLAIN_BARRIER_PATTERN=dist,dist KMP_REDUCTION_BARRIER_PATTERN=dist,dist Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER, and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed barrier. Differential Revision: https://reviews.llvm.org/D103121	2021-06-16 15:34:55 -05:00
Vyacheslav Zakharin	b5c4fc0f23	[NFC][libomptarget] Reduce the dependency on libelf This change-set removes libelf usage from elf_common part of the plugins. libelf is still used in x86_64 generic plugin code and in some plugins (e.g. amdgpu) - these will have to be cleaned up in separate checkins. Differential Revision: https://reviews.llvm.org/D103545	2021-06-16 08:34:23 -07:00
AndreyChurbanov	610fea65e2	[OpenMP] libomp: fixed implementation of OMP 5.1 inoutset task dependence type Refactored code of dependence processing and added new inoutset dependence type. The compiler can set the dependence flag to 0x8 when calling __kmpc_omp_task_with_deps. All dependence flags the library receives so far and the corresponding dependence types: 1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET. Differential Revision: https://reviews.llvm.org/D97085	2021-06-16 14:47:29 +03:00
Joachim Protze	d2a7871b5e	[OpenMP][NFC] Add back suppression of warning Commit `cff215565e` did not fix all unused variables in different builds, so adding back the suppression for now.	2021-06-16 10:14:59 +02:00
Joachim Protze	cff215565e	[OpenMP] Remove unused variables from libomp code Several variables were left unused as a result of different patches removing their use. Two variables have some use: `poll_count` is used by the KMP_BLOCKING macro only under certain conditions. Adding (void) to tell the compiler to ignore the unused variable. `padding` is a dummy stack allocation with no intent to be used. Also adding (void) to make the compiler ignore the unused variable. Differential Revision: https://reviews.llvm.org/D104303	2021-06-16 09:33:46 +02:00
Peyton, Jonathan L	56da28240f	[OpenMP] Add GOMP 5.0 version symbols to API * Add GOMP versioned pause functions * Add GOMP versioned affinity format functions To do the affinity format functions, only attach versioned symbols to the APPEND Fortran entries (e.g., omp_set_affinity_format_) since GOMP only exports two symbols (one for Fortran, one for C). Our affinity format functions have three symbols. e.g., with omp_set_affinity_format: 1) omp_set_affinity_format (Fortran interface) 2) omp_set_affinity_format_ (Fortran interface) 3) ompc_set_affinity_format (C interface) Have the GOMP version of the C symbol alias the ompc_* 3) version instead of the Fortran unappended version 1). Differential Revision: https://reviews.llvm.org/D103647	2021-06-15 16:25:00 -05:00
Peyton, Jonathan L	92baf414db	[OpenMP] Fix affinity determine capable algorithm on Linux Remove strange checks for syscall() arguments where mask is NULL. Valgrind reports these as error usages for the syscall. Instead, just check if CACHE_LINE bytes is long enough. If not, then search for the size. Also, by limiting the first size detection attempt to CACHE_LINE bytes, instead of 1MB, we don't use more than one cache line for the mask size. Before this patch, sometimes the returned mask size was 640 bytes (10 cache lines) because the initial call to getaffinity() was limited only by the internal kernel mask size which can be very large. Differential Revision: https://reviews.llvm.org/D103637	2021-06-15 16:21:30 -05:00
Peyton, Jonathan L	0ddde4d865	[OpenMP] Lazily assign root affinity Lazily set affinity for root threads. Previously, the root thread executing middle initialization would attempt to assign affinity to other existing root threads. This was not working properly as the set_system_affinity() function wasn't setting the affinity for the target thread. Instead, the middle init thread was resetting its own affinity using the target thread's affinity mask. Differential Revision: https://reviews.llvm.org/D103625	2021-06-15 16:21:06 -05:00
Pushpinder Singh	cadcaf3f46	[AMDGPU][Libomptarget] Drop dead code related to g_atl_machine This patch includes some changes which delete the code accessing the g_atl_machine global. Some accesses related to memory_pools still remain. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103813	2021-06-15 05:21:35 +00:00
Ron Lieberman	91f147792e	[libomptarget][amdgpu] Remove stray fprintf in rtl.cpp remove unintended fprintf in rtl.cpp Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D104003	2021-06-10 01:57:30 +00:00
AndreyChurbanov	9ce2e5e700	Revert "[OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type" This reverts commit `a1f550e052`. Revert in order to fix backwards compatibility breakage caused by type size change for task dependence flag.	2021-06-09 17:38:38 +03:00
Joachim Protze	639b397931	[OpenMP][Tools] Fix Archer handling of task dependencies The current handling of dependencies in Archer has two flaws: - annotation of dependency synchronization is not limited to sibling tasks - annotation of in/out dependencies is based on the assumption that dependency variables will rarely be byte-sized variables. This patch introduces a map in the generating task to manage the dependency variables for the child tasks. The map is only accessed from the generating task, so no locking is necessary. This also limits the dependency-based synchronization to sibling tasks. This patch also introduces proper handling for new dependency types such as mutexinoutset and inoutset. Differential Revision: https://reviews.llvm.org/D103608	2021-06-09 13:36:20 +02:00
Joachim Protze	08d8f1a958	[OpenMP][Tools] Cleanup memory pool used in Archer The main motivation for reusing objects is that it helps to avoid creating and leaking synchronization clocks in TSan. The reused object will reuse the synchronization clock in TSan. Before, new and delete operators were overloaded to get and return memory for the object from/to the object pool. This patch replaces the operator overloading with explicit static New/Delete functions. Objects for parallel regions and implicit tasks will always be recruited from and returned to the thread-local object pool. Only for explicit tasks is there a chance that another thread completes the task and frees the object. This patch optimizes the thread-local New/Delete calls by avoiding locks and only locking if the pool is empty. Remote threads return the object into a separate queue. The chunk size for allocations is now decided based on page size. The objects will also be aligned to cache lines, avoiding false sharing. This is the first patch in a series to provide better tasking support. Differential Revision: https://reviews.llvm.org/D103606	2021-06-09 13:36:19 +02:00

1752 Commits