llvm-project/openmp/libomptarget/deviceRTLs
Joseph Huber 244e98ff48 [Libomptarget] Improve device runtime implementation for globalized variables.
Currently the runtime implementation of `__kmpc_alloc_shared` is extremely slow because it allocates memory for each thread individually. This patch adds a small buffer for the threads to share data and will greatly improve performance for builds where not all globalization could be optimized out. If the shared buffer is full, then memory will be allocated per-warp rather than per-thread.
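
The idea described above (a small fixed buffer used first, with a slower allocation path once it is full) can be illustrated with a minimal host-side sketch. This is not the code from the patch: the class name `SharedStack`, the 128-byte capacity, the 8-byte alignment, and the use of `std::malloc` as the fallback are all assumptions made for illustration, and the per-warp grouping of fallback allocations on the device is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical host-side model of a "small shared buffer with fallback".
// Names and sizes are illustrative assumptions, not taken from the patch.
class SharedStack {
public:
  static constexpr std::size_t Capacity = 128; // assumed buffer size
  static constexpr std::size_t Alignment = 8;  // assumed alignment

  // Reserve Bytes from the buffer if they fit, otherwise fall back to malloc.
  void *push(std::size_t Bytes) {
    std::size_t Aligned = alignUp(Bytes);
    if (Aligned <= Capacity - Top) {
      void *Ptr = Buffer + Top; // fast path: bump the stack pointer
      Top += Aligned;
      return Ptr;
    }
    return std::malloc(Bytes); // slow path: buffer is full
  }

  // Release memory obtained from push(); buffer memory must be freed in
  // LIFO order, fallback memory is handed back to free().
  void pop(void *Ptr, std::size_t Bytes) {
    if (inBuffer(Ptr)) {
      std::size_t Aligned = alignUp(Bytes);
      assert(Aligned <= Top &&
             static_cast<unsigned char *>(Ptr) == Buffer + (Top - Aligned) &&
             "buffer allocations must be released in LIFO order");
      Top -= Aligned;
    } else {
      std::free(Ptr);
    }
  }

  // True if Ptr points into the fixed buffer rather than fallback memory.
  bool inBuffer(const void *Ptr) const {
    auto P = reinterpret_cast<std::uintptr_t>(Ptr);
    auto Begin = reinterpret_cast<std::uintptr_t>(Buffer);
    return P >= Begin && P < Begin + Capacity;
  }

private:
  static std::size_t alignUp(std::size_t Bytes) {
    return (Bytes + Alignment - 1) / Alignment * Alignment;
  }

  alignas(8) unsigned char Buffer[Capacity] = {};
  std::size_t Top = 0; // offset of the first free byte in Buffer
};
```

In this sketch a request that fits is served by bumping an offset, so the common case involves no call into a general-purpose allocator; only requests that overflow the buffer pay for one. Releasing memory checks which path the pointer came from, so callers use the same `pop` call in both cases.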

Depends on D97680

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104666
2021-06-22 11:52:49 -04:00
..
amdgcn [libomptarget][amdgpu] Mark alloc, free weak to facilitate local experimentation 2021-05-21 16:09:22 +01:00
common [Libomptarget] Improve device runtime implementation for globalized variables. 2021-06-22 11:52:49 -04:00
nvptx [OpenMP][CMake] Use in-project clang as CUDA->IR compiler. 2021-04-30 12:45:52 -05:00
CMakeLists.txt [libomptarget] Enable AMDGPU devicertl 2021-04-24 02:24:44 +01:00
interface.h [Libomptarget] Improve device runtime implementation for globalized variables. 2021-06-22 11:52:49 -04:00
target_interface.h [libomptarget][nfc] Drop unused DEVICE macro 2021-03-15 20:12:50 +00:00