forked from OSchip/llvm-project
[libomptarget][amdgpu] Fix truncation error for partial wavefront

The partial barrier implementation involves one wavefront resetting and N-1 waiting. This change future proofs against launching with a number of threads that is not a multiple of the wavefront size.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102407
parent b049870d3b
commit 10de217209

@@ -56,7 +56,7 @@ static void pteam_mem_barrier(uint32_t num_threads, uint32_t * barrier_state)
 {
   __atomic_thread_fence(__ATOMIC_ACQUIRE);
 
-  uint32_t num_waves = num_threads / WARPSIZE;
+  uint32_t num_waves = (num_threads + WARPSIZE - 1) / WARPSIZE;
 
   // Partial barrier implementation for amdgcn.
   // Uses two 16 bit unsigned counters. One for the number of waves to have
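
To illustrate the arithmetic behind the one-line change, here is a minimal host-side sketch that compares the truncating division with the round-up form. It is not part of the commit: the helper names (waves_truncating, waves_rounding_up, check_examples) are hypothetical, and WARPSIZE is assumed to be 64 here purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch only: a 64-lane wavefront. */
#define WARPSIZE 64u

/* Old expression: integer division drops a trailing partial wavefront. */
static uint32_t waves_truncating(uint32_t num_threads) {
  return num_threads / WARPSIZE;
}

/* New expression: rounds up, so a partial wavefront counts as one wave. */
static uint32_t waves_rounding_up(uint32_t num_threads) {
  return (num_threads + WARPSIZE - 1) / WARPSIZE;
}

/* Hypothetical self-check covering exact multiples and partial waves. */
static void check_examples(void) {
  /* Exact multiples of WARPSIZE: both forms agree. */
  assert(waves_truncating(64) == 1 && waves_rounding_up(64) == 1);
  assert(waves_truncating(128) == 2 && waves_rounding_up(128) == 2);

  /* 100 threads = one full wave plus a partial wave of 36 threads. */
  assert(waves_truncating(100) == 1);
  assert(waves_rounding_up(100) == 2);

  /* 129 threads = two full waves plus a partial wave of 1 thread. */
  assert(waves_truncating(129) == 2);
  assert(waves_rounding_up(129) == 3);
}
```

With the truncating form, a launch of 100 threads would be treated as a single wave, so the partial wave would go uncounted in the "one resets, N-1 wait" scheme described in the commit message. The round-up form counts it as a second wave and leaves exact multiples unchanged.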