[libomptarget][amdgpu] Fix truncation error for partial wavefront

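The partial barrier implementation has one wavefront reset the barrier while
the other N-1 wait. The number of wavefronts was computed with a truncating
division, so a trailing partial wavefront was not counted. This change rounds
the count up, which future-proofs against launching with a number of threads
that is not a multiple of the wavefront size.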
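
To illustrate the arithmetic, here is a minimal sketch comparing the truncating
and the rounded-up wave counts. The helper names and the constant
EXAMPLE_WARPSIZE are hypothetical and not part of the patch; a wavefront size
of 64 is assumed for the example.

```c
#include <stdint.h>

/* Assumption for this sketch: a wavefront holds 64 threads. */
#define EXAMPLE_WARPSIZE 64u

/* Old computation: integer division drops any partial wavefront. */
static inline uint32_t waves_truncating(uint32_t num_threads) {
  return num_threads / EXAMPLE_WARPSIZE;
}

/* New computation: adding EXAMPLE_WARPSIZE - 1 before dividing rounds up,
   so a partial wavefront is counted as one wave. */
static inline uint32_t waves_rounded_up(uint32_t num_threads) {
  return (num_threads + EXAMPLE_WARPSIZE - 1) / EXAMPLE_WARPSIZE;
}
```

With 96 threads, the truncating form yields 1 wave while the rounded-up form
yields 2. For exact multiples such as 64 or 128 threads the two forms agree.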

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102407
Jon Chesterfield 2021-05-13 17:31:57 +01:00
parent b049870d3b
commit 10de217209
1 changed file with 1 addition and 1 deletion

@@ -56,7 +56,7 @@ static void pteam_mem_barrier(uint32_t num_threads, uint32_t * barrier_state)
 {
   __atomic_thread_fence(__ATOMIC_ACQUIRE);
-  uint32_t num_waves = num_threads / WARPSIZE;
+  uint32_t num_waves = (num_threads + WARPSIZE - 1) / WARPSIZE;
   // Partial barrier implementation for amdgcn.
   // Uses two 16 bit unsigned counters. One for the number of waves to have