Commit Graph

14 Commits

Author SHA1 Message Date
Austin Kerbow 2db700215a [AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic
Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If
there is a need to have highly optimized codegen and kernel developers have
knowledge of inter-wave runtime behavior which is unknown to the compiler this
builtin can be used to tune scheduling.

This intrinsic creates a barrier between scheduling regions. The immediate
parameter is a mask to determine the types of instructions that should be
prevented from crossing the sched_barrier. In this initial patch, there are only
two variations. A mask of 0 means that no instructions may be scheduled across
the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing
instructions may cross the sched_barrier.

Note that this intrinsic is only meant to work with the scheduling passes. Any
other transformations that may move code will not be impacted in the ways
described above.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D124700
2022-05-11 13:22:51 -07:00
Austin Kerbow 62bcfcb5a5 [AMDGPU] Add llvm.amdgcn.s.setprio intrinsic
Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D120976
2022-03-12 22:15:42 -08:00
Yaxun (Sam) Liu 25942d7c49 [AMDGPU] Allow relaxed/consume memory order for atomic inc/dec
Reviewed by: Jon Chesterfield

Differential Revision: https://reviews.llvm.org/D100144
2021-04-09 09:23:41 -04:00
Saiyedul Islam 0882c9d4fc [AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)

As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 06:36:25 +00:00
Saiyedul Islam 675cefbf60 [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)

First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.

Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert

Reviewed By: arsenm, sameerds

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D80804
2020-06-09 17:02:58 +00:00
Matt Arsenault 97f3f0bab0 AMDGPU: Add intrinsic for s_setreg
This will be more useful with fenv access implemented.
2020-05-28 14:26:38 -04:00
Saiyedul Islam 06bdffb2bb [AMDGPU] Expose llvm fence instruction as clang intrinsic
Expose llvm fence instruction as clang builtin for AMDGPU target

__builtin_amdgcn_fence(unsigned int memoryOrdering, const char *syncScope)

The first argument of this builtin is one of the memory-ordering specifiers
__ATOMIC_ACQUIRE, __ATOMIC_RELEASE, __ATOMIC_ACQ_REL, or __ATOMIC_SEQ_CST
following C++11 memory model semantics. This is mapped to corresponding
LLVM atomic memory ordering for the fence instruction using LLVM atomic C
ABI. The second argument is an AMDGPU-specific synchronization scope
defined as string.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D75917
2020-04-27 09:39:03 +05:30
Yaxun Liu aae1e87f4b AMDGPU: add __builtin_amdgcn_update_dpp
Emit llvm.amdgcn.update.dpp for both __builtin_amdgcn_mov_dpp and
__builtin_amdgcn_update_dpp. The first argument to
llvm.amdgcn.update.dpp will be undef for __builtin_amdgcn_mov_dpp.

Differential Revision: https://reviews.llvm.org/D52320

llvm-svn: 344665
2018-10-17 02:32:26 +00:00
Daniil Fukalov 1b14a3ad3d [AMDGPU] fixes for lds f32 builtins
1. added restrictions to memory scope, order and volatile parameters
2. added custom processing for these builtins - currently is not used code,
   needed to switch off GCCBuiltin link to the builtins (ongoing change to llvm
   tree)
3. builtins renamed as requested

Differential Revision: https://reviews.llvm.org/D43281

llvm-svn: 332848
2018-05-21 16:18:07 +00:00
Yaxun Liu 4d86799219 [AMDGPU] Add builtin functions readlane ds_permute mov_dpp
Differential Revision: https://reviews.llvm.org/D30551

llvm-svn: 297436
2017-03-10 01:30:46 +00:00
Jan Vesely 9488560bb8 AMDGPU: export s_sendmsg{halt} instrinsics
Differential Revision: https://reviews.llvm.org/D30366

llvm-svn: 296241
2017-02-25 04:20:24 +00:00
Jan Vesely d26dbb389f AMDGPU: export s_waitcnt builtin
Differential Revision: https://reviews.llvm.org/D30359

llvm-svn: 296239
2017-02-25 04:20:20 +00:00
Matt Arsenault 24b5ae4497 AMDGPU: Add builtin for getreg intrinsic
llvm-svn: 292636
2017-01-20 19:24:22 +00:00
Konstantin Zhuravlyov 81a78bb864 [AMDGPU] Add f16 builtin functions (VI+)
Differential Revision: https://reviews.llvm.org/D26476

llvm-svn: 286741
2016-11-13 02:37:05 +00:00