2021-10-07 10:57:08 +08:00
|
|
|
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
// REQUIRES: amdgpu-registered-target
|
|
|
|
// RUN: %clang_cc1 %s -x hip -fcuda-is-device -emit-llvm -O0 -o - \
|
|
|
|
// RUN: -triple=amdgcn-amd-amdhsa | opt -S | FileCheck %s
|
|
|
|
|
2021-10-07 10:57:08 +08:00
|
|
|
// CHECK-LABEL: @_Z29test_non_volatile_parameter32Pj(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[PTR_ADDR:%.*]] = alloca i32*, align 8, addrspace(5)
|
|
|
|
// CHECK-NEXT: [[PTR_ADDR_ASCAST:%.*]] = addrspacecast i32* addrspace(5)* [[PTR_ADDR]] to i32**
|
|
|
|
// CHECK-NEXT: [[RES:%.*]] = alloca i32, align 4, addrspace(5)
|
|
|
|
// CHECK-NEXT: [[RES_ASCAST:%.*]] = addrspacecast i32 addrspace(5)* [[RES]] to i32*
|
|
|
|
// CHECK-NEXT: store i32* [[PTR:%.*]], i32** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load i32*, i32** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = load i32*, i32** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[TMP1]], align 4
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.amdgcn.atomic.inc.i32.p0i32(i32* [[TMP0]], i32 [[TMP2]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP3]], i32* [[RES_ASCAST]], align 4
|
|
|
|
// CHECK-NEXT: [[TMP4:%.*]] = load i32*, i32** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP5:%.*]] = load i32*, i32** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4
|
|
|
|
// CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.amdgcn.atomic.dec.i32.p0i32(i32* [[TMP4]], i32 [[TMP6]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP7]], i32* [[RES_ASCAST]], align 4
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
[AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 14:13:43 +08:00
|
|
|
__attribute__((device)) void test_non_volatile_parameter32(__UINT32_TYPE__ *ptr) {
|
|
|
|
__UINT32_TYPE__ res;
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
res = __builtin_amdgcn_atomic_inc32(ptr, *ptr, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
|
|
|
|
res = __builtin_amdgcn_atomic_dec32(ptr, *ptr, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
}
|
|
|
|
|
2021-10-07 10:57:08 +08:00
|
|
|
// CHECK-LABEL: @_Z29test_non_volatile_parameter64Py(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[PTR_ADDR:%.*]] = alloca i64*, align 8, addrspace(5)
|
|
|
|
// CHECK-NEXT: [[PTR_ADDR_ASCAST:%.*]] = addrspacecast i64* addrspace(5)* [[PTR_ADDR]] to i64**
|
|
|
|
// CHECK-NEXT: [[RES:%.*]] = alloca i64, align 8, addrspace(5)
|
|
|
|
// CHECK-NEXT: [[RES_ASCAST:%.*]] = addrspacecast i64 addrspace(5)* [[RES]] to i64*
|
|
|
|
// CHECK-NEXT: store i64* [[PTR:%.*]], i64** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load i64*, i64** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = load i64*, i64** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = load i64, i64* [[TMP1]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.amdgcn.atomic.inc.i64.p0i64(i64* [[TMP0]], i64 [[TMP2]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP3]], i64* [[RES_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP4:%.*]] = load i64*, i64** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP5:%.*]] = load i64*, i64** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP6:%.*]] = load i64, i64* [[TMP5]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.amdgcn.atomic.dec.i64.p0i64(i64* [[TMP4]], i64 [[TMP6]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP7]], i64* [[RES_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
[AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 14:13:43 +08:00
|
|
|
__attribute__((device)) void test_non_volatile_parameter64(__UINT64_TYPE__ *ptr) {
|
|
|
|
__UINT64_TYPE__ res;
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
res = __builtin_amdgcn_atomic_inc64(ptr, *ptr, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
|
|
|
|
res = __builtin_amdgcn_atomic_dec64(ptr, *ptr, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
}
|
|
|
|
|
2021-10-07 10:57:08 +08:00
|
|
|
// CHECK-LABEL: @_Z25test_volatile_parameter32PVj(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[PTR_ADDR:%.*]] = alloca i32*, align 8, addrspace(5)
|
|
|
|
// CHECK-NEXT: [[PTR_ADDR_ASCAST:%.*]] = addrspacecast i32* addrspace(5)* [[PTR_ADDR]] to i32**
|
|
|
|
// CHECK-NEXT: [[RES:%.*]] = alloca i32, align 4, addrspace(5)
|
|
|
|
// CHECK-NEXT: [[RES_ASCAST:%.*]] = addrspacecast i32 addrspace(5)* [[RES]] to i32*
|
|
|
|
// CHECK-NEXT: store i32* [[PTR:%.*]], i32** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load i32*, i32** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = load i32*, i32** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = load volatile i32, i32* [[TMP1]], align 4
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.amdgcn.atomic.inc.i32.p0i32(i32* [[TMP0]], i32 [[TMP2]], i32 7, i32 2, i1 true)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP3]], i32* [[RES_ASCAST]], align 4
|
|
|
|
// CHECK-NEXT: [[TMP4:%.*]] = load i32*, i32** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP5:%.*]] = load i32*, i32** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP6:%.*]] = load volatile i32, i32* [[TMP5]], align 4
|
|
|
|
// CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.amdgcn.atomic.dec.i32.p0i32(i32* [[TMP4]], i32 [[TMP6]], i32 7, i32 2, i1 true)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP7]], i32* [[RES_ASCAST]], align 4
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
[AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 14:13:43 +08:00
|
|
|
__attribute__((device)) void test_volatile_parameter32(volatile __UINT32_TYPE__ *ptr) {
|
|
|
|
__UINT32_TYPE__ res;
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
res = __builtin_amdgcn_atomic_inc32(ptr, *ptr, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
|
|
|
|
res = __builtin_amdgcn_atomic_dec32(ptr, *ptr, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
}
|
|
|
|
|
2021-10-07 10:57:08 +08:00
|
|
|
// CHECK-LABEL: @_Z25test_volatile_parameter64PVy(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[PTR_ADDR:%.*]] = alloca i64*, align 8, addrspace(5)
|
|
|
|
// CHECK-NEXT: [[PTR_ADDR_ASCAST:%.*]] = addrspacecast i64* addrspace(5)* [[PTR_ADDR]] to i64**
|
|
|
|
// CHECK-NEXT: [[RES:%.*]] = alloca i64, align 8, addrspace(5)
|
|
|
|
// CHECK-NEXT: [[RES_ASCAST:%.*]] = addrspacecast i64 addrspace(5)* [[RES]] to i64*
|
|
|
|
// CHECK-NEXT: store i64* [[PTR:%.*]], i64** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load i64*, i64** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = load i64*, i64** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = load volatile i64, i64* [[TMP1]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.amdgcn.atomic.inc.i64.p0i64(i64* [[TMP0]], i64 [[TMP2]], i32 7, i32 2, i1 true)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP3]], i64* [[RES_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP4:%.*]] = load i64*, i64** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP5:%.*]] = load i64*, i64** [[PTR_ADDR_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP6:%.*]] = load volatile i64, i64* [[TMP5]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.amdgcn.atomic.dec.i64.p0i64(i64* [[TMP4]], i64 [[TMP6]], i32 7, i32 2, i1 true)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP7]], i64* [[RES_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
[AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 14:13:43 +08:00
|
|
|
__attribute__((device)) void test_volatile_parameter64(volatile __UINT64_TYPE__ *ptr) {
|
|
|
|
__UINT64_TYPE__ res;
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
res = __builtin_amdgcn_atomic_inc64(ptr, *ptr, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
|
|
|
|
res = __builtin_amdgcn_atomic_dec64(ptr, *ptr, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
}
|
|
|
|
|
2021-10-07 10:57:08 +08:00
|
|
|
// CHECK-LABEL: @_Z13test_shared32v(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* addrspacecast (i32 addrspace(3)* @_ZZ13test_shared32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.atomic.inc.i32.p0i32(i32* addrspacecast (i32 addrspace(3)* @_ZZ13test_shared32vE3val to i32*), i32 [[TMP0]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP1]], i32* addrspacecast (i32 addrspace(3)* @_ZZ13test_shared32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* addrspacecast (i32 addrspace(3)* @_ZZ13test_shared32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.amdgcn.atomic.dec.i32.p0i32(i32* addrspacecast (i32 addrspace(3)* @_ZZ13test_shared32vE3val to i32*), i32 [[TMP2]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP3]], i32* addrspacecast (i32 addrspace(3)* @_ZZ13test_shared32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
__attribute__((device)) void test_shared32() {
|
[AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 14:13:43 +08:00
|
|
|
__attribute__((shared)) __UINT32_TYPE__ val;
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_inc32(&val, val, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec32(&val, val, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
}
|
|
|
|
|
2021-10-07 10:57:08 +08:00
|
|
|
// CHECK-LABEL: @_Z13test_shared64v(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load i64, i64* addrspacecast (i64 addrspace(3)* @_ZZ13test_shared64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.amdgcn.atomic.inc.i64.p0i64(i64* addrspacecast (i64 addrspace(3)* @_ZZ13test_shared64vE3val to i64*), i64 [[TMP0]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP1]], i64* addrspacecast (i64 addrspace(3)* @_ZZ13test_shared64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = load i64, i64* addrspacecast (i64 addrspace(3)* @_ZZ13test_shared64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.amdgcn.atomic.dec.i64.p0i64(i64* addrspacecast (i64 addrspace(3)* @_ZZ13test_shared64vE3val to i64*), i64 [[TMP2]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP3]], i64* addrspacecast (i64 addrspace(3)* @_ZZ13test_shared64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
__attribute__((device)) void test_shared64() {
|
[AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 14:13:43 +08:00
|
|
|
__attribute__((shared)) __UINT64_TYPE__ val;
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_inc64(&val, val, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec64(&val, val, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
}
|
|
|
|
|
2020-11-11 21:39:00 +08:00
|
|
|
__attribute__((device)) __UINT32_TYPE__ global_val32;
|
2021-10-07 10:57:08 +08:00
|
|
|
// CHECK-LABEL: @_Z13test_global32v(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* addrspacecast (i32 addrspace(1)* @global_val32 to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.atomic.inc.i32.p0i32(i32* addrspacecast (i32 addrspace(1)* @global_val32 to i32*), i32 [[TMP0]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP1]], i32* addrspacecast (i32 addrspace(1)* @global_val32 to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* addrspacecast (i32 addrspace(1)* @global_val32 to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.amdgcn.atomic.dec.i32.p0i32(i32* addrspacecast (i32 addrspace(1)* @global_val32 to i32*), i32 [[TMP2]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP3]], i32* addrspacecast (i32 addrspace(1)* @global_val32 to i32*), align 4
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
__attribute__((device)) void test_global32() {
|
|
|
|
global_val32 = __builtin_amdgcn_atomic_inc32(&global_val32, global_val32, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
|
|
|
|
global_val32 = __builtin_amdgcn_atomic_dec32(&global_val32, global_val32, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
}
|
|
|
|
|
2020-11-11 21:39:00 +08:00
|
|
|
__attribute__((device)) __UINT64_TYPE__ global_val64;
|
2021-10-07 10:57:08 +08:00
|
|
|
// CHECK-LABEL: @_Z13test_global64v(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load i64, i64* addrspacecast (i64 addrspace(1)* @global_val64 to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.amdgcn.atomic.inc.i64.p0i64(i64* addrspacecast (i64 addrspace(1)* @global_val64 to i64*), i64 [[TMP0]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP1]], i64* addrspacecast (i64 addrspace(1)* @global_val64 to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = load i64, i64* addrspacecast (i64 addrspace(1)* @global_val64 to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.amdgcn.atomic.dec.i64.p0i64(i64* addrspacecast (i64 addrspace(1)* @global_val64 to i64*), i64 [[TMP2]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP3]], i64* addrspacecast (i64 addrspace(1)* @global_val64 to i64*), align 8
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
__attribute__((device)) void test_global64() {
|
|
|
|
global_val64 = __builtin_amdgcn_atomic_inc64(&global_val64, global_val64, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
|
|
|
|
global_val64 = __builtin_amdgcn_atomic_dec64(&global_val64, global_val64, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
}
|
|
|
|
|
[AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 14:13:43 +08:00
|
|
|
__attribute__((constant)) __UINT32_TYPE__ cval32;
|
2021-10-07 10:57:08 +08:00
|
|
|
// CHECK-LABEL: @_Z15test_constant32v(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[LOCAL_VAL:%.*]] = alloca i32, align 4, addrspace(5)
|
|
|
|
// CHECK-NEXT: [[LOCAL_VAL_ASCAST:%.*]] = addrspacecast i32 addrspace(5)* [[LOCAL_VAL]] to i32*
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* addrspacecast (i32 addrspace(4)* @cval32 to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.atomic.inc.i32.p0i32(i32* addrspacecast (i32 addrspace(4)* @cval32 to i32*), i32 [[TMP0]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP1]], i32* [[LOCAL_VAL_ASCAST]], align 4
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* addrspacecast (i32 addrspace(4)* @cval32 to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.amdgcn.atomic.dec.i32.p0i32(i32* addrspacecast (i32 addrspace(4)* @cval32 to i32*), i32 [[TMP2]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP3]], i32* [[LOCAL_VAL_ASCAST]], align 4
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
__attribute__((device)) void test_constant32() {
|
[AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 14:13:43 +08:00
|
|
|
__UINT32_TYPE__ local_val;
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
|
|
|
|
local_val = __builtin_amdgcn_atomic_inc32(&cval32, cval32, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
|
|
|
|
local_val = __builtin_amdgcn_atomic_dec32(&cval32, cval32, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
}
|
|
|
|
|
[AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 14:13:43 +08:00
|
|
|
__attribute__((constant)) __UINT64_TYPE__ cval64;
|
2021-10-07 10:57:08 +08:00
|
|
|
// CHECK-LABEL: @_Z15test_constant64v(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[LOCAL_VAL:%.*]] = alloca i64, align 8, addrspace(5)
|
|
|
|
// CHECK-NEXT: [[LOCAL_VAL_ASCAST:%.*]] = addrspacecast i64 addrspace(5)* [[LOCAL_VAL]] to i64*
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load i64, i64* addrspacecast (i64 addrspace(4)* @cval64 to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.amdgcn.atomic.inc.i64.p0i64(i64* addrspacecast (i64 addrspace(4)* @cval64 to i64*), i64 [[TMP0]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP1]], i64* [[LOCAL_VAL_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = load i64, i64* addrspacecast (i64 addrspace(4)* @cval64 to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.amdgcn.atomic.dec.i64.p0i64(i64* addrspacecast (i64 addrspace(4)* @cval64 to i64*), i64 [[TMP2]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP3]], i64* [[LOCAL_VAL_ASCAST]], align 8
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
__attribute__((device)) void test_constant64() {
|
[AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 14:13:43 +08:00
|
|
|
__UINT64_TYPE__ local_val;
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
|
|
|
|
local_val = __builtin_amdgcn_atomic_inc64(&cval64, cval64, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
|
|
|
|
local_val = __builtin_amdgcn_atomic_dec64(&cval64, cval64, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
}
|
|
|
|
|
2021-10-07 10:57:08 +08:00
|
|
|
// CHECK-LABEL: @_Z12test_order32v(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.atomic.inc.i32.p0i32(i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), i32 [[TMP0]], i32 2, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP1]], i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.amdgcn.atomic.inc.i32.p0i32(i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), i32 [[TMP2]], i32 4, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP3]], i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.atomic.inc.i32.p0i32(i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), i32 [[TMP4]], i32 4, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP5]], i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.amdgcn.atomic.dec.i32.p0i32(i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), i32 [[TMP6]], i32 5, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP7]], i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.amdgcn.atomic.dec.i32.p0i32(i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), i32 [[TMP8]], i32 6, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP9]], i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.atomic.dec.i32.p0i32(i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), i32 [[TMP10]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP11]], i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_order32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
__attribute__((device)) void test_order32() {
|
[AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 14:13:43 +08:00
|
|
|
__attribute__((shared)) __UINT32_TYPE__ val;
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
|
2021-04-09 05:44:29 +08:00
|
|
|
val = __builtin_amdgcn_atomic_inc32(&val, val, __ATOMIC_RELAXED, "workgroup");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_inc32(&val, val, __ATOMIC_CONSUME, "workgroup");
|
|
|
|
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
val = __builtin_amdgcn_atomic_inc32(&val, val, __ATOMIC_ACQUIRE, "workgroup");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec32(&val, val, __ATOMIC_RELEASE, "workgroup");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec32(&val, val, __ATOMIC_ACQ_REL, "workgroup");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec32(&val, val, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
}
|
|
|
|
|
2021-10-07 10:57:08 +08:00
|
|
|
// CHECK-LABEL: @_Z12test_order64v(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load i64, i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.amdgcn.atomic.inc.i64.p0i64(i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), i64 [[TMP0]], i32 2, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP1]], i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = load i64, i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.amdgcn.atomic.dec.i64.p0i64(i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), i64 [[TMP2]], i32 4, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP3]], i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP4:%.*]] = load i64, i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.amdgcn.atomic.inc.i64.p0i64(i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), i64 [[TMP4]], i32 4, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP5]], i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP6:%.*]] = load i64, i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.amdgcn.atomic.dec.i64.p0i64(i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), i64 [[TMP6]], i32 5, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP7]], i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP8:%.*]] = load i64, i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP9:%.*]] = call i64 @llvm.amdgcn.atomic.dec.i64.p0i64(i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), i64 [[TMP8]], i32 6, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP9]], i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP10:%.*]] = load i64, i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP11:%.*]] = call i64 @llvm.amdgcn.atomic.dec.i64.p0i64(i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), i64 [[TMP10]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP11]], i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_order64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
__attribute__((device)) void test_order64() {
|
[AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 14:13:43 +08:00
|
|
|
__attribute__((shared)) __UINT64_TYPE__ val;
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
|
2021-04-09 05:44:29 +08:00
|
|
|
val = __builtin_amdgcn_atomic_inc64(&val, val, __ATOMIC_RELAXED, "workgroup");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec64(&val, val, __ATOMIC_CONSUME, "workgroup");
|
|
|
|
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
val = __builtin_amdgcn_atomic_inc64(&val, val, __ATOMIC_ACQUIRE, "workgroup");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec64(&val, val, __ATOMIC_RELEASE, "workgroup");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec64(&val, val, __ATOMIC_ACQ_REL, "workgroup");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec64(&val, val, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
}
|
|
|
|
|
2021-10-07 10:57:08 +08:00
|
|
|
// CHECK-LABEL: @_Z12test_scope32v(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_scope32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.atomic.inc.i32.p0i32(i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_scope32vE3val to i32*), i32 [[TMP0]], i32 7, i32 1, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP1]], i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_scope32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_scope32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.amdgcn.atomic.dec.i32.p0i32(i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_scope32vE3val to i32*), i32 [[TMP2]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP3]], i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_scope32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_scope32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.atomic.dec.i32.p0i32(i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_scope32vE3val to i32*), i32 [[TMP4]], i32 7, i32 3, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP5]], i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_scope32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_scope32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.amdgcn.atomic.dec.i32.p0i32(i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_scope32vE3val to i32*), i32 [[TMP6]], i32 7, i32 4, i1 false)
|
|
|
|
// CHECK-NEXT: store i32 [[TMP7]], i32* addrspacecast (i32 addrspace(3)* @_ZZ12test_scope32vE3val to i32*), align 4
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
__attribute__((device)) void test_scope32() {
|
[AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 14:13:43 +08:00
|
|
|
__attribute__((shared)) __UINT32_TYPE__ val;
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_inc32(&val, val, __ATOMIC_SEQ_CST, "");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec32(&val, val, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec32(&val, val, __ATOMIC_SEQ_CST, "agent");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec32(&val, val, __ATOMIC_SEQ_CST, "wavefront");
|
|
|
|
}
|
|
|
|
|
2021-10-07 10:57:08 +08:00
|
|
|
// CHECK-LABEL: @_Z12test_scope64v(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load i64, i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_scope64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.amdgcn.atomic.inc.i64.p0i64(i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_scope64vE3val to i64*), i64 [[TMP0]], i32 7, i32 1, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP1]], i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_scope64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = load i64, i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_scope64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.amdgcn.atomic.dec.i64.p0i64(i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_scope64vE3val to i64*), i64 [[TMP2]], i32 7, i32 2, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP3]], i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_scope64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP4:%.*]] = load i64, i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_scope64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.amdgcn.atomic.dec.i64.p0i64(i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_scope64vE3val to i64*), i64 [[TMP4]], i32 7, i32 3, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP5]], i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_scope64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP6:%.*]] = load i64, i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_scope64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.amdgcn.atomic.dec.i64.p0i64(i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_scope64vE3val to i64*), i64 [[TMP6]], i32 7, i32 4, i1 false)
|
|
|
|
// CHECK-NEXT: store i64 [[TMP7]], i64* addrspacecast (i64 addrspace(3)* @_ZZ12test_scope64vE3val to i64*), align 8
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
__attribute__((device)) void test_scope64() {
|
[AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values
builtin_amdgcn_atomic_inc32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(uint *Ptr, uint Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(uint64_t *Ptr, uint64_t Val, unsigned MemoryOrdering, const char *SyncScope)
As AMDGCN IR instrinsic for atomic inc/dec does unsigned comparison,
these clang builtins should also take unsigned types instead of signed
int types.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83121
2020-07-07 14:13:43 +08:00
|
|
|
__attribute__((shared)) __UINT64_TYPE__ val;
|
[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
2020-05-29 22:16:07 +08:00
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_inc64(&val, val, __ATOMIC_SEQ_CST, "");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec64(&val, val, __ATOMIC_SEQ_CST, "workgroup");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec64(&val, val, __ATOMIC_SEQ_CST, "agent");
|
|
|
|
|
|
|
|
val = __builtin_amdgcn_atomic_dec64(&val, val, __ATOMIC_SEQ_CST, "wavefront");
|
|
|
|
}
|