llvm-project/clang/test/CodeGenCUDA/builtins-amdgcn.cu

// RUN: %clang_cc1 -triple amdgcn -fcuda-is-device -emit-llvm %s -o - | FileCheck %s
#include "Inputs/cuda.h"

// CHECK-LABEL: @_Z16use_dispatch_ptrPi(
// CHECK: %2 = call i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr()
// CHECK: %3 = addrspacecast i8 addrspace(4)* %2 to i8 addrspace(4)**
__global__ void use_dispatch_ptr(int* out) {
  const int* dispatch_ptr = (const int*)__builtin_amdgcn_dispatch_ptr();
  *out = *dispatch_ptr;
}

// CHECK-LABEL: @_Z12test_ds_fmaxf(
// CHECK: call float @llvm.amdgcn.ds.fmax(float addrspace(3)* @_ZZ12test_ds_fmaxfE6shared, float %2, i32 0, i32 0, i1 false)
__global__
void test_ds_fmax(float src) {
  __shared__ float shared;
  volatile float x = __builtin_amdgcn_ds_fmaxf(&shared, src, 0, 0, false);
}
Try to make builtin address space declarations not useless

The way address space declarations for builtins currently work is nearly useless. The code assumes the address space used for a builtin is a confusingly named "target address space" from user code using __attribute__((address_space(N))) that matches the builtin declaration. There's no way to use this to declare a builtin that returns a language-specific address space. The terminology is highly confusing, since it has nothing to do with the address space the target selects for a language address space.

This feature is essentially unused as-is. AMDGPU and NVPTX are the only in-tree targets attempting to use it. The AMDGPU builtins certainly do not behave as intended (i.e. none of the builtins returning pointers can ever compile, because the numbered address space never matches the expected named address space). Some of the NVPTX builtins are missing tests, and the others seem to rely on an implicit addrspacecast.

Change the address space used for builtins based on a target hook, to allow using a language address space for a builtin. This allows the same builtin declaration to be used for multiple languages with similarly purposed address spaces (e.g. the same AMDGPU builtin can be used in OpenCL and CUDA even though the constant address spaces are arbitrarily different).

This breaks the possibility of using arbitrary numbered address spaces alongside the named address spaces for builtins. If this is an issue we probably need to introduce another builtin declaration character to distinguish language address spaces from so-called "target address spaces".

llvm-svn: 338707
2018-08-02 20:14:28 +08:00
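The target-hook idea described in the commit message can be sketched roughly as follows. This is a minimal, self-contained model and not Clang's implementation: `Language`, `LangAddrSpace`, and `builtinAddrSpace` are hypothetical names invented for illustration, and the mapping table is an assumption. Only the numbers 3 (shared/local) and 4 (constant) are taken from the `addrspace(3)` and `addrspace(4)` CHECK lines in the test above.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are NOT Clang's types.
enum class Language { OpenCL, CUDA };

enum class LangAddrSpace {
  OpenCLLocal,
  OpenCLConstant,
  CUDAShared,
  CUDAConstant,
  TargetNumbered  // fallback: treat the number as a plain numbered space
};

// Hypothetical target hook: translate the number written in a builtin
// declaration into the address space the *current language* uses for
// that purpose. The table below is an assumed, illustrative mapping.
constexpr LangAddrSpace builtinAddrSpace(Language lang, unsigned declaredAS) {
  switch (declaredAS) {
  case 3:  // shared/local memory (assumed numbering)
    return lang == Language::OpenCL ? LangAddrSpace::OpenCLLocal
                                    : LangAddrSpace::CUDAShared;
  case 4:  // constant memory (assumed numbering)
    return lang == Language::OpenCL ? LangAddrSpace::OpenCLConstant
                                    : LangAddrSpace::CUDAConstant;
  default:  // anything else stays a numbered address space
    return LangAddrSpace::TargetNumbered;
  }
}

// One declaration number, two languages, two language address spaces.
static_assert(builtinAddrSpace(Language::OpenCL, 4) ==
                  LangAddrSpace::OpenCLConstant, "OpenCL constant");
static_assert(builtinAddrSpace(Language::CUDA, 4) ==
                  LangAddrSpace::CUDAConstant, "CUDA constant");
```

The point of the sketch is that a single builtin declaration carries one number, and the per-language lookup decides which named address space it means. The trade-off the commit message mentions also shows up here: once 3 and 4 are claimed by the mapping, they can no longer be used as arbitrary numbered address spaces for builtins.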