llvm-project/clang/test/CodeGenCUDA/convergent.cu

// REQUIRES: x86-registered-target
// REQUIRES: nvptx-registered-target

// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
// RUN:   -disable-llvm-passes -o - %s | FileCheck -allow-deprecated-dag-overlap -check-prefix DEVICE %s

// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \
// RUN:   -disable-llvm-passes -o - %s | \
// RUN:  FileCheck -allow-deprecated-dag-overlap -check-prefix HOST %s

#include "Inputs/cuda.h"

// DEVICE: Function Attrs:
// DEVICE-SAME: convergent
// DEVICE-NEXT: define void @_Z3foov
__device__ void foo() {}

// HOST: Function Attrs:
// HOST-NOT: convergent
// HOST-NEXT: define void @_Z3barv
// DEVICE: Function Attrs:
// DEVICE-SAME: convergent
// DEVICE-NEXT: define void @_Z3barv
__host__ __device__ void baz();
__host__ __device__ void bar() {
  // DEVICE: call void @_Z3bazv() [[CALL_ATTR:#[0-9]+]]
  baz();
  // DEVICE: call i32 asm "trap;", "=l"() [[ASM_ATTR:#[0-9]+]]
  int x;
  asm ("trap;" : "=l"(x));
  // DEVICE: call void asm sideeffect "trap;", ""() [[ASM_ATTR:#[0-9]+]]
  asm volatile ("trap;");
}

// DEVICE: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
// DEVICE: attributes [[BAZ_ATTR]] = {
// DEVICE-SAME: convergent
// DEVICE-SAME: }
// DEVICE-DAG: attributes [[CALL_ATTR]] = { convergent
// DEVICE-DAG: attributes [[ASM_ATTR]] = { convergent

// HOST: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
// HOST: attributes [[BAZ_ATTR]] = {
// HOST-NOT: convergent
// HOST-SAME: }

// Revision history (from the blame annotations for this file):
//
// llvm-svn 261779, 2016-02-25 05:55:11 +08:00, http://reviews.llvm.org/D17056
//   [CUDA] Mark all CUDA device-side function defs, decls, and calls as
//   convergent. This is important for cases such as:
//     void sync() { __syncthreads(); }
//     void foo() { do_something(); sync(); do_something_else(); }
//   Without this change, if the optimizer does not inline sync() (which it
//   won't, because __syncthreads is also marked as noduplicate, for now
//   anyway), it is free to perform optimizations on sync() that it would not
//   be able to perform on __syncthreads(), because sync() is not marked as
//   convergent. Similarly, calls need a notion of convergence: when a call's
//   target(s) cannot be determined statically, the optimizer must know
//   whether it is safe to optimize around the call. The change is
//   conservative; the optimizer removes these attributes where it can (see
//   r260318, r260319).
//   Reviewers: majnemer. Subscribers: cfe-commits, jhen, echristo, tra.
//
// llvm-svn 271336, 2016-06-01 05:27:13 +08:00, http://reviews.llvm.org/D20836
//   [CUDA] Conservatively mark inline asm as convergent. This is particularly
//   important because some convergent CUDA intrinsics (e.g. __shfl_down) are
//   implemented in terms of inline asm.
//   Reviewers: tra. Subscribers: cfe-commits.
//
// llvm-svn 283272, 2016-10-05 07:41:49 +08:00, https://reviews.llvm.org/D25166
//   [CUDA] Mark device functions as nounwind. This prevents clang from
//   emitting 'invoke's and catch statements. Things previously mostly worked
//   thanks to TryToMarkNoThrow() in CodeGenFunction, but that is not a proper
//   IPO and does not properly handle cases like mutual recursion. Fixes
//   bug 30593.
//   Reviewers: tra. Subscribers: cfe-commits.
//
// llvm-svn 336844, 2018-07-12 04:26:20 +08:00, https://reviews.llvm.org/D47172
//   [FileCheck] Add -allow-deprecated-dag-overlap to failing clang tests.
//   See https://reviews.llvm.org/D47106 for details.
//   Reviewed by: probinson.