[CUDA] Mark all CUDA device-side function defs, decls, and calls as convergent.
Summary:
This is important for e.g. the following case:
  void sync() { __syncthreads(); }
  void foo() {
    do_something();
    sync();
    do_something_else();
  }
Without this change, if the optimizer does not inline sync() (which it
won't because __syncthreads is also marked as noduplicate, for now
anyway), it is free to perform optimizations on sync() that it would not
be able to perform on __syncthreads(), because sync() is not marked as
convergent.
Similarly, we need a notion of convergent calls, since in the case when
we can't statically determine a call's target(s), we need to know
whether it's safe to perform optimizations around the call.
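As a rough illustration of the indirect-call case, here is a minimal sketch.
The names phase_a, phase_b, and run are hypothetical and not part of this
change, and the macro fallback exists only so the snippet also builds as
plain C++ without a CUDA toolchain:

```cpp
// Host fallback: a CUDA compiler defines __CUDACC__ and supplies the real
// qualifiers; in plain C++ they expand to nothing.
#ifndef __CUDACC__
#define __device__
#define __host__
#endif

// Hypothetical phase without a barrier.
__host__ __device__ inline int phase_a(int x) { return x + 1; }

// Hypothetical phase that synchronizes the thread block on the device.
__host__ __device__ inline int phase_b(int x) {
#ifdef __CUDA_ARCH__
  __syncthreads();  // only compiled for the device
#endif
  return x * 2;
}

// The callee is chosen at run time, so the compiler cannot in general see
// its body. The call site itself has to be treated as convergent, because
// `phase` might point at something like phase_b.
__host__ __device__ inline int run(int (*phase)(int), int x) {
  return phase(x);
}
```

Nothing about the call in run() reveals whether the target contains a
barrier, which is why the marking has to live on the call as well as on
function definitions and declarations.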
This change is conservative; the optimizer will remove these attrs where
it can, see r260318, r260319.
Reviewers: majnemer
Subscribers: cfe-commits, jhen, echristo, tra
Differential Revision: http://reviews.llvm.org/D17056
llvm-svn: 261779
2016-02-25 05:55:11 +08:00
// REQUIRES: x86-registered-target
// REQUIRES: nvptx-registered-target
// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
// RUN: -disable-llvm-passes -o - %s | FileCheck -allow-deprecated-dag-overlap -check-prefix DEVICE %s
// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \
// RUN: -disable-llvm-passes -o - %s | \
// RUN: FileCheck -allow-deprecated-dag-overlap -check-prefix HOST %s
#include "Inputs/cuda.h"
// DEVICE: Function Attrs:
// DEVICE-SAME: convergent
// DEVICE-NEXT: define{{.*}} void @_Z3foov
__device__ void foo() {}
// HOST: Function Attrs:
// HOST-NOT: convergent
// HOST-NEXT: define{{.*}} void @_Z3barv
// DEVICE: Function Attrs:
// DEVICE-SAME: convergent
// DEVICE-NEXT: define{{.*}} void @_Z3barv
__host__ __device__ void baz();
__host__ __device__ void bar() {
  // DEVICE: call void @_Z3bazv() [[CALL_ATTR:#[0-9]+]]
  baz();
  // DEVICE: call i32 asm "trap;", "=l"() [[ASM_ATTR:#[0-9]+]]
  int x;
  asm ("trap;" : "=l"(x));
  // DEVICE: call void asm sideeffect "trap;", ""() [[ASM_ATTR:#[0-9]+]]
  asm volatile ("trap;");
}
// DEVICE: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
// DEVICE: attributes [[BAZ_ATTR]] = {
// DEVICE-SAME: convergent
// DEVICE-SAME: }
// DEVICE-DAG: attributes [[CALL_ATTR]] = { convergent
// DEVICE-DAG: attributes [[ASM_ATTR]] = { convergent
// HOST: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
// HOST: attributes [[BAZ_ATTR]] = {
// HOST-NOT: convergent
// HOST-SAME: }