llvm-project/clang/test/CodeGenCoroutines/coro-dest-slot.cpp

// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s

#include "Inputs/coroutine.h"

using namespace std::experimental;

struct coro {
  struct promise_type {
    coro get_return_object();
    suspend_always initial_suspend();
    suspend_never final_suspend() noexcept;
    void return_void();
    static void unhandled_exception();
  };
};

extern "C" coro f(int) { co_return; }
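// Verify that cleanup.dest.slot is eliminated in a coroutine: the cleanup
// destination must be carried in phi nodes rather than in an alloca, so that
// it is never stored in the coroutine frame.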
// CHECK-LABEL: f(
// CHECK: %[[INIT_SUSPEND:.+]] = call i8 @llvm.coro.suspend(
// CHECK-NEXT: switch i8 %[[INIT_SUSPEND]], label
// CHECK-NEXT:   i8 0, label %[[INIT_READY:.+]]
// CHECK-NEXT:   i8 1, label %[[INIT_CLEANUP:.+]]
// CHECK-NEXT: ]
// CHECK: %[[CLEANUP_DEST0:.+]] = phi i32 [ 0, %[[INIT_READY]] ], [ 2, %[[INIT_CLEANUP]] ]

// CHECK: %[[FINAL_SUSPEND:.+]] = call i8 @llvm.coro.suspend(
// CHECK-NEXT: switch i8 %{{.*}}, label %coro.ret [
// CHECK-NEXT:   i8 0, label %[[FINAL_READY:.+]]
// CHECK-NEXT:   i8 1, label %[[FINAL_CLEANUP:.+]]
// CHECK-NEXT: ]

// CHECK: call void @_ZNSt12experimental13coroutines_v113suspend_never12await_resumeEv(
// CHECK: %[[CLEANUP_DEST1:.+]] = phi i32 [ 0, %[[FINAL_READY]] ], [ 2, %[[FINAL_CLEANUP]] ]
// CHECK: %[[CLEANUP_DEST2:.+]] = phi i32 [ %[[CLEANUP_DEST0]], %{{.+}} ], [ %[[CLEANUP_DEST1]], %{{.+}} ], [ 0, %{{.+}} ]
// CHECK: call i8* @llvm.coro.free(
// CHECK: switch i32 %[[CLEANUP_DEST2]], label %{{.+}} [
// CHECK-NEXT: i32 0
// CHECK-NEXT: i32 2
// CHECK-NEXT: ]
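
The CHECK lines above expect the cleanup destination to appear as `phi i32` values selected by the incoming edge, followed by a `switch i32` with cases 0 and 2. As a rough illustration only, the sketch below models that shape in plain C++. It is not the IR Clang emits; the names `kReady`, `kCleanup`, `cleanupDest`, `dispatch`, and `Target` are hypothetical, and the sketch deliberately does not say which blocks the cases 0 and 2 branch to, since the test does not match those labels.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the i8 results of @llvm.coro.suspend that the
// CHECK lines switch on: 0 takes the "ready" edge, 1 takes the "cleanup" edge.
constexpr std::uint8_t kReady = 0;
constexpr std::uint8_t kCleanup = 1;

// Models a phi of the form `phi i32 [ 0, %ready ], [ 2, %cleanup ]`.
// The value depends only on which edge was taken, so nothing has to be
// stored to memory (and therefore nothing can end up in the coroutine frame).
inline std::int32_t cleanupDest(std::uint8_t suspendResult) {
  // The phi has exactly two incoming edges; other values never reach it.
  assert(suspendResult == kReady || suspendResult == kCleanup);
  return suspendResult == kReady ? 0 : 2;
}

// Models the final `switch i32 %dest` with cases 0 and 2 plus a default.
enum class Target { CaseZero, CaseTwo, Default };

inline Target dispatch(std::int32_t dest) {
  switch (dest) {
  case 0:
    return Target::CaseZero;
  case 2:
    return Target::CaseTwo;
  default:
    return Target::Default;
  }
}
```

In this model, `dispatch(cleanupDest(kReady))` selects the case-0 target and `dispatch(cleanupDest(kCleanup))` selects the case-2 target, which is the pairing the phi and switch CHECK lines pin down.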
Commit history for this file (oldest first):

2017-11-12 01:00:43 +08:00
[coroutines] Promote cleanup.dest.slot allocas to registers to avoid storing it in the coroutine frame
Summary: We don't want the cleanup dest slot to be saved into the coroutine frame, because some of the cleanup code may access it after the coroutine frame has been destroyed. This is an alternative to https://reviews.llvm.org/D37093. It is possible to do this for all functions, but a cursory check showed that at -O0 we get slightly longer functions (by 1-3 instructions), so cleanup.dest.slot elimination is limited to coroutines.
Reviewers: rjmccall, hfinkel, eric_niebler
Reviewed By: eric_niebler
Subscribers: EricWF, cfe-commits
Differential Revision: https://reviews.llvm.org/D39768
llvm-svn: 317981

2020-06-16 07:27:41 +08:00
[Coroutines] Ensure co_await promise.final_suspend() does not throw
Summary: This patch addresses https://bugs.llvm.org/show_bug.cgi?id=46256. The coroutine spec requires that the expression co_await promise.final_suspend() shall not be potentially-throwing. To check this, we recursively look at every call (including Call, MemberCall, OperatorCall and Constructor) in all code generated by the final suspend, and ensure that the callees are declared noexcept. We also look at any returned data type that requires explicit destruction, and check that its destructor is noexcept. This patch does not check declarations with dependent types yet; that will be done in future patches. All tests were updated to add noexcept to the required functions, and a dedicated test was added for this patch. This patch might cause existing codebases to fail to compile, because most people may not have been strict about tagging all the related functions noexcept.
Reviewers: lewissbaker, modocache, junparser
Reviewed By: modocache
Subscribers: arphaman, junparser, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D82029

2021-03-26 04:46:20 +08:00
[Coroutine][Clang] Force emit lifetime intrinsics for Coroutines
tl;dr A correct implementation of coroutines requires having lifetime intrinsics available.
Coroutine functions are functions that can be suspended and resumed later. To do so, data that needs to stay alive after suspension must be put on the heap (i.e. the coroutine frame). The optimizer is responsible for analyzing each AllocaInst and figuring out whether it should be put on the stack or the frame. In most cases, for data whose lifetime we are unable to analyze accurately, we can conservatively put it on the heap. Unfortunately, there exist a few cases where certain data MUST be put on the stack, not on the heap. Without lifetime intrinsics, we are unable to correctly analyze the lifetime of that data.
In more detail, there exist cases where, at certain code points, the current coroutine frame may have already been destroyed, so no frame access is allowed beyond that point. The following is a common code pattern called "Symmetric Transfer" in coroutines:
```
auto tmp = await_suspend();
__builtin_coro_resume(tmp.address());
return;
```
In the above code example, `await_suspend()` returns a new coroutine handle, whose address we obtain and then use to resume that coroutine. This essentially transfers control from the current coroutine to a different coroutine. During the call to `await_suspend()`, the current coroutine may be destroyed, which should be fine because we are not accessing any data afterwards. However, when LLVM emits IR for the above code, it needs to emit an AllocaInst for `tmp`. It will then call the `address` function on `tmp`. The `address` function is a member function of coroutine, and there is no way for the LLVM optimizer to know that it does not capture the `tmp` pointer. So when the optimizer looks at it, it has to conservatively assume that `tmp` may escape and hence put it on the heap.
Furthermore, in some cases the `address` call would be inlined, which generates a bunch of store/load instructions that move the `tmp` pointer around. Those stores will also make the compiler think that `tmp` might escape. To summarize, it's really difficult for the mid-end to figure out that the `tmp` data is short-lived. I made some attempt in D98638, but it appears to be way too complex and is basically doing the same thing as inserting lifetime intrinsics in coroutines. Also, for reference, we already force emitting lifetime intrinsics in O0 for AlwaysInliner: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Passes/PassBuilder.cpp#L1893
Differential Revision: https://reviews.llvm.org/D99227

2021-05-10 10:04:07 +08:00
[NFC][Coroutines] Fix two tests by removing hardcoded SSA value.
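
The D82029 commit message explains why `final_suspend()` in this test is declared `noexcept`: the expression `co_await promise.final_suspend()` must not be potentially-throwing. As a small sketch of what that requirement means for a user-written promise type, the code below uses the standard `noexcept` operator to test two calls at compile time. It assumes nothing about Clang's internals: the types `NeverSuspend`, `GoodPromise`, `BadPromise` and the helpers `finalSuspendIsNoexcept` and `awaitResumeIsNoexcept` are hypothetical, and the sketch covers only these two calls, whereas the commit describes a recursive check over every call and destructor involved.

```cpp
#include <utility>

// Hypothetical awaiter type, for illustration only.
struct NeverSuspend {
  bool await_ready() noexcept;
  void await_resume() noexcept;
};

// Hypothetical promise types: one follows the rule, one does not.
struct GoodPromise {
  NeverSuspend final_suspend() noexcept;
};
struct BadPromise {
  NeverSuspend final_suspend(); // missing noexcept
};

// True when calling final_suspend() on Promise is declared non-throwing.
// The operand of noexcept is unevaluated, so no definitions are needed.
template <class Promise>
constexpr bool finalSuspendIsNoexcept() {
  return noexcept(std::declval<Promise&>().final_suspend());
}

// True when await_resume() on the returned awaiter is also non-throwing.
template <class Promise>
constexpr bool awaitResumeIsNoexcept() {
  return noexcept(std::declval<Promise&>().final_suspend().await_resume());
}

static_assert(finalSuspendIsNoexcept<GoodPromise>(),
              "GoodPromise::final_suspend is declared noexcept");
static_assert(!finalSuspendIsNoexcept<BadPromise>(),
              "BadPromise::final_suspend is not declared noexcept");
```

Note that `awaitResumeIsNoexcept<BadPromise>()` is also false: the `noexcept` operator considers the whole expression, so the potentially-throwing `final_suspend()` call taints it even though `await_resume()` itself is `noexcept`.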