[GPGPU] Synchronize after each kernel, not each copy out

Summary: In managed-memory mode, synchronize once after each kernel launch instead of once per device-to-host copy statement. This reduces the overall number of synchronize calls for kernels with many outputs to copy back, at the cost of additional synchronize calls for kernels launched in sequence without any device-to-host transfers in between. As the latter pattern is much less frequent, this seems the better tradeoff. Beyond that motivation, the change is a step towards letting ppcg skip computing to-device and from-device copy calls entirely, which would be incorrect if we still relied on these calls to place our synchronization statements.

Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36867
llvm-svn: 311155
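The tradeoff in the summary can be illustrated with a small counting model. This is a sketch under loud assumptions: the host side of the generated AST is flattened into a plain list of statements (real ASTs also contain loops and conditionals), and `Stmt`, `syncsPerCopyOut`, and `syncsPerKernel` are hypothetical names, not Polly APIs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, flattened model of the host-side statements in the AST that
// ppcg generates. Loops and conditionals are ignored in this sketch.
enum class Stmt { CopyToDevice, Kernel, CopyFromDevice };

// Placement before this change (managed memory): one synchronize call
// per device-to-host copy statement.
std::size_t syncsPerCopyOut(const std::vector<Stmt> &Stmts) {
  std::size_t Count = 0;
  for (Stmt S : Stmts)
    if (S == Stmt::CopyFromDevice)
      ++Count;
  return Count;
}

// Placement after this change (managed memory): one synchronize call
// per kernel launch.
std::size_t syncsPerKernel(const std::vector<Stmt> &Stmts) {
  std::size_t Count = 0;
  for (Stmt S : Stmts)
    if (S == Stmt::Kernel)
      ++Count;
  return Count;
}
```

For one kernel followed by three copy-out statements, the count drops from 3 to 1; for two back-to-back kernels followed by a single copy out, it rises from 1 to 2. That second case is the less frequent pattern the summary accepts as the cost.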
2017-08-18 12:55:58 +00:00 · 2017-08-18 12:55:58 +00:00 · 62acb344d0
parent ec9581e5e0
commit 62acb344d0
1 changed file with 2 additions and 1 deletion
--- a/polly/lib/CodeGen/PPCGCodeGeneration.cpp
+++ b/polly/lib/CodeGen/PPCGCodeGeneration.cpp
@@ -1219,6 +1219,8 @@ void GPUNodeBuilder::createUser(__isl_take isl_ast_node *UserStmt) {
  const char *Str = isl_id_get_name(Id);
  if (!strcmp(Str, "kernel")) {
    createKernel(UserStmt);
+    if (PollyManagedMemory)
+      createCallSynchronizeDevice();
    isl_ast_expr_free(Expr);
    return;
  }
@@ -1248,7 +1250,6 @@ void GPUNodeBuilder::createUser(__isl_take isl_ast_node *UserStmt) {
    if (!PollyManagedMemory) {
      createDataTransfer(UserStmt, DEVICE_TO_HOST);
    } else {
-      createCallSynchronizeDevice();
      isl_ast_node_free(UserStmt);
    }
    isl_ast_expr_free(Expr);
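A rough sketch of the control flow in `createUser` after this change, reduced to a function that returns labels for the host calls emitted per statement. Assumptions: `emitHostCalls` and the labels `"launch"`, `"sync"`, and `"copy_from_device"` are hypothetical stand-ins for `createKernel`, `createCallSynchronizeDevice`, and `createDataTransfer`; `"from_device"` is a simplified name for the copy-out statement; and only the two statement kinds touched by the diff are modeled.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of the dispatch in GPUNodeBuilder::createUser after this
// change. The returned strings are illustrative labels, not runtime functions.
std::vector<std::string> emitHostCalls(const std::vector<std::string> &Stmts,
                                       bool ManagedMemory) {
  std::vector<std::string> Calls;
  for (const std::string &Name : Stmts) {
    if (Name == "kernel") {
      Calls.push_back("launch"); // stands in for createKernel
      // New: with managed memory, synchronize right after every launch.
      if (ManagedMemory)
        Calls.push_back("sync"); // stands in for createCallSynchronizeDevice
    } else if (Name == "from_device") {
      // Without managed memory an explicit copy is emitted; with managed
      // memory the statement is dropped and no longer carries a synchronize.
      if (!ManagedMemory)
        Calls.push_back("copy_from_device"); // stands in for createDataTransfer
    }
    // Other statement kinds are ignored in this sketch.
  }
  return Calls;
}
```

With managed memory, `{"kernel", "from_device", "from_device"}` yields `launch, sync`; without it, `{"kernel", "from_device"}` yields `launch, copy_from_device`. Tying the synchronize to the launch means it no longer depends on copy statements being present in the AST, which is the longer-term goal the summary names.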