llvm-project/clang/test/AST/ast-dump-openmp-target-para...

958 lines
84 KiB
C
Raw Normal View History

// RUN: %clang_cc1 -triple x86_64-unknown-unknown -fopenmp -ast-dump %s | FileCheck --match-full-lines -implicit-check-not=openmp_structured_block %s
void test_one(int x) {
#pragma omp target parallel for simd
for (int i = 0; i < x; i++)
;
}
void test_two(int x, int y) {
#pragma omp target parallel for simd
for (int i = 0; i < x; i++)
for (int i = 0; i < y; i++)
;
}
void test_three(int x, int y) {
#pragma omp target parallel for simd collapse(1)
for (int i = 0; i < x; i++)
for (int i = 0; i < y; i++)
;
}
void test_four(int x, int y) {
#pragma omp target parallel for simd collapse(2)
for (int i = 0; i < x; i++)
for (int i = 0; i < y; i++)
;
}
void test_five(int x, int y, int z) {
#pragma omp target parallel for simd collapse(2)
for (int i = 0; i < x; i++)
for (int i = 0; i < y; i++)
for (int i = 0; i < z; i++)
;
}
// CHECK: TranslationUnitDecl {{.*}} <<invalid sloc>> <invalid sloc>
// CHECK: |-FunctionDecl {{.*}} <{{.*}}ast-dump-openmp-target-parallel-for-simd.c:3:1, line:7:1> line:3:6 test_one 'void (int)'
// CHECK-NEXT: | |-ParmVarDecl {{.*}} <col:15, col:19> col:19 used x 'int'
// CHECK-NEXT: | `-CompoundStmt {{.*}} <col:22, line:7:1>
// CHECK-NEXT: | `-OMPTargetParallelForSimdDirective {{.*}} <line:4:9, col:37>
// CHECK-NEXT: | |-OMPFirstprivateClause {{.*}} <<invalid sloc>> <implicit>
// CHECK-NEXT: | | `-DeclRefExpr {{.*}} <line:5:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | `-CapturedStmt {{.*}} <col:3, line:6:5>
// CHECK-NEXT: | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | |-CapturedStmt {{.*}} <line:5:3, line:6:5>
// CHECK-NEXT: | | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | |-CapturedStmt {{.*}} <line:5:3, line:6:5>
// CHECK-NEXT: | | | | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | | | |-ForStmt {{.*}} <line:5:3, line:6:5>
// CHECK-NEXT: | | | | | | | |-DeclStmt {{.*}} <line:5:8, col:17>
// CHECK-NEXT: | | | | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | | | | `-NullStmt {{.*}} <line:6:5> openmp_structured_block
// CHECK-NEXT: | | | | | | |-ImplicitParamDecl {{.*}} <line:4:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:4:9) *const restrict'
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <line:5:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <line:4:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:4:9) *const restrict'
// CHECK-NEXT: | | | | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | | | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | | | | `-FieldDecl {{.*}} <line:5:23> col:23 implicit 'int'
// CHECK-NEXT: | | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | |-ForStmt {{.*}} <col:3, line:6:5>
// CHECK-NEXT: | | | | | |-DeclStmt {{.*}} <line:5:8, col:17>
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | | `-NullStmt {{.*}} <line:6:5> openmp_structured_block
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <line:4:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:4:9) *const restrict'
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <line:5:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | |-AlwaysInlineAttr {{.*}} <<invalid sloc>> Implicit __forceinline
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <line:4:9> col:9 implicit .global_tid. 'const int'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .part_id. 'const int *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .privates. 'void *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .copy_fn. 'void (*const restrict)(void *const restrict, ...)'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .task_t. 'void *const'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:4:9) *const restrict'
// CHECK-NEXT: | | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | | `-FieldDecl {{.*}} <line:5:23> col:23 implicit 'int'
// CHECK-NEXT: | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | |-CapturedStmt {{.*}} <col:3, line:6:5>
// CHECK-NEXT: | | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | |-ForStmt {{.*}} <line:5:3, line:6:5>
// CHECK-NEXT: | | | | | |-DeclStmt {{.*}} <line:5:8, col:17>
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | | `-NullStmt {{.*}} <line:6:5> openmp_structured_block
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <line:4:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:4:9) *const restrict'
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <line:5:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <line:4:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:4:9) *const restrict'
// CHECK-NEXT: | | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | | `-FieldDecl {{.*}} <line:5:23> col:23 implicit 'int'
// CHECK-NEXT: | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | |-ForStmt {{.*}} <col:3, line:6:5>
// CHECK-NEXT: | | | |-DeclStmt {{.*}} <line:5:8, col:17>
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | |-<<<NULL>>>
// CHECK-NEXT: | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | `-NullStmt {{.*}} <line:6:5> openmp_structured_block
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <line:4:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:4:9) *const restrict'
// CHECK-NEXT: | | `-VarDecl {{.*}} <line:5:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: |-FunctionDecl {{.*}} <line:9:1, line:14:1> line:9:6 test_two 'void (int, int)'
// CHECK-NEXT: | |-ParmVarDecl {{.*}} <col:15, col:19> col:19 used x 'int'
// CHECK-NEXT: | |-ParmVarDecl {{.*}} <col:22, col:26> col:26 used y 'int'
// CHECK-NEXT: | `-CompoundStmt {{.*}} <col:29, line:14:1>
// CHECK-NEXT: | `-OMPTargetParallelForSimdDirective {{.*}} <line:10:9, col:37>
// CHECK-NEXT: | |-OMPFirstprivateClause {{.*}} <<invalid sloc>> <implicit>
// CHECK-NEXT: | | |-DeclRefExpr {{.*}} <line:11:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | `-DeclRefExpr {{.*}} <line:12:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | `-CapturedStmt {{.*}} <line:11:3, line:13:7>
// CHECK-NEXT: | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | |-CapturedStmt {{.*}} <line:11:3, line:13:7>
// CHECK-NEXT: | | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | |-CapturedStmt {{.*}} <line:11:3, line:13:7>
// CHECK-NEXT: | | | | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | | | |-ForStmt {{.*}} <line:11:3, line:13:7>
// CHECK-NEXT: | | | | | | | |-DeclStmt {{.*}} <line:11:8, col:17>
// CHECK-NEXT: | | | | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | | | | `-ForStmt {{.*}} <line:12:5, line:13:7> openmp_structured_block
// CHECK-NEXT: | | | | | | | |-DeclStmt {{.*}} <line:12:10, col:19>
// CHECK-NEXT: | | | | | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | | `-NullStmt {{.*}} <line:13:7>
// CHECK-NEXT: | | | | | | |-ImplicitParamDecl {{.*}} <line:10:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:10:9) *const restrict'
// CHECK-NEXT: | | | | | | |-VarDecl {{.*}} <line:11:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <line:12:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | | |-DeclRefExpr {{.*}} <line:11:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <line:12:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <line:10:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:10:9) *const restrict'
// CHECK-NEXT: | | | | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | | | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | | | | |-FieldDecl {{.*}} <line:11:23> col:23 implicit 'int'
// CHECK-NEXT: | | | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | | | `-FieldDecl {{.*}} <line:12:25> col:25 implicit 'int'
// CHECK-NEXT: | | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | |-ForStmt {{.*}} <line:11:3, line:13:7>
// CHECK-NEXT: | | | | | |-DeclStmt {{.*}} <line:11:8, col:17>
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | | `-ForStmt {{.*}} <line:12:5, line:13:7> openmp_structured_block
// CHECK-NEXT: | | | | | |-DeclStmt {{.*}} <line:12:10, col:19>
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | `-NullStmt {{.*}} <line:13:7>
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <line:10:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:10:9) *const restrict'
// CHECK-NEXT: | | | | |-VarDecl {{.*}} <line:11:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <line:12:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | |-DeclRefExpr {{.*}} <line:11:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | `-DeclRefExpr {{.*}} <line:12:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | |-AlwaysInlineAttr {{.*}} <<invalid sloc>> Implicit __forceinline
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <line:10:9> col:9 implicit .global_tid. 'const int'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .part_id. 'const int *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .privates. 'void *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .copy_fn. 'void (*const restrict)(void *const restrict, ...)'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .task_t. 'void *const'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:10:9) *const restrict'
// CHECK-NEXT: | | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | | |-FieldDecl {{.*}} <line:11:23> col:23 implicit 'int'
// CHECK-NEXT: | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | `-FieldDecl {{.*}} <line:12:25> col:25 implicit 'int'
// CHECK-NEXT: | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | |-CapturedStmt {{.*}} <line:11:3, line:13:7>
// CHECK-NEXT: | | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | |-ForStmt {{.*}} <line:11:3, line:13:7>
// CHECK-NEXT: | | | | | |-DeclStmt {{.*}} <line:11:8, col:17>
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | | `-ForStmt {{.*}} <line:12:5, line:13:7> openmp_structured_block
// CHECK-NEXT: | | | | | |-DeclStmt {{.*}} <line:12:10, col:19>
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | `-NullStmt {{.*}} <line:13:7>
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <line:10:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:10:9) *const restrict'
// CHECK-NEXT: | | | | |-VarDecl {{.*}} <line:11:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <line:12:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | |-DeclRefExpr {{.*}} <line:11:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | `-DeclRefExpr {{.*}} <line:12:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <line:10:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:10:9) *const restrict'
// CHECK-NEXT: | | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | | |-FieldDecl {{.*}} <line:11:23> col:23 implicit 'int'
// CHECK-NEXT: | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | `-FieldDecl {{.*}} <line:12:25> col:25 implicit 'int'
// CHECK-NEXT: | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | |-ForStmt {{.*}} <line:11:3, line:13:7>
// CHECK-NEXT: | | | |-DeclStmt {{.*}} <line:11:8, col:17>
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | |-<<<NULL>>>
// CHECK-NEXT: | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | `-ForStmt {{.*}} <line:12:5, line:13:7> openmp_structured_block
// CHECK-NEXT: | | | |-DeclStmt {{.*}} <line:12:10, col:19>
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | |-<<<NULL>>>
// CHECK-NEXT: | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | `-NullStmt {{.*}} <line:13:7>
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <line:10:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:10:9) *const restrict'
// CHECK-NEXT: | | |-VarDecl {{.*}} <line:11:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | `-VarDecl {{.*}} <line:12:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | |-DeclRefExpr {{.*}} <line:11:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | `-DeclRefExpr {{.*}} <line:12:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: |-FunctionDecl {{.*}} <line:16:1, line:21:1> line:16:6 test_three 'void (int, int)'
// CHECK-NEXT: | |-ParmVarDecl {{.*}} <col:17, col:21> col:21 used x 'int'
// CHECK-NEXT: | |-ParmVarDecl {{.*}} <col:24, col:28> col:28 used y 'int'
// CHECK-NEXT: | `-CompoundStmt {{.*}} <col:31, line:21:1>
// CHECK-NEXT: | `-OMPTargetParallelForSimdDirective {{.*}} <line:17:9, col:49>
// CHECK-NEXT: | |-OMPCollapseClause {{.*}} <col:38, col:48>
// CHECK-NEXT: | | `-ConstantExpr {{.*}} <col:47> 'int'
// CHECK-NEXT: | | `-IntegerLiteral {{.*}} <col:47> 'int' 1
// CHECK-NEXT: | |-OMPFirstprivateClause {{.*}} <<invalid sloc>> <implicit>
// CHECK-NEXT: | | |-DeclRefExpr {{.*}} <line:18:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | `-DeclRefExpr {{.*}} <line:19:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | `-CapturedStmt {{.*}} <line:18:3, line:20:7>
// CHECK-NEXT: | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | |-CapturedStmt {{.*}} <line:18:3, line:20:7>
// CHECK-NEXT: | | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | |-CapturedStmt {{.*}} <line:18:3, line:20:7>
// CHECK-NEXT: | | | | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | | | |-ForStmt {{.*}} <line:18:3, line:20:7>
// CHECK-NEXT: | | | | | | | |-DeclStmt {{.*}} <line:18:8, col:17>
// CHECK-NEXT: | | | | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | | | | `-ForStmt {{.*}} <line:19:5, line:20:7> openmp_structured_block
// CHECK-NEXT: | | | | | | | |-DeclStmt {{.*}} <line:19:10, col:19>
// CHECK-NEXT: | | | | | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | | `-NullStmt {{.*}} <line:20:7>
// CHECK-NEXT: | | | | | | |-ImplicitParamDecl {{.*}} <line:17:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:17:9) *const restrict'
// CHECK-NEXT: | | | | | | |-VarDecl {{.*}} <line:18:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <line:19:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | | |-DeclRefExpr {{.*}} <line:18:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <line:19:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <line:17:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:17:9) *const restrict'
// CHECK-NEXT: | | | | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | | | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | | | | |-FieldDecl {{.*}} <line:18:23> col:23 implicit 'int'
// CHECK-NEXT: | | | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | | | `-FieldDecl {{.*}} <line:19:25> col:25 implicit 'int'
// CHECK-NEXT: | | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | |-ForStmt {{.*}} <line:18:3, line:20:7>
// CHECK-NEXT: | | | | | |-DeclStmt {{.*}} <line:18:8, col:17>
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | | `-ForStmt {{.*}} <line:19:5, line:20:7> openmp_structured_block
// CHECK-NEXT: | | | | | |-DeclStmt {{.*}} <line:19:10, col:19>
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | `-NullStmt {{.*}} <line:20:7>
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <line:17:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:17:9) *const restrict'
// CHECK-NEXT: | | | | |-VarDecl {{.*}} <line:18:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <line:19:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | |-DeclRefExpr {{.*}} <line:18:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | `-DeclRefExpr {{.*}} <line:19:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | |-AlwaysInlineAttr {{.*}} <<invalid sloc>> Implicit __forceinline
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <line:17:9> col:9 implicit .global_tid. 'const int'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .part_id. 'const int *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .privates. 'void *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .copy_fn. 'void (*const restrict)(void *const restrict, ...)'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .task_t. 'void *const'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:17:9) *const restrict'
// CHECK-NEXT: | | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | | |-FieldDecl {{.*}} <line:18:23> col:23 implicit 'int'
// CHECK-NEXT: | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | `-FieldDecl {{.*}} <line:19:25> col:25 implicit 'int'
// CHECK-NEXT: | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | |-CapturedStmt {{.*}} <line:18:3, line:20:7>
// CHECK-NEXT: | | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | |-ForStmt {{.*}} <line:18:3, line:20:7>
// CHECK-NEXT: | | | | | |-DeclStmt {{.*}} <line:18:8, col:17>
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | | `-ForStmt {{.*}} <line:19:5, line:20:7> openmp_structured_block
// CHECK-NEXT: | | | | | |-DeclStmt {{.*}} <line:19:10, col:19>
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | `-NullStmt {{.*}} <line:20:7>
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <line:17:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:17:9) *const restrict'
// CHECK-NEXT: | | | | |-VarDecl {{.*}} <line:18:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <line:19:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | |-DeclRefExpr {{.*}} <line:18:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | `-DeclRefExpr {{.*}} <line:19:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <line:17:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:17:9) *const restrict'
// CHECK-NEXT: | | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | | |-FieldDecl {{.*}} <line:18:23> col:23 implicit 'int'
// CHECK-NEXT: | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | `-FieldDecl {{.*}} <line:19:25> col:25 implicit 'int'
// CHECK-NEXT: | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | |-ForStmt {{.*}} <line:18:3, line:20:7>
// CHECK-NEXT: | | | |-DeclStmt {{.*}} <line:18:8, col:17>
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | |-<<<NULL>>>
// CHECK-NEXT: | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | `-ForStmt {{.*}} <line:19:5, line:20:7> openmp_structured_block
// CHECK-NEXT: | | | |-DeclStmt {{.*}} <line:19:10, col:19>
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | |-<<<NULL>>>
// CHECK-NEXT: | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | `-NullStmt {{.*}} <line:20:7>
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <line:17:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:17:9) *const restrict'
// CHECK-NEXT: | | |-VarDecl {{.*}} <line:18:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | `-VarDecl {{.*}} <line:19:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | |-DeclRefExpr {{.*}} <line:18:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | `-DeclRefExpr {{.*}} <line:19:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: |-FunctionDecl {{.*}} <line:23:1, line:28:1> line:23:6 test_four 'void (int, int)'
// CHECK-NEXT: | |-ParmVarDecl {{.*}} <col:16, col:20> col:20 used x 'int'
// CHECK-NEXT: | |-ParmVarDecl {{.*}} <col:23, col:27> col:27 used y 'int'
// CHECK-NEXT: | `-CompoundStmt {{.*}} <col:30, line:28:1>
// CHECK-NEXT: | `-OMPTargetParallelForSimdDirective {{.*}} <line:24:9, col:49>
// CHECK-NEXT: | |-OMPCollapseClause {{.*}} <col:38, col:48>
// CHECK-NEXT: | | `-ConstantExpr {{.*}} <col:47> 'int'
// CHECK-NEXT: | | `-IntegerLiteral {{.*}} <col:47> 'int' 2
// CHECK-NEXT: | |-OMPFirstprivateClause {{.*}} <<invalid sloc>> <implicit>
// CHECK-NEXT: | | |-DeclRefExpr {{.*}} <line:25:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | `-DeclRefExpr {{.*}} <line:26:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | `-CapturedStmt {{.*}} <line:25:3, line:27:7>
// CHECK-NEXT: | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | |-CapturedStmt {{.*}} <line:25:3, line:27:7>
// CHECK-NEXT: | | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | |-CapturedStmt {{.*}} <line:25:3, line:27:7>
// CHECK-NEXT: | | | | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | | | |-ForStmt {{.*}} <line:25:3, line:27:7>
// CHECK-NEXT: | | | | | | | |-DeclStmt {{.*}} <line:25:8, col:17>
// CHECK-NEXT: | | | | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | | `-ForStmt {{.*}} <line:26:5, line:27:7>
// CHECK-NEXT: | | | | | | | |-DeclStmt {{.*}} <line:26:10, col:19>
// CHECK-NEXT: | | | | | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | | | | `-NullStmt {{.*}} <line:27:7> openmp_structured_block
// CHECK-NEXT: | | | | | | |-ImplicitParamDecl {{.*}} <line:24:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:24:9) *const restrict'
// CHECK-NEXT: | | | | | | |-VarDecl {{.*}} <line:25:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <line:26:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | | |-DeclRefExpr {{.*}} <line:25:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <line:26:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <line:24:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:24:9) *const restrict'
// CHECK-NEXT: | | | | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | | | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | | | | |-FieldDecl {{.*}} <line:25:23> col:23 implicit 'int'
// CHECK-NEXT: | | | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | | | `-FieldDecl {{.*}} <line:26:25> col:25 implicit 'int'
// CHECK-NEXT: | | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | |-ForStmt {{.*}} <line:25:3, line:27:7>
// CHECK-NEXT: | | | | | |-DeclStmt {{.*}} <line:25:8, col:17>
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | `-ForStmt {{.*}} <line:26:5, line:27:7>
// CHECK-NEXT: | | | | | |-DeclStmt {{.*}} <line:26:10, col:19>
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | | `-NullStmt {{.*}} <line:27:7> openmp_structured_block
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <line:24:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:24:9) *const restrict'
// CHECK-NEXT: | | | | |-VarDecl {{.*}} <line:25:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <line:26:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | |-DeclRefExpr {{.*}} <line:25:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | `-DeclRefExpr {{.*}} <line:26:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | |-AlwaysInlineAttr {{.*}} <<invalid sloc>> Implicit __forceinline
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <line:24:9> col:9 implicit .global_tid. 'const int'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .part_id. 'const int *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .privates. 'void *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .copy_fn. 'void (*const restrict)(void *const restrict, ...)'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .task_t. 'void *const'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:24:9) *const restrict'
// CHECK-NEXT: | | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | | |-FieldDecl {{.*}} <line:25:23> col:23 implicit 'int'
// CHECK-NEXT: | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | `-FieldDecl {{.*}} <line:26:25> col:25 implicit 'int'
// CHECK-NEXT: | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | |-CapturedStmt {{.*}} <line:25:3, line:27:7>
// CHECK-NEXT: | | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | |-ForStmt {{.*}} <line:25:3, line:27:7>
// CHECK-NEXT: | | | | | |-DeclStmt {{.*}} <line:25:8, col:17>
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | `-ForStmt {{.*}} <line:26:5, line:27:7>
// CHECK-NEXT: | | | | | |-DeclStmt {{.*}} <line:26:10, col:19>
// CHECK-NEXT: | | | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | | `-NullStmt {{.*}} <line:27:7> openmp_structured_block
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <line:24:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:24:9) *const restrict'
// CHECK-NEXT: | | | | |-VarDecl {{.*}} <line:25:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <line:26:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | |-DeclRefExpr {{.*}} <line:25:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | `-DeclRefExpr {{.*}} <line:26:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <line:24:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:24:9) *const restrict'
// CHECK-NEXT: | | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | | |-FieldDecl {{.*}} <line:25:23> col:23 implicit 'int'
// CHECK-NEXT: | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | `-FieldDecl {{.*}} <line:26:25> col:25 implicit 'int'
// CHECK-NEXT: | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | |-ForStmt {{.*}} <line:25:3, line:27:7>
// CHECK-NEXT: | | | |-DeclStmt {{.*}} <line:25:8, col:17>
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | |-<<<NULL>>>
// CHECK-NEXT: | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | `-ForStmt {{.*}} <line:26:5, line:27:7>
// CHECK-NEXT: | | | |-DeclStmt {{.*}} <line:26:10, col:19>
// CHECK-NEXT: | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | |-<<<NULL>>>
// CHECK-NEXT: | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | `-NullStmt {{.*}} <line:27:7> openmp_structured_block
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <line:24:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:24:9) *const restrict'
// CHECK-NEXT: | | |-VarDecl {{.*}} <line:25:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | `-VarDecl {{.*}} <line:26:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | |-DeclRefExpr {{.*}} <line:25:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | `-DeclRefExpr {{.*}} <line:26:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: `-FunctionDecl {{.*}} <line:30:1, line:36:1> line:30:6 test_five 'void (int, int, int)'
// CHECK-NEXT: |-ParmVarDecl {{.*}} <col:16, col:20> col:20 used x 'int'
// CHECK-NEXT: |-ParmVarDecl {{.*}} <col:23, col:27> col:27 used y 'int'
// CHECK-NEXT: |-ParmVarDecl {{.*}} <col:30, col:34> col:34 used z 'int'
// CHECK-NEXT: `-CompoundStmt {{.*}} <col:37, line:36:1>
// CHECK-NEXT: `-OMPTargetParallelForSimdDirective {{.*}} <line:31:9, col:49>
// CHECK-NEXT: |-OMPCollapseClause {{.*}} <col:38, col:48>
// CHECK-NEXT: | `-ConstantExpr {{.*}} <col:47> 'int'
// CHECK-NEXT: | `-IntegerLiteral {{.*}} <col:47> 'int' 2
// CHECK-NEXT: |-OMPFirstprivateClause {{.*}} <<invalid sloc>> <implicit>
// CHECK-NEXT: | |-DeclRefExpr {{.*}} <line:32:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | |-DeclRefExpr {{.*}} <line:33:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | `-DeclRefExpr {{.*}} <line:34:27> 'int' lvalue ParmVar {{.*}} 'z' 'int'
// CHECK-NEXT: `-CapturedStmt {{.*}} <line:32:3, line:35:9>
// CHECK-NEXT: |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | |-CapturedStmt {{.*}} <line:32:3, line:35:9>
// CHECK-NEXT: | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | |-CapturedStmt {{.*}} <line:32:3, line:35:9>
// CHECK-NEXT: | | | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | | | |-ForStmt {{.*}} <line:32:3, line:35:9>
// CHECK-NEXT: | | | | | | |-DeclStmt {{.*}} <line:32:8, col:17>
// CHECK-NEXT: | | | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-ForStmt {{.*}} <line:33:5, line:35:9>
// CHECK-NEXT: | | | | | | |-DeclStmt {{.*}} <line:33:10, col:19>
// CHECK-NEXT: | | | | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | | | `-ForStmt {{.*}} <line:34:7, line:35:9> openmp_structured_block
// CHECK-NEXT: | | | | | | |-DeclStmt {{.*}} <line:34:12, col:21>
// CHECK-NEXT: | | | | | | | `-VarDecl {{.*}} <col:12, col:20> col:16 used i 'int' cinit
// CHECK-NEXT: | | | | | | | `-IntegerLiteral {{.*}} <col:20> 'int' 0
// CHECK-NEXT: | | | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | | | |-BinaryOperator {{.*}} <col:23, col:27> 'int' '<'
// CHECK-NEXT: | | | | | | | |-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | | `-ImplicitCastExpr {{.*}} <col:27> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:27> 'int' lvalue ParmVar {{.*}} 'z' 'int'
// CHECK-NEXT: | | | | | | |-UnaryOperator {{.*}} <col:30, col:31> 'int' postfix '++'
// CHECK-NEXT: | | | | | | | `-DeclRefExpr {{.*}} <col:30> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | | `-NullStmt {{.*}} <line:35:9>
// CHECK-NEXT: | | | | | |-ImplicitParamDecl {{.*}} <line:31:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:31:9) *const restrict'
// CHECK-NEXT: | | | | | |-VarDecl {{.*}} <line:32:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | | |-VarDecl {{.*}} <line:33:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | | `-VarDecl {{.*}} <line:34:12, col:20> col:16 used i 'int' cinit
// CHECK-NEXT: | | | | | `-IntegerLiteral {{.*}} <col:20> 'int' 0
// CHECK-NEXT: | | | | |-DeclRefExpr {{.*}} <line:32:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | |-DeclRefExpr {{.*}} <line:33:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <line:34:27> 'int' lvalue ParmVar {{.*}} 'z' 'int'
// CHECK-NEXT: | | | |-ImplicitParamDecl {{.*}} <line:31:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:31:9) *const restrict'
// CHECK-NEXT: | | | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | | | |-FieldDecl {{.*}} <line:32:23> col:23 implicit 'int'
// CHECK-NEXT: | | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | | |-FieldDecl {{.*}} <line:33:25> col:25 implicit 'int'
// CHECK-NEXT: | | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | | `-FieldDecl {{.*}} <line:34:27> col:27 implicit 'int'
// CHECK-NEXT: | | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | |-ForStmt {{.*}} <line:32:3, line:35:9>
// CHECK-NEXT: | | | | |-DeclStmt {{.*}} <line:32:8, col:17>
// CHECK-NEXT: | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | `-ForStmt {{.*}} <line:33:5, line:35:9>
// CHECK-NEXT: | | | | |-DeclStmt {{.*}} <line:33:10, col:19>
// CHECK-NEXT: | | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | `-ForStmt {{.*}} <line:34:7, line:35:9> openmp_structured_block
// CHECK-NEXT: | | | | |-DeclStmt {{.*}} <line:34:12, col:21>
// CHECK-NEXT: | | | | | `-VarDecl {{.*}} <col:12, col:20> col:16 used i 'int' cinit
// CHECK-NEXT: | | | | | `-IntegerLiteral {{.*}} <col:20> 'int' 0
// CHECK-NEXT: | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | |-BinaryOperator {{.*}} <col:23, col:27> 'int' '<'
// CHECK-NEXT: | | | | | |-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | `-ImplicitCastExpr {{.*}} <col:27> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:27> 'int' lvalue ParmVar {{.*}} 'z' 'int'
// CHECK-NEXT: | | | | |-UnaryOperator {{.*}} <col:30, col:31> 'int' postfix '++'
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:30> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | `-NullStmt {{.*}} <line:35:9>
// CHECK-NEXT: | | | |-ImplicitParamDecl {{.*}} <line:31:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:31:9) *const restrict'
// CHECK-NEXT: | | | |-VarDecl {{.*}} <line:32:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | |-VarDecl {{.*}} <line:33:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | `-VarDecl {{.*}} <line:34:12, col:20> col:16 used i 'int' cinit
// CHECK-NEXT: | | | `-IntegerLiteral {{.*}} <col:20> 'int' 0
// CHECK-NEXT: | | |-DeclRefExpr {{.*}} <line:32:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | |-DeclRefExpr {{.*}} <line:33:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | `-DeclRefExpr {{.*}} <line:34:27> 'int' lvalue ParmVar {{.*}} 'z' 'int'
// CHECK-NEXT: | |-AlwaysInlineAttr {{.*}} <<invalid sloc>> Implicit __forceinline
// CHECK-NEXT: | |-ImplicitParamDecl {{.*}} <line:31:9> col:9 implicit .global_tid. 'const int'
// CHECK-NEXT: | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .part_id. 'const int *const restrict'
// CHECK-NEXT: | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .privates. 'void *const restrict'
// CHECK-NEXT: | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .copy_fn. 'void (*const restrict)(void *const restrict, ...)'
// CHECK-NEXT: | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .task_t. 'void *const'
// CHECK-NEXT: | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:31:9) *const restrict'
// CHECK-NEXT: | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | |-FieldDecl {{.*}} <line:32:23> col:23 implicit 'int'
// CHECK-NEXT: | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | |-FieldDecl {{.*}} <line:33:25> col:25 implicit 'int'
// CHECK-NEXT: | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | `-FieldDecl {{.*}} <line:34:27> col:27 implicit 'int'
// CHECK-NEXT: | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | |-CapturedStmt {{.*}} <line:32:3, line:35:9>
// CHECK-NEXT: | | |-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | | | |-ForStmt {{.*}} <line:32:3, line:35:9>
// CHECK-NEXT: | | | | |-DeclStmt {{.*}} <line:32:8, col:17>
// CHECK-NEXT: | | | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | `-ForStmt {{.*}} <line:33:5, line:35:9>
// CHECK-NEXT: | | | | |-DeclStmt {{.*}} <line:33:10, col:19>
// CHECK-NEXT: | | | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | | | `-ForStmt {{.*}} <line:34:7, line:35:9> openmp_structured_block
// CHECK-NEXT: | | | | |-DeclStmt {{.*}} <line:34:12, col:21>
// CHECK-NEXT: | | | | | `-VarDecl {{.*}} <col:12, col:20> col:16 used i 'int' cinit
// CHECK-NEXT: | | | | | `-IntegerLiteral {{.*}} <col:20> 'int' 0
// CHECK-NEXT: | | | | |-<<<NULL>>>
// CHECK-NEXT: | | | | |-BinaryOperator {{.*}} <col:23, col:27> 'int' '<'
// CHECK-NEXT: | | | | | |-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | | `-ImplicitCastExpr {{.*}} <col:27> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:27> 'int' lvalue ParmVar {{.*}} 'z' 'int'
// CHECK-NEXT: | | | | |-UnaryOperator {{.*}} <col:30, col:31> 'int' postfix '++'
// CHECK-NEXT: | | | | | `-DeclRefExpr {{.*}} <col:30> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | | `-NullStmt {{.*}} <line:35:9>
// CHECK-NEXT: | | | |-ImplicitParamDecl {{.*}} <line:31:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | | | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:31:9) *const restrict'
// CHECK-NEXT: | | | |-VarDecl {{.*}} <line:32:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | | |-VarDecl {{.*}} <line:33:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | | `-VarDecl {{.*}} <line:34:12, col:20> col:16 used i 'int' cinit
// CHECK-NEXT: | | | `-IntegerLiteral {{.*}} <col:20> 'int' 0
// CHECK-NEXT: | | |-DeclRefExpr {{.*}} <line:32:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | |-DeclRefExpr {{.*}} <line:33:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | `-DeclRefExpr {{.*}} <line:34:27> 'int' lvalue ParmVar {{.*}} 'z' 'int'
// CHECK-NEXT: | |-ImplicitParamDecl {{.*}} <line:31:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:31:9) *const restrict'
// CHECK-NEXT: | |-RecordDecl {{.*}} <col:9> col:9 implicit struct definition
// CHECK-NEXT: | | |-CapturedRecordAttr {{.*}} <<invalid sloc>> Implicit
// CHECK-NEXT: | | |-FieldDecl {{.*}} <line:32:23> col:23 implicit 'int'
// CHECK-NEXT: | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | |-FieldDecl {{.*}} <line:33:25> col:25 implicit 'int'
// CHECK-NEXT: | | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | | `-FieldDecl {{.*}} <line:34:27> col:27 implicit 'int'
// CHECK-NEXT: | | `-OMPCaptureKindAttr {{.*}} <<invalid sloc>> Implicit 9
// CHECK-NEXT: | `-CapturedDecl {{.*}} <<invalid sloc>> <invalid sloc> nothrow
// CHECK-NEXT: | |-ForStmt {{.*}} <line:32:3, line:35:9>
// CHECK-NEXT: | | |-DeclStmt {{.*}} <line:32:8, col:17>
// CHECK-NEXT: | | | `-VarDecl {{.*}} <col:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | | |-<<<NULL>>>
// CHECK-NEXT: | | |-BinaryOperator {{.*}} <col:19, col:23> 'int' '<'
// CHECK-NEXT: | | | |-ImplicitCastExpr {{.*}} <col:19> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:19> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | `-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: | | |-UnaryOperator {{.*}} <col:26, col:27> 'int' postfix '++'
// CHECK-NEXT: | | | `-DeclRefExpr {{.*}} <col:26> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | `-ForStmt {{.*}} <line:33:5, line:35:9>
// CHECK-NEXT: | | |-DeclStmt {{.*}} <line:33:10, col:19>
// CHECK-NEXT: | | | `-VarDecl {{.*}} <col:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | | |-<<<NULL>>>
// CHECK-NEXT: | | |-BinaryOperator {{.*}} <col:21, col:25> 'int' '<'
// CHECK-NEXT: | | | |-ImplicitCastExpr {{.*}} <col:21> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:21> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | `-ImplicitCastExpr {{.*}} <col:25> 'int' <LValueToRValue>
// CHECK-NEXT: | | | `-DeclRefExpr {{.*}} <col:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: | | |-UnaryOperator {{.*}} <col:28, col:29> 'int' postfix '++'
// CHECK-NEXT: | | | `-DeclRefExpr {{.*}} <col:28> 'int' lvalue Var {{.*}} 'i' 'int'
[clang][OpeMP] Model OpenMP structured-block in AST (PR40563) Summary: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 3: ``` structured block For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct. COMMENT: See Section 2.1 on page 38 for restrictions on structured blocks. ``` ``` 2.1 Directive Format Some executable directives include a structured block. A structured block: • may contain infinite loops where the point of exit is never reached; • may halt due to an IEEE exception; • may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++); • may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and Restrictions Restrictions to structured blocks are as follows: • Entry to a structured block must not be the result of a branch. • The point of exit cannot be a branch out of the structured block. C / C++ • The point of entry to a structured block must not be a call to setjmp(). • longjmp() and throw() must not violate the entry/exit criteria. ``` Of particular note here is the fact that OpenMP structured blocks are as-if `noexcept`, in the same sense as with the normal `noexcept` functions in C++. I.e. if throw happens, and it attempts to travel out of the `noexcept` function (here: out of the current structured-block), then the program terminates. Now, one of course can say that since it is explicitly prohibited by the Specification, then any and all programs that violate this Specification contain undefined behavior, and are unspecified, and thus no one should care about them. Just don't write broken code /s But i'm not sure this is a reasonable approach. I have personally had oss-fuzz issues of this origin - exception thrown inside of an OpenMP structured-block that is not caught, thus causing program termination. This issue isn't all that hard to catch, it's not any particularly different from diagnosing the same situation with the normal `noexcept` function. Now, clang static analyzer does not presently model exceptions. But clang-tidy has a simplisic [[ https://clang.llvm.org/extra/clang-tidy/checks/bugprone-exception-escape.html | bugprone-exception-escape ]] check, and it is even refactored as a `ExceptionAnalyzer` class for reuse. So it would be trivial to use that analyzer to check for exceptions escaping out of OpenMP structured blocks. (D59466) All that sounds too great to be true. Indeed, there is a caveat. Presently, it's practically impossible to do. To check a OpenMP structured block you need to somehow 'get' the OpenMP structured block, and you can't because it's simply not modelled in AST. `CapturedStmt`/`CapturedDecl` is not it's representation. Now, it is of course possible to write e.g. some AST matcher that would e.g. match every OpenMP executable directive, and then return the whatever `Stmt` is the structured block of said executable directive, if any. But i said //practically//. This isn't practical for the following reasons: 1. This **will** bitrot. That matcher will need to be kept up-to-date, and refreshed with every new OpenMP spec version. 2. Every single piece of code that would want that knowledge would need to have such matcher. Well, okay, if it is an AST matcher, it could be shared. But then you still have `RecursiveASTVisitor` and friends. `2 > 1`, so now you have code duplication. So it would be reasonable (and is fully within clang AST spirit) to not force every single consumer to do that work, but instead store that knowledge in the correct, and appropriate place - AST, class structure. Now, there is another hoop we need to get through. It isn't fully obvious //how// to model this. The best solution would of course be to simply add a `OMPStructuredBlock` transparent node. It would be optimal, it would give us two properties: * Given this `OMPExecutableDirective`, what's it OpenMP structured block? * It is trivial to check whether the `Stmt*` is a OpenMP structured block (`isa<OMPStructuredBlock>(ptr)`) But OpenMP structured block isn't **necessarily** the first, direct child of `OMP*Directive`. (even ignoring the clang's `CapturedStmt`/`CapturedDecl` that were inserted inbetween). So i'm not sure whether or not we could re-create AST statements after they were already created? There would be other costs to a new AST node: https://bugs.llvm.org/show_bug.cgi?id=40563#c12 ``` 1. You will need to break the representation of loops. The body should be replaced by the "structured block" entity. 2. You will need to support serialization/deserialization. 3. You will need to support template instantiation. 4. You will need to support codegen and take this new construct to account in each OpenMP directive. ``` Instead, there **is** an functionally-equivalent, alternative solution, consisting of two parts. Part 1: * Add a member function `isStandaloneDirective()` to the `OMPExecutableDirective` class, that will tell whether this directive is stand-alone or not, as per the spec. We need it because we can't just check for the existance of associated statements, see code comment. * Add a member function `getStructuredBlock()` to the OMPExecutableDirective` class itself, that assert that this is not a stand-alone directive, and either return the correct loop body if this is a loop-like directive, or the captured statement. This way, given an `OMPExecutableDirective`, we can get it's structured block. Also, since the knowledge is ingrained into the clang OpenMP implementation, it will not cause any duplication, and //hopefully// won't bitrot. Great we achieved 1 of 2 properties of `OMPStructuredBlock` approach. Thus, there is a second part needed: * How can we check whether a given `Stmt*` is `OMPStructuredBlock`? Well, we can't really, in general. I can see this workaround: ``` class FunctionASTVisitor : public RecursiveASTVisitor<FunctionASTVisitor> { using Base = RecursiveASTVisitor<FunctionASTVisitor>; public: bool VisitOMPExecDir(OMPExecDir *D) { OmpStructuredStmts.emplace_back(D.getStructuredStmt()); } bool VisitSOMETHINGELSE(???) { if(InOmpStructuredStmt) HI! } bool TraverseStmt(Stmt *Node) { if (!Node) return Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) ++InOmpStructuredStmt; Base::TraverseStmt(Node); if (OmpStructuredStmts.back() == Node) { OmpStructuredStmts.pop_back(); --InOmpStructuredStmt; } return true; } std::vector<Stmt*> OmpStructuredStmts; int InOmpStructuredStmt = 0; }; ``` But i really don't see using it in practice. It's just too intrusive; and again, requires knowledge duplication. .. but no. The solution lies right on the ground. Why don't we simply store this `i'm a openmp structured block` in the bitfield of the `Stmt` itself? This does not appear to have any impact on the memory footprint of the clang AST, since it's just a single extra bit in the bitfield. At least the static assertions don't fail. Thus, indeed, we can achieve both of the properties without a new AST node. We can cheaply set that bit right in sema, at the end of `Sema::ActOnOpenMPExecutableDirective()`, by just calling the `getStructuredBlock()` that we just added. Test coverage that demonstrates all this has been added. This isn't as great with serialization though. Most of it does not use abbrevs, so we do end up paying the full price (4 bytes?) instead of a single bit. That price, of course, can be reclaimed by using abbrevs. In fact, i suspect that //might// not just reclaim these bytes, but pack these PCH significantly. I'm not seeing a third solution. If there is one, it would be interesting to hear about it. ("just don't write code that would require `isa<OMPStructuredBlock>(ptr)`" is not a solution.) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40563 | PR40563 ]]. Reviewers: ABataev, rjmccall, hfinkel, rsmith, riccibruno, gribozavr Reviewed By: ABataev, gribozavr Subscribers: mgorny, aaron.ballman, steveire, guansong, jfb, jdoerfert, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D59214 llvm-svn: 356570
2019-03-21 00:32:36 +08:00
// CHECK-NEXT: | | `-ForStmt {{.*}} <line:34:7, line:35:9> openmp_structured_block
// CHECK-NEXT: | | |-DeclStmt {{.*}} <line:34:12, col:21>
// CHECK-NEXT: | | | `-VarDecl {{.*}} <col:12, col:20> col:16 used i 'int' cinit
// CHECK-NEXT: | | | `-IntegerLiteral {{.*}} <col:20> 'int' 0
// CHECK-NEXT: | | |-<<<NULL>>>
// CHECK-NEXT: | | |-BinaryOperator {{.*}} <col:23, col:27> 'int' '<'
// CHECK-NEXT: | | | |-ImplicitCastExpr {{.*}} <col:23> 'int' <LValueToRValue>
// CHECK-NEXT: | | | | `-DeclRefExpr {{.*}} <col:23> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | | `-ImplicitCastExpr {{.*}} <col:27> 'int' <LValueToRValue>
// CHECK-NEXT: | | | `-DeclRefExpr {{.*}} <col:27> 'int' lvalue ParmVar {{.*}} 'z' 'int'
// CHECK-NEXT: | | |-UnaryOperator {{.*}} <col:30, col:31> 'int' postfix '++'
// CHECK-NEXT: | | | `-DeclRefExpr {{.*}} <col:30> 'int' lvalue Var {{.*}} 'i' 'int'
// CHECK-NEXT: | | `-NullStmt {{.*}} <line:35:9>
// CHECK-NEXT: | |-ImplicitParamDecl {{.*}} <line:31:9> col:9 implicit .global_tid. 'const int *const restrict'
// CHECK-NEXT: | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit .bound_tid. 'const int *const restrict'
// CHECK-NEXT: | |-ImplicitParamDecl {{.*}} <col:9> col:9 implicit __context 'struct (anonymous at {{.*}}ast-dump-openmp-target-parallel-for-simd.c:31:9) *const restrict'
// CHECK-NEXT: | |-VarDecl {{.*}} <line:32:8, col:16> col:12 used i 'int' cinit
// CHECK-NEXT: | | `-IntegerLiteral {{.*}} <col:16> 'int' 0
// CHECK-NEXT: | |-VarDecl {{.*}} <line:33:10, col:18> col:14 used i 'int' cinit
// CHECK-NEXT: | | `-IntegerLiteral {{.*}} <col:18> 'int' 0
// CHECK-NEXT: | `-VarDecl {{.*}} <line:34:12, col:20> col:16 used i 'int' cinit
// CHECK-NEXT: | `-IntegerLiteral {{.*}} <col:20> 'int' 0
// CHECK-NEXT: |-DeclRefExpr {{.*}} <line:32:23> 'int' lvalue ParmVar {{.*}} 'x' 'int'
// CHECK-NEXT: |-DeclRefExpr {{.*}} <line:33:25> 'int' lvalue ParmVar {{.*}} 'y' 'int'
// CHECK-NEXT: `-DeclRefExpr {{.*}} <line:34:27> 'int' lvalue ParmVar {{.*}} 'z' 'int'