llvm-project/flang/documentation/ControlFlowGraph.md

6.9 KiB

Concept

After a Fortran subprogram has been parsed, its names resolved, and all its semantic constraints successfully checked, the parse tree of its executable part is translated into another abstract representation, namely the control flow graph described in this note.

This second representation of the subprogram's executable part is suitable for analysis and incremental modification as the subprogram is readied for code generation. Many high-level Fortran features are implemented by rewriting portions of a subprogram's control flow graph in place.

Control Flow Graph

A control flow graph is a collection of simple (i.e., "non-extended") basic blocks that comprise straight-line sequences of actions with a single entry point and a single exit point, and a collection of directed flow edges (or arcs) denoting all possible transitions of control flow that may take place during execution from the end of one basic block to the beginning of another (or itself).

A block that has multiple distinct successors in the flow of control must end with an action that selects its successor.

The sequence of actions that constitutes a basic block may include references to user and library procedures. Subprogram calls with implicit control flow afterwards, namely alternate returns and END=/ERR= labels on input/output, will be lowered in translation to a representation that materializes that control flow into something similar to a computed GO TO or C language switch statement.

For convenience in optimization and to simplify the implementation of data flow confluence functions, we may choose to maintain the property that each flow arc is the sole outbound arc emanating from its originating block, the sole inbound arc arriving at its destination, or both. Empty blocks would inserted to "split" arcs when necessary to maintain this invariant property.

Fortran subprograms (other than internal subprograms) can have multiple entry points by using the obsolescent ENTRY statement. We will implement such subprograms by constructing a union of their dummy argument lists and using it as part of the definition of a new subroutine or function that can be called by each of the entry points, which are then all converted into wrapper routines that pass a selector value as an additional argument to drive a switch on entry to the new subprogram.

This transformation ensures that every subprogram's control flow graph has a well-defined START node.

Statement labels can be used in Fortran on any statement, but only the labels that decorate legal destinations of GO TO statements need to be implemented in the control flow graph. Specifically, non-executable statements like DATA, NAMELIST, and FORMAT statements will be extracted into data initialization records before or during the construction of the control flow graph, and will survive only as synonyms for CONTINUE.

Nests of multiple labeled DO loops that terminate on the same label will be have that label rewritten so that GO TO within the loop nest will arrive at the copy that most closely nests the context. The Fortran standard does not require us to do this, but XLF (at least) works this way.

Expressions and Statements (Operations and Actions)

Expressions are trees, not DAGs, of intrinsic operations, resolved function references, constant literals, and data designators.

Expression nodes are represented in the compiler in a type-safe manner. There is a distinct class or class template for every category of intrinsic type, templatized over its supported kind type parameter values.

Operands are storage-owning indirections to other instances of Expression, instances of constant values, and to representations of data and function references. These indirections are not nullable apart from the situation in which the operands of an expression are being removed for use elsewhere before the expression is destructed.

The ranks and the extents of the shapes of the results of expressions are explicit for constant arrays and recoverable by analysis otherwise.

Parenthesized subexpressions are scrupulously preserved in accordance with the Fortran standard.

The expression tree is meant to be a representation that is as equally well suited for use in the symbol table (e.g., for a bound of an explicit shape array) as it is for an action in a basic block of the control flow graph (e.g., the right hand side of an assignment statement).

Each basic block comprises a linear sequence of actions. These are represented as a doubly-linked list so that insertion and deletion can be done in constant time.

Only the last action in a basic block can represent a change to the flow of control.

Scope Transitions

Some of the various scopes of the symbol table are visible in the control flow graph as SCOPE ENTRY and SCOPE EXIT actions. SCOPE ENTRY actions are unique for their corresponding scopes, while SCOPE EXIT actions need not be so. It must be the case that any flow of control within the subprogram will enter only scopes that are not yet active, and exit only the most recently entered scope that has not yet been deactivated; i.e., when modeled by a push-down stack that is pushed by each traversal of a SCOPE ENTRY action, the entries of the stack are always distinct, only the scope at the top of the stack is ever popped by SCOPE EXIT, and the stack is empty when the subprogram terminates. Further, any references to resolved symbols must be to symbols whose scopes are active.

The DEALLOCATE actions and calls to FINAL procedures implied by scoped lifetimes will be explicit in the sequence of actions in the control flow graph.

Parallel regions might be partially represented by scopes, or by explicit operations similar to the scope entry and exit operations.

Data Flow Representation

The subprogram text will be in static single assignment form by the time the subprogram arrives at the bridge to the LLVM IR builder. Merge points are actions at the heads of basic blocks whose operands are definition points; definition points are actions at the ends of basic blocks whose operands are expression trees (which may refer to merge points).

Rewriting Transformations

I/O

Dynamic allocation

Array constructors

Derived type initialization, deallocation, and finalization

The machinery behind the complicated semantics of Fortran's derived types and ALLOCATABLE objects will be implemented in large part by the run time support library.

Actual argument temporaries

Array assignments, WHERE, and FORALL

Array operations have shape.

WHERE masks have shape. Their effects on array operations are by means of explicit MASK operands that are part of array assignment operations.

Intrinsic function and subroutine calls