llvm-project/flang/docs/ControlFlowGraph.md

<!--===- docs/ControlFlowGraph.md 
  
   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
  
-->

# Control Flow Graph

```eval_rst
.. contents::
   :local:
```

## Concept
After a Fortran subprogram has been parsed, its names resolved, and all its
semantic constraints successfully checked, the parse tree of its
executable part is translated into another abstract representation,
namely the _control flow graph_ described in this note.

This second representation of the subprogram's executable part is
suitable for analysis and incremental modification as the subprogram
is readied for code generation.
Many high-level Fortran features are implemented by rewriting portions
of a subprogram's control flow graph in place.

### Control Flow Graph
A _control flow graph_ is a collection of simple (_i.e.,_ "non-extended")
basic _blocks_ that comprise straight-line sequences of _actions_ with a
single entry point and a single exit point, and a collection of
directed flow _edges_ (or _arcs_) denoting all possible transitions of
control flow that may take place during execution from the end of
one basic block to the beginning of another (or itself).

A block that has multiple distinct successors in the flow of control
must end with an action that selects its successor.

The sequence of actions that constitutes a basic block may
include references to user and library procedures.
Subprogram calls with implicit control flow afterwards, namely
alternate returns and `END=`/`ERR=` labels on input/output,
will be lowered in translation to a representation that materializes
that control flow into something similar to a computed `GO TO` or
C language `switch` statement.

For convenience in optimization and to simplify the implementation of
data flow confluence functions, we may choose to maintain the
property that each flow arc is the sole outbound arc emanating from
its originating block, the sole inbound arc arriving at its destination,
or both.
Empty blocks would inserted to "split" arcs when necessary to maintain this
invariant property.

Fortran subprograms (other than internal subprograms) can have multiple
entry points by using the obsolescent `ENTRY` statement.
We will implement such subprograms by constructing a union
of their dummy argument lists and using it as part of the definition
of a new subroutine or function that can be called by each of
the entry points, which are then all converted into wrapper routines that
pass a selector value as an additional argument to drive a `switch` on entry
to the new subprogram.

This transformation ensures that every subprogram's control
flow graph has a well-defined `START` node.

Statement labels can be used in Fortran on any statement, but only
the labels that decorate legal destinations of `GO TO` statements
need to be implemented in the control flow graph.
Specifically, non-executable statements like `DATA`, `NAMELIST`, and
`FORMAT` statements will be extracted into data initialization
records before or during the construction of the control flow
graph, and will survive only as synonyms for `CONTINUE`.

Nests of multiple labeled `DO` loops that terminate on the same
label will be have that label rewritten so that `GO TO` within
the loop nest will arrive at the copy that most closely nests
the context.
The Fortran standard does not require us to do this, but XLF
(at least) works this way.

### Expressions and Statements (Operations and Actions)
Expressions are trees, not DAGs, of intrinsic operations,
resolved function references, constant literals, and
data designators.

Expression nodes are represented in the compiler in a type-safe manner.
There is a distinct class or class template for every category of
intrinsic type, templatized over its supported kind type parameter values.

Operands are storage-owning indirections to other instances
of `Expression`, instances of constant values, and to representations
of data and function references.
These indirections are not nullable apart from the situation in which
the operands of an expression are being removed for use elsewhere before
the expression is destructed.

The ranks and the extents of the shapes of the results of expressions
are explicit for constant arrays and recoverable by analysis otherwise.

Parenthesized subexpressions are scrupulously preserved in accordance with
the Fortran standard.

The expression tree is meant to be a representation that is
as equally well suited for use in the symbol table (e.g., for
a bound of an explicit shape array) as it is for an action
in a basic block of the control flow graph (e.g., the right
hand side of an assignment statement).

Each basic block comprises a linear sequence of _actions_.
These are represented as a doubly-linked list so that insertion
and deletion can be done in constant time.

Only the last action in a basic block can represent a change
to the flow of control.

### Scope Transitions
Some of the various scopes of the symbol table are visible in the control flow
graph as `SCOPE ENTRY` and `SCOPE EXIT` actions.
`SCOPE ENTRY` actions are unique for their corresponding scopes,
while `SCOPE EXIT` actions need not be so.
It must be the case that
any flow of control within the subprogram will enter only scopes that are
not yet active, and exit only the most recently entered scope that has not
yet been deactivated; i.e., when modeled by a push-down stack that is
pushed by each traversal of a `SCOPE ENTRY` action,
the entries of the stack are always distinct, only the scope at
the top of the stack is ever popped by `SCOPE EXIT`, and the stack is empty
when the subprogram terminates.
Further, any references to resolved symbols must be to symbols whose scopes
are active.

The `DEALLOCATE` actions and calls to `FINAL` procedures implied by scoped
lifetimes will be explicit in the sequence of actions in the control flow
graph.

Parallel regions might be partially represented by scopes, or by explicit
operations similar to the scope entry and exit operations.

### Data Flow Representation
The subprogram text will be in static single assignment form by the time the
subprogram arrives at the bridge to the LLVM IR builder.
Merge points are actions at the heads of basic blocks whose operands
are definition points; definition points are actions at the ends of
basic blocks whose operands are expression trees (which may refer to
merge points).

### Rewriting Transformations

#### I/O
#### Dynamic allocation
#### Array constructors

#### Derived type initialization, deallocation, and finalization
The machinery behind the complicated semantics of Fortran's derived types
and `ALLOCATABLE` objects will be implemented in large part by the run time
support library.

#### Actual argument temporaries
#### Array assignments, `WHERE`, and `FORALL`

Array operations have shape.

`WHERE` masks have shape.
Their effects on array operations are by means of explicit `MASK` operands that
are part of array assignment operations.

#### Intrinsic function and subroutine calls
Revert D86875 "[Flang][NFC] Remove license comments from files in docs/ folder." This reverts commit f787c9a90c69f, this was causing some build issues. 2020-09-11 01:34:37 +08:00			`<!--===- docs/ControlFlowGraph.md`

			`Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.`
			`See https://llvm.org/LICENSE.txt for license information.`
			`SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception`

			`-->`

[flang] Add new documentation main page Add a new index page to be the Flang documentation mainpage instead of Overview.md, which jumps straight into the compiler Design. The index file needs to be in .rst format to use the toctree directive to create table of contents. Also use the sphinx_markdown_tables extension to generate html tables form markdown. A number of additional style changes to the existing docs were needed to make this work well: * Convert all headings to the # style, which works better with toctree's titlesonly option. Ensure that there is only one top-level heading per document. * Add a title to documents that don't have one for rendering on the index. * Convert the grammar docs from .txt to .md. for better rendering * Fixed broken link to a section in another document - sphinx does not seem to support anchor links in markdown files. Depends on D87226 Reviewed By: sameeranjoshi Differential Revision: https://reviews.llvm.org/D87242 2020-09-11 21:17:19 +08:00			`# Control Flow Graph`

			```eval_rst
			`.. contents::`
			`:local:`
			```

[flang] Documentation Original-commit: flang-compiler/f18@27f6927053fa685b9366647fa02b595ffd830441 Reviewed-on: https://github.com/flang-compiler/f18/pull/111 Tree-same-pre-rewrite: false 2018-06-19 02:26:06 +08:00			`## Concept`
			`After a Fortran subprogram has been parsed, its names resolved, and all its`
			`semantic constraints successfully checked, the parse tree of its`
			`executable part is translated into another abstract representation,`
			`namely the _control flow graph_ described in this note.`

			`This second representation of the subprogram's executable part is`
			`suitable for analysis and incremental modification as the subprogram`
			`is readied for code generation.`
			`Many high-level Fortran features are implemented by rewriting portions`
			`of a subprogram's control flow graph in place.`

			`### Control Flow Graph`
			`A _control flow graph_ is a collection of simple (_i.e.,_ "non-extended")`
			`basic _blocks_ that comprise straight-line sequences of _actions_ with a`
			`single entry point and a single exit point, and a collection of`
[flang] Update ControlFlowGraph.md Original-commit: flang-compiler/f18@50ce01185f708c3d80fd87579d28adc72dcf1555 Reviewed-on: https://github.com/flang-compiler/f18/pull/301 2019-02-26 02:47:17 +08:00			`directed flow _edges_ (or _arcs_) denoting all possible transitions of`
[flang] begin expression representation Original-commit: flang-compiler/f18@627c057d5c8392914eb186e8f6dacab40b5a6666 Reviewed-on: https://github.com/flang-compiler/f18/pull/111 Tree-same-pre-rewrite: false 2018-06-19 03:09:56 +08:00			`control flow that may take place during execution from the end of`
			`one basic block to the beginning of another (or itself).`

			`A block that has multiple distinct successors in the flow of control`
			`must end with an action that selects its successor.`
[flang] Documentation Original-commit: flang-compiler/f18@27f6927053fa685b9366647fa02b595ffd830441 Reviewed-on: https://github.com/flang-compiler/f18/pull/111 Tree-same-pre-rewrite: false 2018-06-19 02:26:06 +08:00
			`The sequence of actions that constitutes a basic block may`
			`include references to user and library procedures.`
			`Subprogram calls with implicit control flow afterwards, namely`
			alternate returns and `END=`/`ERR=` labels on input/output,
			`will be lowered in translation to a representation that materializes`
			that control flow into something similar to a computed `GO TO` or
			C language `switch` statement.

			`For convenience in optimization and to simplify the implementation of`
			`data flow confluence functions, we may choose to maintain the`
			`property that each flow arc is the sole outbound arc emanating from`
			`its originating block, the sole inbound arc arriving at its destination,`
			`or both.`
			`Empty blocks would inserted to "split" arcs when necessary to maintain this`
			`invariant property.`

			`Fortran subprograms (other than internal subprograms) can have multiple`
			entry points by using the obsolescent `ENTRY` statement.
			`We will implement such subprograms by constructing a union`
			`of their dummy argument lists and using it as part of the definition`
			`of a new subroutine or function that can be called by each of`
			`the entry points, which are then all converted into wrapper routines that`
			pass a selector value as an additional argument to drive a `switch` on entry
			`to the new subprogram.`

			`This transformation ensures that every subprogram's control`
			flow graph has a well-defined `START` node.

			`Statement labels can be used in Fortran on any statement, but only`
			the labels that decorate legal destinations of `GO TO` statements
			`need to be implemented in the control flow graph.`
			Specifically, non-executable statements like `DATA`, `NAMELIST`, and
			`FORMAT` statements will be extracted into data initialization
			`records before or during the construction of the control flow`
			graph, and will survive only as synonyms for `CONTINUE`.

			Nests of multiple labeled `DO` loops that terminate on the same
			label will be have that label rewritten so that `GO TO` within
			`the loop nest will arrive at the copy that most closely nests`
			`the context.`
			`The Fortran standard does not require us to do this, but XLF`
			`(at least) works this way.`

			`### Expressions and Statements (Operations and Actions)`
			`Expressions are trees, not DAGs, of intrinsic operations,`
			`resolved function references, constant literals, and`
			`data designators.`

			`Expression nodes are represented in the compiler in a type-safe manner.`
[flang] Incorporate suggestions from code review. Original-commit: flang-compiler/f18@c66bd9ab15846afcaed129764b8c958f277e071f Reviewed-on: https://github.com/flang-compiler/f18/pull/111 2018-06-23 07:44:23 +08:00			`There is a distinct class or class template for every category of`
			`intrinsic type, templatized over its supported kind type parameter values.`

			`Operands are storage-owning indirections to other instances`
[flang] Documentation Original-commit: flang-compiler/f18@27f6927053fa685b9366647fa02b595ffd830441 Reviewed-on: https://github.com/flang-compiler/f18/pull/111 Tree-same-pre-rewrite: false 2018-06-19 02:26:06 +08:00			of `Expression`, instances of constant values, and to representations
			`of data and function references.`
[flang] Incorporate suggestions from code review. Original-commit: flang-compiler/f18@c66bd9ab15846afcaed129764b8c958f277e071f Reviewed-on: https://github.com/flang-compiler/f18/pull/111 2018-06-23 07:44:23 +08:00			`These indirections are not nullable apart from the situation in which`
			`the operands of an expression are being removed for use elsewhere before`
			`the expression is destructed.`
[flang] Documentation Original-commit: flang-compiler/f18@27f6927053fa685b9366647fa02b595ffd830441 Reviewed-on: https://github.com/flang-compiler/f18/pull/111 Tree-same-pre-rewrite: false 2018-06-19 02:26:06 +08:00
			`The ranks and the extents of the shapes of the results of expressions`
			`are explicit for constant arrays and recoverable by analysis otherwise.`

			`Parenthesized subexpressions are scrupulously preserved in accordance with`
			`the Fortran standard.`

			`The expression tree is meant to be a representation that is`
			`as equally well suited for use in the symbol table (e.g., for`
			`a bound of an explicit shape array) as it is for an action`
			`in a basic block of the control flow graph (e.g., the right`
			`hand side of an assignment statement).`

			`Each basic block comprises a linear sequence of _actions_.`
			`These are represented as a doubly-linked list so that insertion`
			`and deletion can be done in constant time.`

			`Only the last action in a basic block can represent a change`
			`to the flow of control.`

			`### Scope Transitions`
			`Some of the various scopes of the symbol table are visible in the control flow`
			graph as `SCOPE ENTRY` and `SCOPE EXIT` actions.
			`SCOPE ENTRY` actions are unique for their corresponding scopes,
			while `SCOPE EXIT` actions need not be so.
			`It must be the case that`
			`any flow of control within the subprogram will enter only scopes that are`
			`not yet active, and exit only the most recently entered scope that has not`
			`yet been deactivated; i.e., when modeled by a push-down stack that is`
			pushed by each traversal of a `SCOPE ENTRY` action,
			`the entries of the stack are always distinct, only the scope at`
			the top of the stack is ever popped by `SCOPE EXIT`, and the stack is empty
			`when the subprogram terminates.`
			`Further, any references to resolved symbols must be to symbols whose scopes`
			`are active.`

			The `DEALLOCATE` actions and calls to `FINAL` procedures implied by scoped
			`lifetimes will be explicit in the sequence of actions in the control flow`
			`graph.`

			`Parallel regions might be partially represented by scopes, or by explicit`
			`operations similar to the scope entry and exit operations.`

			`### Data Flow Representation`
			`The subprogram text will be in static single assignment form by the time the`
			`subprogram arrives at the bridge to the LLVM IR builder.`
			`Merge points are actions at the heads of basic blocks whose operands`
			`are definition points; definition points are actions at the ends of`
			`basic blocks whose operands are expression trees (which may refer to`
			`merge points).`

			`### Rewriting Transformations`

			`#### I/O`
			`#### Dynamic allocation`
			`#### Array constructors`

			`#### Derived type initialization, deallocation, and finalization`
			`The machinery behind the complicated semantics of Fortran's derived types`
			and `ALLOCATABLE` objects will be implemented in large part by the run time
			`support library.`

			`#### Actual argument temporaries`
			#### Array assignments, `WHERE`, and `FORALL`

			`Array operations have shape.`

			`WHERE` masks have shape.
			Their effects on array operations are by means of explicit `MASK` operands that
			`are part of array assignment operations.`

			`#### Intrinsic function and subroutine calls`