<!--===- docs/Semantics.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Semantic Analysis

```eval_rst
.. contents::
   :local:
```

The semantic analysis pass determines whether a syntactically correct Fortran
program is legal by enforcing the constraints of the language.

The input is a parse tree with a `Program` node at the root,
and a "cooked" character stream: a contiguous stream of characters
containing a normalized form of the Fortran source.

If the program is not legal, the result of the semantic pass is a list of
errors associated with the program.

If the program is legal, the semantic pass produces a (possibly modified)
parse tree for the semantically correct program with each name mapped to a symbol
and each expression fully analyzed.

All user errors are detected either prior to or during semantic analysis.
After it completes successfully, the program should compile with no error messages.
There may still be warnings or informational messages.

## Phases of Semantic Analysis

1. [Validate labels](#validate-labels) -
   Check all constraints on labels and branches
2. [Rewrite DO loops](#rewrite-do-loops) -
   Convert all occurrences of `LabelDoStmt` to `DoConstruct`
3. [Name resolution](#name-resolution) -
   Analyze names and declarations, build a tree of Scopes containing Symbols,
   and fill in the `Name::symbol` data member in the parse tree
4. [Rewrite parse tree](#rewrite-parse-tree) -
   Fix incorrect parses based on symbol information
5. [Expression analysis](#expression-analysis) -
   Analyze all expressions in the parse tree and fill in `Expr::typedExpr` and
   `Variable::typedExpr` with analyzed expressions; fix incorrect parses
   based on the result of this analysis
6. [Statement semantics](#statement-semantics) -
   Perform remaining semantic checks on the execution parts of subprograms
7. [Write module files](#write-module-files) -
   If no errors have occurred, write out `.mod` files for modules and submodules

If phase 1 or phase 2 encounters an error in any of the program units,
compilation terminates. Otherwise, phases 3-6 are all performed even if
errors occur.
Module files are written (phase 7) only if there are no errors.

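
The error policy across the phases can be sketched as a small driver loop. This is only an illustration in C++: `Phase`, `PassResult`, and `RunSemantics` are hypothetical names invented for the sketch and are not part of the compiler's API. The sketch assumes the first two entries of `phases` are phases 1 and 2.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one phase: a name plus a callback that
// returns true when the phase reported no errors.
struct Phase {
  std::string name;
  std::function<bool()> run;
};

// Hypothetical summary of what the driver did.
struct PassResult {
  std::vector<std::string> phasesRun;
  bool anyErrors{false};
  bool wroteModuleFiles{false};
};

// `phases` is assumed to hold phases 1-6 in order; `writeModules` plays
// the role of phase 7.
PassResult RunSemantics(const std::vector<Phase> &phases,
                        const std::function<void()> &writeModules) {
  PassResult result;
  for (std::size_t i{0}; i < phases.size(); ++i) {
    result.phasesRun.push_back(phases[i].name);
    if (!phases[i].run()) {
      result.anyErrors = true;
      if (i < 2) {
        // An error in phase 1 or phase 2 terminates compilation.
        return result;
      }
      // Errors in phases 3-6 are recorded, but later phases still run.
    }
  }
  if (!result.anyErrors) {
    // Phase 7 runs only when no phase reported an error.
    writeModules();
    result.wroteModuleFiles = true;
  }
  return result;
}
```

With this shape, a failure in the third phase still lets the remaining checks report their own errors in the same run, while a failure in the first or second phase stops immediately.
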
### Validate labels

Perform semantic checks related to labels and branches:
- check that any labels that are referenced are defined and in scope
- check branches into loop bodies
- check that labeled `DO` loops are properly nested
- check labels in data transfer statements

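
As a rough illustration of the first check, the sketch below reports labels that are referenced but never defined. It deliberately ignores scoping and the other checks in the list, and `UndefinedLabels` is a hypothetical helper, not a function in the compiler.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: returns the referenced labels that have no
// definition, in ascending order and without duplicates.
// Scoping rules are not modeled here.
std::vector<int> UndefinedLabels(const std::set<int> &defined,
                                 const std::vector<int> &referenced) {
  std::set<int> missing;
  for (int label : referenced) {
    if (defined.count(label) == 0) {
      missing.insert(label);
    }
  }
  return std::vector<int>(missing.begin(), missing.end());
}
```
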
### Rewrite DO loops

This phase normalizes the parse tree by removing all unstructured `DO` loops
and replacing them with `DO` constructs.

### Name resolution

The name resolution phase walks the parse tree and constructs the symbol table.

The symbol table consists of a tree of `Scope` objects rooted at the global scope.
The global scope is owned by the `SemanticsContext` object.
It contains a `Scope` for each program unit in the compilation.

Each `Scope` in the scope tree contains child scopes representing other scopes
lexically nested in it.
Each `Scope` also contains a map of `CharBlock` to `Symbol` representing names
declared in that scope. (All names in the symbol table are represented as
`CharBlock` objects, i.e. as substrings of the cooked character stream.)

All `Symbol` objects are owned by the symbol table data structures.
They should be accessed as `Symbol *` or `Symbol &` outside of the symbol
table classes because they can't be created, copied, or moved.
The `Symbol` class has functions and data common across all symbols, and a
`details` field that contains more information specific to that kind of symbol.
Many symbols also have types, represented by `DeclTypeSpec`.
Types are also owned by scopes.

Name resolution happens on the parse tree in this order:
1. Process the specification of a program unit:
   1. Create a new scope for the unit
   2. Create a symbol for each contained subprogram containing just the name
   3. Process the opening statement of the unit (`ModuleStmt`, `FunctionStmt`, etc.)
   4. Process the specification part of the unit
2. Apply the same process recursively to nested subprograms
3. Process the execution part of the program unit
4. Process the execution parts of nested subprograms recursively

After the completion of this phase, every `Name` corresponds to a `Symbol`
unless an error occurred.

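
The ownership rules for the symbol table described in this section can be sketched in a few lines of C++. This is a simplified model, not the compiler's actual classes: `std::string_view` stands in for `CharBlock`, the two detail structs are invented, and the `Find` lookup that walks outward through enclosing scopes is an assumption added for illustration.

```cpp
#include <map>
#include <memory>
#include <string_view>
#include <utility>
#include <variant>
#include <vector>

// Assumption: std::string_view stands in for CharBlock, since both are
// non-owning views into a character buffer.
using Name = std::string_view;

// Two invented detail kinds, for illustration only.
struct ObjectDetails { int rank{0}; };
struct SubprogramDetails { bool isFunction{false}; };

class Symbol {
public:
  using Details = std::variant<ObjectDetails, SubprogramDetails>;
  Symbol(Name name, Details details)
      : name_{name}, details_{std::move(details)} {}
  // Symbols are owned by a scope and cannot be copied or moved.
  Symbol(const Symbol &) = delete;
  Symbol &operator=(const Symbol &) = delete;
  Name name() const { return name_; }
  const Details &details() const { return details_; }

private:
  Name name_;
  Details details_;
};

class Scope {
public:
  explicit Scope(Scope *parent = nullptr) : parent_{parent} {}

  // Creates a lexically nested scope owned by this one.
  Scope &MakeChild() {
    children_.push_back(std::make_unique<Scope>(this));
    return *children_.back();
  }

  // Declares a name here; returns the existing symbol if already declared.
  Symbol &Declare(Name name, Symbol::Details details) {
    auto it = symbols_.find(name);
    if (it == symbols_.end()) {
      it = symbols_
               .emplace(name,
                        std::make_unique<Symbol>(name, std::move(details)))
               .first;
    }
    return *it->second;
  }

  // Looks in this scope, then in enclosing scopes (a simplification).
  Symbol *Find(Name name) {
    for (Scope *scope{this}; scope; scope = scope->parent_) {
      auto it = scope->symbols_.find(name);
      if (it != scope->symbols_.end()) {
        return it->second.get();
      }
    }
    return nullptr;
  }

private:
  Scope *parent_;
  std::vector<std::unique_ptr<Scope>> children_;
  std::map<Name, std::unique_ptr<Symbol>> symbols_;
};
```

Callers only ever hold a `Symbol &` or `Symbol *` returned by `Declare` or `Find`, which mirrors the access rule stated above.
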
### Rewrite parse tree

The parser cannot build a completely correct parse tree without symbol information.
This phase corrects mis-parses based on symbols:
- Array element assignments may be parsed as statement functions: `a(i) = ...`
- Namelist group names without `NML=` may be parsed as format expressions
- A file unit number expression may be parsed as a character variable

This phase also produces an internal error if it finds a `Name` that does not
have its `symbol` data member filled in. This error is suppressed if other
errors have occurred because in that case a `Name` corresponding to an erroneous
symbol may not be resolved.

### Expression analysis

Expressions that occur in the specification part are analyzed during name
resolution, for example, initial values, array bounds, and type parameters.
Any remaining expressions are analyzed in this phase.

For each `Variable` and top-level `Expr` (i.e. one that is not nested below
another `Expr` in the parse tree) the analyzed form of the expression is saved
in the `typedExpr` data member. After this phase has completed, the analyzed
expression can be accessed using `semantics::GetExpr()`.

This phase also corrects mis-parses based on the result of expression analysis:
- An expression like `a(b)` is parsed as a function reference but may need
  to be rewritten to an array element reference (if `a` is an object entity)
  or to a structure constructor (if `a` is a derived type)
- An expression like `a(b:c)` is parsed as an array section but may need to be
  rewritten as a substring if `a` is an object with type CHARACTER

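
The two rewrite rules in the list above amount to a decision based on what `a` resolved to. A minimal sketch, assuming invented enumerations (`SymbolKind`, `RefKind`, `SectionKind`) and helper functions that are not the compiler's own:

```cpp
// Hypothetical classification of what the name `a` resolved to.
enum class SymbolKind { Procedure, Object, DerivedType };

// Possible final forms of an expression that looks like `a(b)`.
enum class RefKind { FunctionReference, ArrayElement, StructureConstructor };

// Possible final forms of an expression that looks like `a(b:c)`.
enum class SectionKind { ArraySection, Substring };

// `a(b)` is parsed as a function reference and is rewritten only when
// `a` turns out to be an object entity or a derived type.
constexpr RefKind ClassifyCallLike(SymbolKind kind) {
  switch (kind) {
  case SymbolKind::Object:
    return RefKind::ArrayElement;
  case SymbolKind::DerivedType:
    return RefKind::StructureConstructor;
  case SymbolKind::Procedure:
    break;
  }
  return RefKind::FunctionReference;
}

// `a(b:c)` is parsed as an array section and is rewritten to a substring
// when `a` is an object of type CHARACTER, following the rule as stated
// in the text.
constexpr SectionKind ClassifySectionLike(bool isCharacterObject) {
  return isCharacterObject ? SectionKind::Substring
                           : SectionKind::ArraySection;
}
```
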
### Statement semantics

Multiple independent checkers driven by the `SemanticsVisitor` framework
perform the remaining semantic checks.
By this phase, all names and expressions that can be successfully resolved
have been, but there may still be names without symbols or expressions without
an analyzed form if errors occurred earlier.

### Write module files

Separate compilation information is written out on successful compilation
of modules and submodules. This information is used as input to name resolution
in program units that `USE` the modules.

Module files are stripped-down Fortran source for the module.
Parts that aren't needed to compile dependent program units (e.g. action statements)
are omitted.

The module file for module `m` is named `m.mod` and the module file for
submodule `s` of module `m` is named `m-s.mod`.
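
The naming rule can be written as a small function. `ModFileName` is a hypothetical helper used only to illustrate the rule; it does no case normalization or directory handling.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: builds the module file name for a module, or for
// a submodule of that module when `submodule` is provided.
std::string ModFileName(
    const std::string &module,
    const std::optional<std::string> &submodule = std::nullopt) {
  std::string result{module};
  if (submodule) {
    result += '-';
    result += *submodule;
  }
  return result + ".mod";
}
```

For module `m` this yields `m.mod`, and for submodule `s` of `m` it yields `m-s.mod`.
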