[flang] Add the proposal document and rationale for the internal naming module that was previously added.

Summary:
This document describes how internal names are uniqued. Name uniquing
supports the constraints and invariants of the FIR dialect of MLIR.

Reviewers: jeanPerier, mehdi_amini, DavidTruby, jdoerfert, sscalpone, kiranchandramohan

Reviewed By: jeanPerier, sscalpone, kiranchandramohan

Subscribers: tskeith, kiranchandramohan, rriddle, llvm-commits

Tags: #llvm, #flang

Differential Revision: https://reviews.llvm.org/D79089
Eric Schweitz 2020-04-29 07:08:17 -07:00
parent 5d46e4b0da
commit 7875362986
1 changed file with 118 additions and 0 deletions

@ -0,0 +1,118 @@
## Bijective Internal Name Uniquing
FIR has a flat namespace: no two module-level objects (functions, globals,
etc.) may have the same name. This necessitates an encoding scheme that
maps symbols from the front-end to unique names in FIR.

Another requirement is that these unique names be reversible, so that the
associated symbol in the symbol table can be recovered.

Fortran is case insensitive, which allows the compiler to convert the
user's identifiers to all lower case. Such a universal conversion implies
that all upper case letters are available for use in uniquing.
### Prefix `_Q`
All uniqued names have the prefix sequence `_Q` to indicate the name has
been uniqued. (Q is chosen because it is a
[low frequency letter](http://pi.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html)
in English.)
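
As a minimal sketch of why the sigil cannot collide with user names, assume identifiers are folded to lower case before uniquing. The helper names `normalize` and `is_uniqued` are hypothetical, chosen for illustration only; they are not flang's API.

```python
PREFIX = "_Q"  # sigil described in the text above


def normalize(identifier: str) -> str:
    """Fold a user identifier to lower case (Fortran is case insensitive)."""
    return identifier.lower()


def is_uniqued(name: str) -> bool:
    """Report whether a name starts with the `_Q` sigil."""
    return name.startswith(PREFIX)


print(normalize("MyVar"))              # myvar
print(is_uniqued("_QPsub"))            # True
print(is_uniqued(normalize("Quit")))   # False
```

A normalized user identifier contains no upper case letters, so it can never begin with `_Q`.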
### Scope Building
Symbols can be scoped by the module, submodule, or procedure that contains
that symbol. After the `_Q` sigil, names are constructed from outermost to
innermost scope as follows:

* Module name prefixed with `M`
* Submodule name prefixed with `S`
* Procedure name prefixed with `F`

Given:
```
submodule (mod:s1mod) s2mod
...
subroutine sub
...
contains
function fun
```
The uniqued name of `fun` becomes:
```
_QMmodSs1modSs2modFsubPfun
```
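
The construction above can be sketched as a small function that walks the scopes from outermost to innermost. This is an illustrative sketch, not flang's implementation: `SCOPE_TAGS` and `unique_procedure_name` are hypothetical names, and the final `P` tag is the procedure prefix from the Procedures/Subprograms section.

```python
# Scope kinds and their one-letter tags, as listed in the text.
SCOPE_TAGS = {"module": "M", "submodule": "S", "procedure": "F"}


def unique_procedure_name(scopes, name):
    """Build a uniqued procedure name.

    `scopes` is a list of (kind, name) pairs ordered from outermost to
    innermost; `name` is the procedure being named.
    """
    encoded = "_Q"
    for kind, scope_name in scopes:
        encoded += SCOPE_TAGS[kind] + scope_name.lower()
    return encoded + "P" + name.lower()


scopes = [
    ("module", "mod"),
    ("submodule", "s1mod"),
    ("submodule", "s2mod"),
    ("procedure", "sub"),
]
print(unique_procedure_name(scopes, "fun"))  # _QMmodSs1modSs2modFsubPfun
```

With an empty scope list the same function yields a top-level procedure name such as `_QPsub`.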
### Common blocks
* A common block name will be prefixed with `B`
### Module scope global data
* A global data entity is prefixed with `E`
* A global entity that is constant (parameter) will be prefixed with `EC`
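
A brief sketch of the `B`, `E`, and `EC` prefixes follows. The function names are hypothetical, and combining `E`/`EC` with an enclosing `M` module scope is an assumption drawn from the Scope Building rules; the text itself gives no worked example for these prefixes.

```python
def unique_common_block(name):
    """Common block names take the `B` prefix."""
    return "_QB" + name.lower()


def unique_global(name, module=None, constant=False):
    """Global data takes `E`, or `EC` when it is a constant (parameter).

    Assumption: an enclosing module is encoded first with `M`.
    """
    scope = ("M" + module.lower()) if module else ""
    tag = "EC" if constant else "E"
    return "_Q" + scope + tag + name.lower()


print(unique_common_block("blk"))                      # _QBblk
print(unique_global("x", module="mod"))                # _QMmodEx
print(unique_global("pi", module="mod", constant=True))  # _QMmodECpi
```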
### Procedures/Subprograms
* A procedure/subprogram is prefixed with `P`

Given:
```
subroutine sub
```
The uniqued name of `sub` becomes:
```
_QPsub
```
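
Because user identifiers are lower case and tags are upper case, a uniqued name can be split back into its components. The sketch below illustrates that reversal under simplifying assumptions: `deconstruct` is a hypothetical helper, it only tokenizes (it does not look anything up in a symbol table), and it does not handle `_QQ` compiler-generated names.

```python
import re

# One tag (the two-letter tags EC and KN are tried first) followed by a
# lower case identifier or a number. The payload may be empty.
_TOKEN = re.compile(r"(EC|KN|[A-Z])([a-z0-9_]*)")


def deconstruct(uniqued):
    """Split a uniqued name into (tag, payload) pairs."""
    if not uniqued.startswith("_Q"):
        raise ValueError("not a uniqued name: " + uniqued)
    body = uniqued[2:]
    tokens = []
    pos = 0
    while pos < len(body):
        match = _TOKEN.match(body, pos)
        if match is None:
            raise ValueError("malformed uniqued name: " + uniqued)
        tokens.append((match.group(1), match.group(2)))
        pos = match.end()
    return tokens


print(deconstruct("_QPsub"))  # [('P', 'sub')]
print(deconstruct("_QMmodSs1modSs2modFsubPfun"))
```

The second call returns the module, both submodules, the host procedure, and the procedure itself, in outermost-to-innermost order.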
### Derived types and related
* A derived type is prefixed with `T`
* If a derived type has KIND parameters, they are listed in a consistent
  canonical order where each takes the form `Ki` and where _i_ is the
  compile-time constant value. (All type parameters are integer.) If _i_
  is a negative value, the prefix `KN` will be used and _i_ will reflect
  the magnitude of the value.

Given:
```
module mymodule
type mytype
integer :: member
end type
...
```
The uniqued name of `mytype` becomes:
```
_QMmymoduleTmytype
```
Given:
```
type yourtype(k1,k2)
integer, kind :: k1, k2
real :: mem1
complex :: mem2
end type
```
The uniqued name of `yourtype` where `k1=4` and `k2=-6` (at compile-time):
```
_QTyourtypeK4KN6
```
* A derived type dispatch table is prefixed with `D`. The dispatch table
  for `type t` would be `_QDTt`.
* A type descriptor instance is prefixed with `C`. Intrinsic types can
  be encoded with their names and kinds. The type descriptor for the
  type `yourtype` above would be `_QCTyourtypeK4KN6`. The type
  descriptor for `REAL(4)` would be `_QCrealK4`.
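
The derived type rules in this section can be sketched together as follows. All function names are hypothetical and the code is an illustration of the encoding, not flang's implementation; the caller is assumed to pass KIND values already in canonical order.

```python
def encode_kind(value):
    """Encode one KIND parameter: `Ki`, or `KNi` for a negative value."""
    return "KN" + str(-value) if value < 0 else "K" + str(value)


def type_suffix(name, kinds=(), intrinsic=False):
    """Encode a type; derived types take `T`, intrinsic types take no tag."""
    tag = "" if intrinsic else "T"
    return tag + name.lower() + "".join(encode_kind(k) for k in kinds)


def unique_type_name(name, kinds=(), module=None):
    scope = ("M" + module.lower()) if module else ""
    return "_Q" + scope + type_suffix(name, kinds)


def dispatch_table_name(name, kinds=()):
    return "_QD" + type_suffix(name, kinds)


def type_descriptor_name(name, kinds=(), intrinsic=False):
    return "_QC" + type_suffix(name, kinds, intrinsic)


print(unique_type_name("mytype", module="mymodule"))        # _QMmymoduleTmytype
print(unique_type_name("yourtype", [4, -6]))                # _QTyourtypeK4KN6
print(dispatch_table_name("t"))                             # _QDTt
print(type_descriptor_name("yourtype", [4, -6]))            # _QCTyourtypeK4KN6
print(type_descriptor_name("REAL", [4], intrinsic=True))    # _QCrealK4
```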
### Compiler generated names
Compiler-generated names do not have to be mapped back to Fortran, since
no corresponding symbol exists in the input source. These names will be
prefixed with `_QQ` and followed by a unique compiler-generated
identifier.
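
One possible way to produce such names is a simple counter. The text does not specify the format of the compiler-generated identifier, so the `hint.N` layout below is purely an assumption for illustration, as are the function names.

```python
import itertools

_counter = itertools.count()  # process-wide source of fresh numbers


def fresh_compiler_name(hint="tmp"):
    """Return a new `_QQ` name; the `hint.N` suffix format is hypothetical."""
    return "_QQ" + hint + "." + str(next(_counter))


def is_compiler_generated(name):
    """Names with the `_QQ` prefix have no source symbol to map back to."""
    return name.startswith("_QQ")


first = fresh_compiler_name()
second = fresh_compiler_name()
print(first != second)                  # True
print(is_compiler_generated(first))     # True
print(is_compiler_generated("_QPsub"))  # False
```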