llvm-project/flang/documentation/ModFiles.md

5.9 KiB

Module Files

Module files hold information from a module that is necessary to compile program units that depend on the module.

Name

Module files must be searchable by module name. They are typically named <modulename>.mod. The advantage of using .mod is that it is consistent with other compilers so users will know what they are. Also, makefiles and scripts often use rm *.mod to clean up.

The disadvantage of using the same name as other compilers is that it is not clear which compiler created a .mod file and files from multiple compilers cannot be in the same directory. This could be solved by adding something between the module name and extension, e.g. <modulename>-f18.mod.

Format

Module files will be Fortran source. Declarations of all visible entities will be included, along with private entities that they depend on. Entity declarations that span multiple statements will be collapsed into a single type-declaration-statement. Executable statements will be omitted.

Header

There will be a header containing extra information that cannot be expressed in Fortran. This will take the form of a comment or directive at the beginning of the file.

If it's a comment, the module file reader would have to strip it out and perform ad hoc parsing on it. If it's a directive the compiler could parse it like other directives as part of the grammar. Processing the header before parsing might result in better error messages when the .mod file is invalid.

Regardless of whether the header is a comment or directive we can use the same string to introduce it: !mod$.

Information in the header:

  • Magic string to confirm it is an f18 .mod file
  • Version information: to indicate the version of the file format, in case it changes, and the version of the compiler that wrote the file, for diagnostics.
  • Checksum of the body of the current file
  • Modules we depend on and the checksum of their module file when the current module file is created
  • The source file that produced the .mod file? This could be used in error messages.

Body

The body will consist of minimal Fortran source for the required declarations. The order will match the order they first appeared in the source.

Some normalization will take place:

  • extraneous spaces will be removed
  • implicit types will be made explicit
  • attributes will be written in a consistent order
  • entity declarations will be combined into a single declaration
  • function return types specified in a prefix-spec will be replaced by an entity declaration
  • etc.

Symbols included

All public symbols from the module need to be included.

In addition, some private symbols are needed:

  • private types that appear in the public API
  • private components of non-private derived types
  • private parameters used in non-private declarations (initial values, kind parameters)
  • others?

It might be possible to anonymize private names if users don't want them exposed in the .mod file. (Currently they are readable in PGI .mod files.)

USE association

A module that contains USE statements needs them represented in the .mod file. Each use-associated symbol will be written as a separate use-only statement, possibly with renaming.

Alternatives:

  • Emit a single USE for each module, listing all of the symbols that were use-associated in the only-list.
  • Detect when all of the symbols from a module are imported (either by a use-stmt without an only-list or because all of the public symbols of the module have been listed in only-lists). In that case collapse them into a single use-stmt.
  • Emit the use-stmts that appeared in the original source.

Reading and writing module files

Options

The compiler will have command-line options to specify where to search for module files and where to write them. By default it will be the current directory for both.

For PGI, -I specifies directories to search for include files and module files. -module specifics a directory to write module files in as well as to search for them. gfortran is similar except it uses -J instead of -module.

The search order for module files is:

  1. The -module directory (Note: for gfortran the -J directory is not searched).
  2. The current directory
  3. The -I directories in the order they appear on the command line

Writing module files

When writing a module file, if the existing one matches what would be written, the timestamp is not updated.

Module files will be written after semantics, i.e. after the compiler has determined the module is valid Fortran.
NOTE: PGI does create .mod files sometimes even when the module has a compilation error.

Question: If the compiler can get far enough to determine it is compiling a module but then encounters an error, should it delete the existing .mod file? PGI does not, gfortran does.

Reading module files

When the compiler finds a .mod file it needs to read, it firsts checks the first line and verifies it is a valid module file. It can also verify checksums of modules it depends on and report if they are out of date.

If the header is valid, the module file will be run through the parser and name resolution to recreate the symbols from the module. Once the symbol table is populated the parse tree can be discarded.

When processing .mod files we know they are valid Fortran with these properties:

  1. The input (without the header) is already in the "cooked input" format.
  2. No preprocessing is necessary.
  3. No errors can occur.

Error messages referring to modules

With this design, diagnostics can refer to names in modules and can emit a normalized declaration of an entity but not point to its location in the source.

If the header includes the source file it came from, that could be included in a diagnostic but we still wouldn't have line numbers.

To provide line numbers and character positions or source lines as the user wrote them we would have to save some amount of provenance information in the module file as well.