forked from OSchip/llvm-project
346 lines
16 KiB
Markdown
346 lines
16 KiB
Markdown
<!--===- docs/IORuntimeInternals.md
|
|
|
|
Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
|
See https://llvm.org/LICENSE.txt for license information.
|
|
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
|
|
|
-->
|
|
|
|
# Fortran I/O Runtime Library Internal Design
|
|
|
|
```eval_rst
|
|
.. contents::
|
|
:local:
|
|
```
|
|
|
|
This note is meant to be an overview of the design of the *implementation*
|
|
of the f18 Fortran compiler's runtime support library for I/O statements.
|
|
|
|
The *interface* to the I/O runtime support library is defined in the
|
|
C++ header file `runtime/io-api.h`.
|
|
This interface was designed to minimize the amount of complexity exposed
|
|
to its clients, which are of course the sequences of calls generated by
|
|
the compiler to implement each I/O statement.
|
|
By keeping this interface as simple as possible, we hope that we have
|
|
lowered the risk of future incompatible changes that would necessitate
|
|
recompilation of Fortran codes in order to link with later versions of
|
|
the runtime library.
|
|
As one will see in `io-api.h`, the interface is also directly callable
|
|
from C and C++ programs.
|
|
|
|
The I/O facilities of the Fortran 2018 language are specified in the
|
|
language standard in its clauses 12 (I/O statements) and 13 (`FORMAT`).
|
|
It's a complicated collection of language features:
|
|
* Files can comprise *records* or *streams*.
|
|
* Records can be fixed-length or variable-length.
|
|
* Record files can be accessed sequentially or directly (random access).
|
|
* Files can be *formatted*, or *unformatted* raw bits.
|
|
* `CHARACTER` scalars and arrays can be used as if they were
|
|
fixed-length formatted sequential record files.
|
|
* Formatted I/O can be under control of a `FORMAT` statement
|
|
or `FMT=` specifier, *list-directed* with default formatting chosen
|
|
by the runtime, or `NAMELIST`, in which a collection of variables
|
|
can be given a name and passed as a group to the runtime library.
|
|
* Sequential records of a file can be partially processed by one
|
|
or more *non-advancing* I/O statements and eventually completed by
|
|
another.
|
|
* `FORMAT` strings can manipulate the position in the current
|
|
record arbitrarily, causing re-reading or overwriting.
|
|
* Floating-point output formatting supports more rounding modes
|
|
than the IEEE standard for floating-point arithmetic.
|
|
|
|
The Fortran I/O runtime support library is written in C++17, and
|
|
uses some C++17 standard library facilities, but it is intended
|
|
to not have any link-time dependences on the C++ runtime support
|
|
library or any LLVM libraries.
|
|
This is important because there are at least two C++ runtime support
|
|
libraries, and we don't want Fortran application builders to have to
|
|
build multiple versions of their codes; neither do we want to require
|
|
them to ship LLVM libraries along with their products.
|
|
|
|
Consequently, dynamic memory allocation in the Fortran runtime
|
|
uses only C's `malloc()` and `free()` functions, and the few
|
|
C++ standard class templates that we instantiate in the library have been
|
|
modified with optional template arguments that override their
|
|
allocators and deallocators.
|
|
|
|
Conversions between the many binary floating-point formats supported
|
|
by f18 and their decimal representations are performed with the same
|
|
template library of fast conversion algorithms used to interpret
|
|
floating-point values in Fortran source programs and to emit them
|
|
to module files.
|
|
|
|
## Overview of Classes
|
|
|
|
A suite of C++ classes and class templates are composed to construct
|
|
the Fortran I/O runtime support library.
|
|
They (mostly) reside in the C++ namespace `Fortran::runtime::io`.
|
|
They are summarized here in a bottom-up order of dependence.
|
|
|
|
The header and C++ implementation source file names of these
|
|
classes are in the process of being vigorously rearranged and
|
|
modified; use `grep` or an IDE to discover these classes in
|
|
the source for now. (Sorry!)
|
|
|
|
### `Terminator`
|
|
|
|
A general facility for the entire library, `Terminator` latches a
|
|
source program statement location in terms of an unowned pointer to
|
|
its source file path name and line number and uses them to construct
|
|
a fatal error message if needed.
|
|
It is used for both user program errors and internal runtime library crashes.
|
|
|
|
### `IoErrorHandler`
|
|
|
|
When I/O error conditions arise at runtime that the Fortran program
|
|
might have the privilege to handle itself via `ERR=`, `END=`, or
|
|
`EOR=` labels and/or by an `IOSTAT=` variable, this subclass of
|
|
`Terminator` is used to either latch the error indication or to crash.
|
|
It sorts out priorities in the case of multiple errors and determines
|
|
the final `IOSTAT=` value at the end of an I/O statement.
|
|
|
|
### `MutableModes`
|
|
|
|
Fortran's formatted I/O statements are affected by a suite of
|
|
modes that can be configured by `OPEN` statements, overridden by
|
|
data transfer I/O statement control lists, and further overridden
|
|
between data items with control edit descriptors in a `FORMAT` string.
|
|
These modes are represented with a `MutableModes` instance, and these
|
|
are instantiated and copied where one would expect them to be in
|
|
order to properly isolate their modifications.
|
|
The modes in force at the time each data item is processed constitute
|
|
a member of each `DataEdit`.
|
|
|
|
### `DataEdit`
|
|
|
|
Represents a single data edit descriptor from a `FORMAT` statement
|
|
or `FMT=` character value, with some hidden extensions to also
|
|
support formatting of list-directed transfers.
|
|
It holds an instance of `MutableModes`, and also has a repetition
|
|
count for when an array appears as a data item in the *io-list*.
|
|
For simplicity and efficiency, each data edit descriptor is
|
|
encoded in the `DataEdit` as a simple capitalized character
|
|
(or two) and some optional field widths.
|
|
|
|
### `FormatControl<>`
|
|
|
|
This class template traverses a `FORMAT` statement's contents (or `FMT=`
|
|
character value) to extract data edit descriptors like `E20.14` to
|
|
serve each item in an I/O data transfer statement's *io-list*,
|
|
making callbacks to an instance of its class template argument
|
|
along the way to effect character literal output and record
|
|
positioning.
|
|
The Fortran language standard defines formatted I/O as if the `FORMAT`
|
|
string were driving the traversal of the data items in the *io-list*,
|
|
but our implementation reverses that perspective to allow a more
|
|
convenient (for the compiler) I/O runtime support library API design
|
|
in which each data item is presented to the library with a distinct
|
|
type-dependent call.
|
|
|
|
Clients of `FormatControl` instantiations call its `GetNextDataEdit()`
|
|
member function to acquire the next data edit descriptor to be processed
|
|
from the format, and `FinishOutput()` to flush out any remaining
|
|
output strings or record positionings at the end of the *io-list*.
|
|
|
|
The `DefaultFormatControlCallbacks` structure summarizes the API
|
|
expected by `FormatControl` from its class template actual arguments.
|
|
|
|
### `OpenFile`
|
|
|
|
This class encapsulates all (I hope) the operating system interfaces
|
|
used to interact with the host's filesystems for operations on
|
|
external units.
|
|
Asynchronous I/O interfaces are faked for now with synchronous
|
|
operations and deferred results.
|
|
|
|
### `ConnectionState`
|
|
|
|
An active connection to an external or internal unit maintains
|
|
the common parts of its state in this subclass of `ConnectionAttributes`.
|
|
The base class holds state that should not change during the
|
|
lifetime of the connection, while the subclass maintains state
|
|
that may change during I/O statement execution.
|
|
|
|
### `InternalDescriptorUnit`
|
|
|
|
When I/O is being performed from/to a Fortran `CHARACTER` array
|
|
rather than an external file, this class manages the standard
|
|
interoperable descriptor used to access its elements as records.
|
|
It has the necessary interfaces to serve as an actual argument
|
|
to the `FormatControl` class template.
|
|
|
|
### `FileFrame<>`
|
|
|
|
This CRTP class template isolates all of the complexity involved between
|
|
an external unit's `OpenFile` and the buffering requirements
|
|
imposed by the capabilities of Fortran `FORMAT` control edit
|
|
descriptors that allow repositioning within the current record.
|
|
Its interface enables its clients to define a "frame" (my term,
|
|
not Fortran's) that is a contiguous range of bytes that are
|
|
or may soon be in the file.
|
|
This frame is defined as a file offset and a byte size.
|
|
The `FileFrame` instance manages an internal circular buffer
|
|
with two essential guarantees:
|
|
|
|
1. The most recently requested frame is present in the buffer
|
|
and contiguous in memory.
|
|
1. Any extra data after the frame that may have been read from
|
|
the external unit will be preserved, so that it's safe to
|
|
read from a socket, pipe, or tape and not have to worry about
|
|
repositioning and rereading.
|
|
|
|
In end-of-file situations, it's possible that a request to read
|
|
a frame may come up short.
|
|
|
|
As a CRTP class template, `FileFrame` accesses the raw filesystem
|
|
facilities it needs from `*this`.
|
|
|
|
### `ExternalFileUnit`
|
|
|
|
This class mixes in `ConnectionState`, `OpenFile`, and
|
|
`FileFrame<ExternalFileUnit>` to represent the state of an open
|
|
(or soon to be opened) external file descriptor as a Fortran
|
|
I/O unit.
|
|
It has the contextual APIs required to serve as a template actual
|
|
argument to `FormatControl`.
|
|
And it contains a `std::variant<>` suitable for holding the
|
|
state of the active I/O statement in progress on the unit
|
|
(see below).
|
|
|
|
`ExternalFileUnit` instances reside in a `Map` that is allocated
|
|
as a static variable and indexed by Fortran unit number.
|
|
Static member functions `LookUp()`, `LookUpOrCrash()`, and `LookUpOrCreate()`
|
|
probe the map to convert Fortran `UNIT=` numbers from I/O statements
|
|
into references to active units.
|
|
|
|
### `IoStatementBase`
|
|
|
|
The subclasses of `IoStatementBase` each encapsulate and maintain
|
|
the state of one active Fortran I/O statement across the several
|
|
I/O runtime library API function calls it may comprise.
|
|
The subclasses handle the distinctions between internal vs. external I/O,
|
|
formatted vs. list-directed vs. unformatted I/O, input vs. output,
|
|
and so on.
|
|
|
|
`IoStatementBase` inherits default `FORMAT` processing callbacks and
|
|
an `IoErrorHandler`.
|
|
Each of the `IoStatementBase` classes that pertain to formatted I/O
|
|
support the contextual callback interfaces needed by `FormatControl`,
|
|
overriding the default callbacks of the base class, which crash if
|
|
called inappropriately (e.g., if a `CLOSE` statement somehow
|
|
passes a data item from an *io-list*).
|
|
|
|
The lifetimes of these subclasses' instances each begin with a user
|
|
program call to an I/O API routine with a name like `BeginExternalListOutput()`
|
|
and persist until `EndIoStatement()` is called.
|
|
|
|
To reduce dynamic memory allocation, *external* I/O statements allocate
|
|
their per-statement state class instances in space reserved in the
|
|
`ExternalFileUnit` instance.
|
|
Internal I/O statements currently use dynamic allocation, but
|
|
the I/O API supports a means whereby the code generated for the Fortran
|
|
program may supply stack space to the I/O runtime support library
|
|
for this purpose.
|
|
|
|
### `IoStatementState`
|
|
|
|
F18's Fortran I/O runtime support library defines and implements an API
|
|
that uses a sequence of function calls to implement each Fortran I/O
|
|
statement.
|
|
The state of each I/O statement in progress is maintained in some
|
|
subclass of `IoStatementBase`, as noted above.
|
|
The purpose of `IoStatementState` is to provide generic access
|
|
to the specific state classes without recourse to C++ `virtual`
|
|
functions or function pointers, language features that may not be
|
|
available to us in some important execution environments.
|
|
`IoStatementState` comprises a `std::variant<>` of wrapped references
|
|
to the various possibilities, and uses `std::visit()` to
|
|
access them as needed by the I/O API calls that process each specifier
|
|
in the I/O *control-list* and each item in the *io-list*.
|
|
|
|
Pointers to `IoStatementState` instances are the `Cookie` type returned
|
|
in the I/O API for `Begin...` I/O statement calls, passed back for
|
|
the *control-list* specifiers and *io-list* data items, and consumed
|
|
by the `EndIoStatement()` call at the end of the statement.
|
|
|
|
Storage for `IoStatementState` is reserved in `ExternalFileUnit` for
|
|
external I/O units, and in the various final subclasses for internal
|
|
I/O statement states otherwise.
|
|
|
|
Since Fortran permits a `CLOSE` statement to reference a nonexistent
|
|
unit, the library has to treat that (expected to be rare) situation
|
|
as a weird variation of internal I/O since there's no `ExternalFileUnit`
|
|
available to hold its `IoStatementBase` subclass or `IoStatementState`.
|
|
|
|
## A Narrative Overview Of `PRINT *, 'HELLO, WORLD'`
|
|
|
|
1. When the compiled Fortran program begins execution at the `main()`
|
|
entry point exported from its main program, it calls `ProgramStart()`
|
|
with its arguments and environment.
|
|
1. The generated code calls `BeginExternalListOutput()` to
|
|
start the sequence of calls that implement the `PRINT` statement.
|
|
Since the Fortran runtime I/O library has not yet been used in
|
|
this process, its data structures are initialized on this
|
|
first call, and Fortran I/O units 5 and 6 are connected with
|
|
the stadard input and output file descriptors (respectively).
|
|
The default unit code is converted to 6 and passed to
|
|
`ExternalFileUnit::LookUpOrCrash()`, which returns a reference to
|
|
unit 6's instance.
|
|
1. We check that the unit was opened for formatted I/O.
|
|
1. `ExternalFileUnit::BeginIoStatement<>()` is called to initialize
|
|
an instance of `ExternalListIoStatementState<false>` in the unit,
|
|
point to it with an `IoStatementState`, and return a reference to
|
|
that object whose address will be the `Cookie` for this statement.
|
|
1. The generated code calls `OutputAscii()` with that cookie and the
|
|
address and length of the string.
|
|
1. `OutputAscii()` confirms that the cookie corresponds to an output
|
|
statement and determines that it's list-directed.
|
|
1. `ListDirectedStatementState<false>::EmitLeadingSpaceOrAdvance()`
|
|
emits the required initial space on the new current output record
|
|
by calling `IoStatementState::GetConnectionState()` to locate
|
|
the connection state, determining from the record position state
|
|
that the space is necessary, and calling `IoStatementState::Emit()`
|
|
to cough it out. That call is redirected to `ExternalFileUnit::Emit()`,
|
|
which calls `FileFrame<ExternalFileUnit>::WriteFrame()` to extend
|
|
the frame of the current record and then `memcpy()` to fill its
|
|
first byte with the space.
|
|
1. Back in `OutputAscii()`, the mutable modes and connection state
|
|
of the `IoStatementState` are queried to see whether we're in an
|
|
`WRITE(UNIT=,FMT=,DELIM=)` statement with a delimited specifier.
|
|
If we were, the library would emit the appropriate quote marks,
|
|
double up any instances of that character in the text, and split the
|
|
text over multiple records if it's long.
|
|
1. But we don't have a delimiter, so `OutputAscii()` just carves
|
|
up the text into record-sized chunks and emits them. There's just
|
|
one chunk for our short `CHARACTER` string value in this example.
|
|
It's passed to `IoStatementState::Emit()`, which (as above) is
|
|
redirected to `ExternalFileUnit::Emit()`, which interacts with the
|
|
frame to extend the frame and `memcpy` data into the buffer.
|
|
1. A flag is set in `ListDirectedStatementState<false>` to remember
|
|
that the last item emitted in this list-directed output statement
|
|
was an undelimited `CHARACTER` value, so that if the next item is
|
|
also an undelimited `CHARACTER`, no interposing space will be emitted
|
|
between them.
|
|
1. `OutputAscii()` return `true` to its caller.
|
|
1. The generated code calls `EndIoStatement()`, which is redirected to
|
|
`ExternalIoStatementState<false>`'s override of that function.
|
|
As this is not a non-advancing I/O statement, `ExternalFileUnit::AdvanceRecord()`
|
|
is called to end the record. Since this is a sequential formatted
|
|
file, a newline is emitted.
|
|
1. If unit 6 is connected to a terminal, the buffer is flushed.
|
|
`FileFrame<ExternalFileUnit>::Flush()` drives `ExternalFileUnit::Write()`
|
|
to push out the data in maximal contiguous chunks, dealing with any
|
|
short writes that might occur, and collecting I/O errors along the way.
|
|
This statement has no `ERR=` label or `IOSTAT=` specifier, so errors
|
|
arriving at `IoErrorHandler::SignalErrno()` will cause an immediate
|
|
crash.
|
|
1. `ExternalIoStatementBase::EndIoStatement()` is called.
|
|
It gets the final `IOSTAT=` value from `IoStatementBase::EndIoStatement()`,
|
|
tells the `ExternalFileUnit` that no I/O statement remains active, and
|
|
returns the I/O status value back to the program.
|
|
1. Eventually, the program calls `ProgramEndStatement()`, which
|
|
calls `ExternalFileUnit::CloseAll()`, which flushes and closes all
|
|
open files. If the standard output were not a terminal, the output
|
|
would be written now with the same sequence of calls as above.
|
|
1. `exit(EXIT_SUCCESS)`.
|