16 KiB
Fortran I/O Runtime Library Internal Design
This note is meant to be an overview of the design of the implementation of the f18 Fortran compiler's runtime support library for I/O statements.
The interface to the I/O runtime support library is defined in the
C++ header file runtime/io-api.h
.
This interface was designed to minimize the amount of complexity exposed
to its clients, which are of course the sequences of calls generated by
the compiler to implement each I/O statement.
By keeping this interface as simple as possible, we hope that we have
lowered the risk of future incompatible changes that would necessitate
recompilation of Fortran codes in order to link with later versions of
the runtime library.
As one will see in io-api.h
, the interface is also directly callable
from C and C++ programs.
The I/O facilities of the Fortran 2018 language are specified in the
language standard in its clauses 12 (I/O statements) and 13 (FORMAT
).
It's a complicated collection of language features:
- Files can comprise records or streams.
- Records can be fixed-length or variable-length.
- Record files can be accessed sequentially or directly (random access).
- Files can be formatted, or unformatted raw bits.
CHARACTER
scalars and arrays can be used as if they were fixed-length formatted sequential record files.- Formatted I/O can be under control of a
FORMAT
statement orFMT=
specifier, list-directed with default formatting chosen by the runtime, orNAMELIST
, in which a collection of variables can be given a name and passed as a group to the runtime library. - Sequential records of a file can be partially processed by one or more non-advancing I/O statements and eventually completed by another.
FORMAT
strings can manipulate the position in the current record arbitrarily, causing re-reading or overwriting.- Floating-point output formatting supports more rounding modes than the IEEE standard for floating-point arithmetic.
The Fortran I/O runtime support library is written in C++17, and uses some C++17 standard library facilities, but it is intended to not have any link-time dependences on the C++ runtime support library or any LLVM libraries. This is important because there are at least two C++ runtime support libraries, and we don't want Fortran application builders to have to build multiple versions of their codes; neither do we want to require them to ship LLVM libraries along with their products.
Consequently, dynamic memory allocation in the Fortran runtime
uses only C's malloc()
and free()
functions, and the few
C++ standard class templates that we instantiate in the library have been
modified with optional template arguments that override their
allocators and deallocators.
Conversions between the many binary floating-point formats supported by f18 and their decimal representations are performed with the same template library of fast conversion algorithms used to interpret floating-point values in Fortran source programs and to emit them to module files.
Overview of Classes
A suite of C++ classes and class templates are composed to construct
the Fortran I/O runtime support library.
They (mostly) reside in the C++ namespace Fortran::runtime::io
.
They are summarized here in a bottom-up order of dependence.
The header and C++ implementation source file names of these
classes are in the process of being vigorously rearranged and
modified; use grep
or an IDE to discover these classes in
the source for now. (Sorry!)
Terminator
A general facility for the entire library, Terminator
latches a
source program statement location in terms of an unowned pointer to
its source file path name and line number and uses them to construct
a fatal error message if needed.
It is used for both user program errors and internal runtime library crashes.
IoErrorHandler
When I/O error conditions arise at runtime that the Fortran program
might have the privilege to handle itself via ERR=
, END=
, or
EOR=
labels and/or by an IOSTAT=
variable, this subclass of
Terminator
is used to either latch the error indication or to crash.
It sorts out priorities in the case of multiple errors and determines
the final IOSTAT=
value at the end of an I/O statement.
MutableModes
Fortran's formatted I/O statements are affected by a suite of
modes that can be configured by OPEN
statements, overridden by
data transfer I/O statement control lists, and further overridden
between data items with control edit descriptors in a FORMAT
string.
These modes are represented with a MutableModes
instance, and these
are instantiated and copied where one would expect them to be in
order to properly isolate their modifications.
The modes in force at the time each data item is processed constitute
a member of each DataEdit
.
DataEdit
Represents a single data edit descriptor from a FORMAT
statement
or FMT=
character value, with some hidden extensions to also
support formatting of list-directed transfers.
It holds an instance of MutableModes
, and also has a repetition
count for when an array appears as a data item in the io-list.
For simplicity and efficiency, each data edit descriptor is
encoded in the DataEdit
as a simple capitalized character
(or two) and some optional field widths.
FormatControl<>
This class template traverses a FORMAT
statement's contents (or FMT=
character value) to extract data edit descriptors like E20.14
to
serve each item in an I/O data transfer statement's io-list,
making callbacks to an instance of its class template argument
along the way to effect character literal output and record
positioning.
The Fortran language standard defines formatted I/O as if the FORMAT
string were driving the traversal of the data items in the io-list,
but our implementation reverses that perspective to allow a more
convenient (for the compiler) I/O runtime support library API design
in which each data item is presented to the library with a distinct
type-dependent call.
Clients of FormatControl
instantiations call its GetNextDataEdit()
member function to acquire the next data edit descriptor to be processed
from the format, and FinishOutput()
to flush out any remaining
output strings or record positionings at the end of the io-list.
The DefaultFormatControlCallbacks
structure summarizes the API
expected by FormatControl
from its class template actual arguments.
OpenFile
This class encapsulates all (I hope) the operating system interfaces used to interact with the host's filesystems for operations on external units. Asynchronous I/O interfaces are faked for now with synchronous operations and deferred results.
ConnectionState
An active connection to an external or internal unit maintains
the common parts of its state in this subclass of ConnectionAttributes
.
The base class holds state that should not change during the
lifetime of the connection, while the subclass maintains state
that may change during I/O statement execution.
InternalDescriptorUnit
When I/O is being performed from/to a Fortran CHARACTER
array
rather than an external file, this class manages the standard
interoperable descriptor used to access its elements as records.
It has the necessary interfaces to serve as an actual argument
to the FormatControl
class template.
FileFrame<>
This CRTP class template isolates all of the complexity involved between
an external unit's OpenFile
and the buffering requirements
imposed by the capabilities of Fortran FORMAT
control edit
descriptors that allow repositioning within the current record.
Its interface enables its clients to define a "frame" (my term,
not Fortran's) that is a contiguous range of bytes that are
or may soon be in the file.
This frame is defined as a file offset and a byte size.
The FileFrame
instance manages an internal circular buffer
with two essential guarantees:
- The most recently requested frame is present in the buffer and contiguous in memory.
- Any extra data after the frame that may have been read from the external unit will be preserved, so that it's safe to read from a socket, pipe, or tape and not have to worry about repositioning and rereading.
In end-of-file situations, it's possible that a request to read a frame may come up short.
As a CRTP class template, FileFrame
accesses the raw filesystem
facilities it needs from *this
.
ExternalFileUnit
This class mixes in ConnectionState
, OpenFile
, and
FileFrame<ExternalFileUnit>
to represent the state of an open
(or soon to be opened) external file descriptor as a Fortran
I/O unit.
It has the contextual APIs required to serve as a template actual
argument to FormatControl
.
And it contains a std::variant<>
suitable for holding the
state of the active I/O statement in progress on the unit
(see below).
ExternalFileUnit
instances reside in a Map
that is allocated
as a static variable and indexed by Fortran unit number.
Static member functions LookUp()
, LookUpOrCrash()
, and LookUpOrCreate()
probe the map to convert Fortran UNIT=
numbers from I/O statements
into references to active units.
IoStatementBase
The subclasses of IoStatementBase
each encapsulate and maintain
the state of one active Fortran I/O statement across the several
I/O runtime library API function calls it may comprise.
The subclasses handle the distinctions between internal vs. external I/O,
formatted vs. list-directed vs. unformatted I/O, input vs. output,
and so on.
IoStatementBase
inherits default FORMAT
processing callbacks and
an IoErrorHandler
.
Each of the IoStatementBase
classes that pertain to formatted I/O
support the contextual callback interfaces needed by FormatControl
,
overriding the default callbacks of the base class, which crash if
called inappropriately (e.g., if a CLOSE
statement somehow
passes a data item from an io-list).
The lifetimes of these subclasses' instances each begin with a user
program call to an I/O API routine with a name like BeginExternalListOutput()
and persist until EndIoStatement()
is called.
To reduce dynamic memory allocation, external I/O statements allocate
their per-statement state class instances in space reserved in the
ExternalFileUnit
instance.
Internal I/O statements currently use dynamic allocation, but
the I/O API supports a means whereby the code generated for the Fortran
program may supply stack space to the I/O runtime support library
for this purpose.
IoStatementState
F18's Fortran I/O runtime support library defines and implements an API
that uses a sequence of function calls to implement each Fortran I/O
statement.
The state of each I/O statement in progress is maintained in some
subclass of IoStatementBase
, as noted above.
The purpose of IoStatementState
is to provide generic access
to the specific state classes without recourse to C++ virtual
functions or function pointers, language features that may not be
available to us in some important execution environments.
IoStatementState
comprises a std::variant<>
of wrapped references
to the various possibilities, and uses std::visit()
to
access them as needed by the I/O API calls that process each specifier
in the I/O control-list and each item in the io-list.
Pointers to IoStatementState
instances are the Cookie
type returned
in the I/O API for Begin...
I/O statement calls, passed back for
the control-list specifiers and io-list data items, and consumed
by the EndIoStatement()
call at the end of the statement.
Storage for IoStatementState
is reserved in ExternalFileUnit
for
external I/O units, and in the various final subclasses for internal
I/O statement states otherwise.
Since Fortran permits a CLOSE
statement to reference a nonexistent
unit, the library has to treat that (expected to be rare) situation
as a weird variation of internal I/O since there's no ExternalFileUnit
available to hold its IoStatementBase
subclass or IoStatementState
.
A Narrative Overview Of PRINT *, 'HELLO, WORLD'
- When the compiled Fortran program begins execution at the
main()
entry point exported from its main program, it callsProgramStart()
with its arguments and environment. - The generated code calls
BeginExternalListOutput()
to start the sequence of calls that implement thePRINT
statement. Since the Fortran runtime I/O library has not yet been used in this process, its data structures are initialized on this first call, and Fortran I/O units 5 and 6 are connected with the stadard input and output file descriptors (respectively). The default unit code is converted to 6 and passed toExternalFileUnit::LookUpOrCrash()
, which returns a reference to unit 6's instance. - We check that the unit was opened for formatted I/O.
ExternalFileUnit::BeginIoStatement<>()
is called to initialize an instance ofExternalListIoStatementState<false>
in the unit, point to it with anIoStatementState
, and return a reference to that object whose address will be theCookie
for this statement.- The generated code calls
OutputAscii()
with that cookie and the address and length of the string. OutputAscii()
confirms that the cookie corresponds to an output statement and determines that it's list-directed.ListDirectedStatementState<false>::EmitLeadingSpaceOrAdvance()
emits the required initial space on the new current output record by callingIoStatementState::GetConnectionState()
to locate the connection state, determining from the record position state that the space is necessary, and callingIoStatementState::Emit()
to cough it out. That call is redirected toExternalFileUnit::Emit()
, which callsFileFrame<ExternalFileUnit>::WriteFrame()
to extend the frame of the current record and thenmemcpy()
to fill its first byte with the space.- Back in
OutputAscii()
, the mutable modes and connection state of theIoStatementState
are queried to see whether we're in anWRITE(UNIT=,FMT=,DELIM=)
statement with a delimited specifier. If we were, the library would emit the appropriate quote marks, double up any instances of that character in the text, and split the text over multiple records if it's long. - But we don't have a delimiter, so
OutputAscii()
just carves up the text into record-sized chunks and emits them. There's just one chunk for our shortCHARACTER
string value in this example. It's passed toIoStatementState::Emit()
, which (as above) is redirected toExternalFileUnit::Emit()
, which interacts with the frame to extend the frame andmemcpy
data into the buffer. - A flag is set in
ListDirectedStatementState<false>
to remember that the last item emitted in this list-directed output statement was an undelimitedCHARACTER
value, so that if the next item is also an undelimitedCHARACTER
, no interposing space will be emitted between them. OutputAscii()
returntrue
to its caller.- The generated code calls
EndIoStatement()
, which is redirected toExternalIoStatementState<false>
's override of that function. As this is not a non-advancing I/O statement,ExternalFileUnit::AdvanceRecord()
is called to end the record. Since this is a sequential formatted file, a newline is emitted. - If unit 6 is connected to a terminal, the buffer is flushed.
FileFrame<ExternalFileUnit>::Flush()
drivesExternalFileUnit::Write()
to push out the data in maximal contiguous chunks, dealing with any short writes that might occur, and collecting I/O errors along the way. This statement has noERR=
label orIOSTAT=
specifier, so errors arriving atIoErrorHandler::SignalErrno()
will cause an immediate crash. ExternalIoStatementBase::EndIoStatement()
is called. It gets the finalIOSTAT=
value fromIoStatementBase::EndIoStatement()
, tells theExternalFileUnit
that no I/O statement remains active, and returns the I/O status value back to the program.- Eventually, the program calls
ProgramEndStatement()
, which callsExternalFileUnit::CloseAll()
, which flushes and closes all open files. If the standard output were not a terminal, the output would be written now with the same sequence of calls as above. exit(EXIT_SUCCESS)
.