forked from OSchip/llvm-project
150 lines
6.5 KiB
Markdown
150 lines
6.5 KiB
Markdown
<!--===- documentation/C++17.md
|
|
|
|
Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
|
See https://llvm.org/LICENSE.txt for license information.
|
|
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
|
|
|
-->
|
|
|
|
## C++14/17 features used in f18
|
|
|
|
The C++ dialect used in this project constitutes a subset of the
|
|
standard C++ programming language and library features.
|
|
We want our dialect to be compatible with the LLVM C++ language
|
|
subset that will be in use at the time that we integrate with that
|
|
project.
|
|
We also want to maximize portability, future-proofing,
|
|
compile-time error checking, and use of best practices.
|
|
|
|
To that end, we have a C++ style guide (q.v.) that lays
|
|
out the details of how our C++ code should look and gives
|
|
guidance about feature usage.
|
|
|
|
We have chosen to use some features of the recent C++17
|
|
language standard in f18.
|
|
The most important of these are:
|
|
* sum types (discriminated unions) in the form of `std::variant`
|
|
* `using` template parameter packs
|
|
* generic lambdas with `auto` argument types
|
|
* product types in the form of `std::tuple`
|
|
* `std::optional`
|
|
|
|
(`std::tuple` is actually a C++11 feature, but I include it
|
|
in this list because it's not particularly well known.)
|
|
|
|
### Sum types
|
|
|
|
First, some background information to explain the need for sum types
|
|
in f18.
|
|
|
|
Fortran is notoriously problematic to lex and parse, as tokenization
|
|
depends on the state of the partial parse;
|
|
the language has no reserved words in the sense that C++ does.
|
|
Fortran parsers implemented with distinct lexing and parsing phases
|
|
(generated by hand or with tools) need to implement them as
|
|
coroutines with complicated state, and experience has shown that
|
|
it's hard to get them right and harder to extend them as the language
|
|
evolves.
|
|
|
|
Alternatively, with the use of backtracking, one can parse Fortran with
|
|
a unified lexer/parser.
|
|
We have chosen to do so because it is simpler and should reduce
|
|
both initial bugs and long-term maintenance.
|
|
|
|
Specifically, f18's parser uses the technique of recursive descent with
|
|
backtracking.
|
|
It is constructed as the incremental composition of pure parsing functions
|
|
that each, when given a context (location in the input stream plus some state),
|
|
either _succeeds_ or _fails_ to recognize some piece of Fortran.
|
|
On success, they return a new state and some semantic value, and this is
|
|
usually an instance of a C++ `struct` type that encodes the semantic
|
|
content of a production in the Fortran grammar.
|
|
|
|
This technique allows us to specify both the Fortran grammar and the
|
|
representation of successfully parsed programs with C++ code
|
|
whose functions and data structures correspond closely to the productions
|
|
of Fortran.
|
|
|
|
The specification of Fortran uses a form of BNF with alternatives,
|
|
optional elements, sequences, and lists. Each of these constructs
|
|
in the Fortran grammar maps directly in the f18 parser to both
|
|
the means of combining other parsers as alternatives, &c., and to
|
|
the declarations of the parse tree data structures that represent
|
|
the results of successful parses.
|
|
Move semantics are used in the parsing functions to acquire and
|
|
combine the results of sub-parses into the result of a larger
|
|
parse.
|
|
|
|
To represent nodes in the Fortran parse tree, we need a means of
|
|
handling sum types for productions that have multiple alternatives.
|
|
The bounded polymorphism supplied by the C++17 `std::variant` fits
|
|
those needs exactly.
|
|
For example, production R502 in Fortran defines the top-level
|
|
program unit of Fortran as being a function, subroutine, module, &c.
|
|
The `struct ProgramUnit` in the f18 parse tree header file
|
|
represents each program unit with a member that is a `std::variant`
|
|
over the six possibilities.
|
|
Similarly, the parser for that type in the f18 grammar has six alternatives,
|
|
each of which constructs an instance of `ProgramUnit` upon the result of
|
|
parsing a `Module`, `FunctionSubprogram`, and so on.
|
|
|
|
Code that performs semantic analysis on the result of a successful
|
|
parse is typically implemented with overloaded functions.
|
|
A function instantiated on `ProgramUnit` will use `std::visit` to
|
|
identify the right alternative and perform the right actions.
|
|
The call to `std::visit` must pass a visitor that can handle all
|
|
of the possibilities, and f18 will fail to build if one is missing.
|
|
|
|
Were we unable to use `std::variant` directly, we would likely
|
|
have chosen to implement a local `SumType` replacement; in the
|
|
absence of C++17's abilities of `using` a template parameter pack
|
|
and allowing `auto` arguments in anonymous lambda functions,
|
|
it would be less convenient to use.
|
|
|
|
The other options for polymorphism in C++ at the level of C++11
|
|
would be to:
|
|
* loosen up compile-time type safety and use a unified parse tree node
|
|
representation with an enumeration type for an operator and generic
|
|
subtree pointers, or
|
|
* define the sum types for the parse tree as abstract base classes from
|
|
which each particular alternative would derive, and then use virtual
|
|
functions (or the forbidden `dynamic_cast`) to identify alternatives
|
|
during analysis
|
|
|
|
### Product types
|
|
|
|
Many productions in the Fortran grammar describe a sequence of various
|
|
sub-parses.
|
|
For example, R504 defines the things that may appear in the "specification
|
|
part" of a subprogram in the order in which they are allowed: `USE`
|
|
statements, then `IMPORT` statements, and so on.
|
|
|
|
The parse tree node that represents such a thing needs to incorporate
|
|
the representations of those parses, of course.
|
|
It turns out to be convenient to allow these data members to be anonymous
|
|
components of a `std::tuple` product type.
|
|
This type facilitates the automation of code that walks over all of the
|
|
members in a type-safe fashion and avoids the need to invent and remember
|
|
needless member names -- the components of a `std::tuple` instance can
|
|
be identified and accessed in terms of their types, and those tend to be
|
|
distinct.
|
|
|
|
So we use `std::tuple` for such things.
|
|
It has also been handy for template metaprogramming that needs to work
|
|
with lists of types.
|
|
|
|
### `std::optional`
|
|
|
|
This simple little type is used wherever a value might or might not be
|
|
present.
|
|
It is especially useful for function results and
|
|
rvalue reference arguments.
|
|
It corresponds directly to the optional elements in the productions
|
|
of the Fortran grammar.
|
|
It is also used as a wrapper around a parse tree node type to define the
|
|
results of the various parsing functions, where presence of a value
|
|
signifies a successful recognition and absence denotes a failed parse.
|
|
It is used in data structures in place of nullable pointers to
|
|
avoid indirection as well as the possible confusion over whether a pointer
|
|
is allowed to be null.
|