forked from OSchip/llvm-project
[flang] Convert parser combinator documentation file to Markdown.
Original-commit: flang-compiler/f18@263865c97a
parent 94c26b688e
commit 1e69ed0c1b

@@ -20,6 +20,8 @@ in foo.cc.)
 1. In the source file "foo.cc", put the #include of "foo.h" first.
 Then #include other project headers in alphabetic order; then C++ standard
 headers, also alphabetically; then C and system headers.
+1. Don't include the standard iostream header. If you need it for debugging,
+remove the inclusion before committing.
 ### Naming
 1. C++ names that correspond to STL names should look like those STL names
 (e.g., *clear()* and *size()* member functions in a class that implements
@@ -40,7 +42,7 @@ especially when you can declare them directly in a for()/while()/if()
 condition. Otherwise, prefer complete English words to abbreviations
 when creating names.
 ### Commentary
-1. Use // for all comments except for short notes within statements.
+1. Use // for all comments except for short notes within expressions.
 1. When // follows code on a line, precede it with two spaces.
 1. Comments should matter. Assume that the reader knows current C++ at least as
 well as you do and avoid distracting her by calling out usage of new

@@ -0,0 +1,145 @@
## Concept
The Fortran language recognizer here can be classified as an LL recursive
descent parser. It is composed from a *parser combinator* library that
defines a few fundamental parsers and a few ways to compose them into more
powerful parsers.

For our purposes here, a *parser* is any object that can attempt to recognize
an instance of some syntax from an input stream. It may succeed or fail.
On success, it may return some semantic value to its caller.

In C++ terms, a parser is any instance of a class that
1. has a *constexpr* default constructor,
1. defines a resultType type, and
1. provides a member or static function that accepts a pointer to a
ParseState as its argument and returns a std::optional<resultType> as a
result, with the presence or absence of a value in the std::optional<>
signifying success or failure, respectively.

> std::optional<resultType> Parse(ParseState *) const;
>
> static std::optional<resultType> Parse(ParseState *);

The resultType of a parser is typically the class type of some particular
node type in the parse tree.
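
A minimal sketch of a class that meets the three requirements above may help. It is not taken from f18. *ParseState* here is a hypothetical stand-in that holds only the input text and a position, and *DigitParser* and *AnyCharParser* are illustrative names. The sketch assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical, simplified stand-in for ParseState: the input text and a
// current position. (The real class also collects messages and flags.)
struct ParseState {
  std::string_view input;
  std::size_t pos{0};
};

// A parser: constexpr default constructor, a resultType, and a const
// Parse() member that takes ParseState* and returns optional<resultType>.
struct DigitParser {
  using resultType = int;
  constexpr DigitParser() {}
  std::optional<resultType> Parse(ParseState *state) const {
    if (state->pos < state->input.size()) {
      char ch{state->input[state->pos]};
      if (ch >= '0' && ch <= '9') {
        ++state->pos;     // advance only on success
        return ch - '0';  // semantic value: the digit's numeric value
      }
    }
    return std::nullopt;  // failure: an empty optional
  }
};

// The same requirements met with a static Parse() function instead.
struct AnyCharParser {
  using resultType = char;
  constexpr AnyCharParser() {}
  static std::optional<resultType> Parse(ParseState *state) {
    if (state->pos >= state->input.size()) {
      return std::nullopt;  // fails at end of input
    }
    return state->input[state->pos++];
  }
};

// Because the constructors are constexpr, parser objects can be constants.
constexpr DigitParser digit{};
constexpr AnyCharParser anyChar{};
```

Parsing "7x" with *digit* yields 7 and advances one character. A second attempt fails on the "x" and leaves the position unchanged.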

*ParseState* is a class that encapsulates a position in the source stream,
collects messages, and holds a few state flags that determine tokenization
(e.g., are we in a character literal?). Instances of *ParseState* are
independent and complete -- they are cheap to duplicate whenever necessary to
implement backtracking.

The constexpr default constructor of a parser is important. The functions
described below that operate on instances of parsers are themselves all
constexpr. This use of compile-time expressions allows the entirety of a
recursive descent parser for a language to be constructed at compilation time
through the use of templates.
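
The backtracking and constexpr points can be illustrated together. This is a hedged sketch, not f18's code, and the names *Twice*, *Backtrack*, *backtrack* and *twoAs* are hypothetical. A parser that can fail after consuming input is wrapped in a combinator that saves a copy of the whole *ParseState* and restores that copy on failure. The composed parser is built by a constexpr function, so it is a compile-time constant.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical ParseState: position plus collected messages. As a plain value
// type, a copy is a complete, independent snapshot of the parse.
struct ParseState {
  std::string_view input;
  std::size_t pos{0};
  std::vector<std::string> messages;
};

// Matches the character C twice; it can fail after consuming one character.
template <char C> struct Twice {
  using resultType = char;
  constexpr Twice() {}
  std::optional<char> Parse(ParseState *state) const {
    for (int j{0}; j < 2; ++j) {
      if (state->pos >= state->input.size() || state->input[state->pos] != C) {
        state->messages.push_back(std::string{"expected "} + C);
        return std::nullopt;
      }
      ++state->pos;
    }
    return C;
  }
};

// Backtracking by duplication: snapshot the state, restore it on failure.
template <typename PA> struct Backtrack {
  using resultType = typename PA::resultType;
  constexpr Backtrack() {}
  constexpr explicit Backtrack(PA p) : parser_{p} {}
  std::optional<resultType> Parse(ParseState *state) const {
    ParseState saved{*state};
    if (auto result{parser_.Parse(state)}) {
      return result;
    }
    *state = std::move(saved);  // undo consumed input and new messages
    return std::nullopt;
  }
  PA parser_{};
};

// A constexpr factory, so a composed parser is itself a compile-time constant.
template <typename PA> constexpr Backtrack<PA> backtrack(PA p) {
  return Backtrack<PA>{p};
}

constexpr auto twoAs{backtrack(Twice<'a'>{})};
```

On input "ab", the bare *Twice<'a'>* parser fails after consuming one character and recording a message. *twoAs* also fails, but it leaves the position at 0 and the message list empty.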

### Fundamental Predefined Parsers
These objects and functions are (or return) the fundamental parsers:

* *ok* is a trivial parser that always succeeds without advancing.
* "pure(x)" returns a trivial parser that always succeeds without advancing,
returning some value *x*.
* "fail<T>(msg)" denotes a trivial parser that always fails, emitting the
given message. The template parameter is the type of the value that
the parser never returns.
* *cut* is a trivial parser that always fails silently.
* "guard(pred)" returns a parser that succeeds if and only if the predicate
expression evaluates to true.
* *rawNextChar* returns the next raw character, and fails at EOF.
* *cookedNextChar* returns the next character after preprocessing, skipping
Fortran line continuations and comments; it also fails at EOF.
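
The following sketch shows one plausible shape for several of these fundamental parsers (*ok*, *pure*, *fail*, *cut*, *guard* and *rawNextChar*). It illustrates the ideas under stated assumptions and is not f18's implementation. *ParseState* and the empty *Success* result type are hypothetical. Parsers that carry a value are built here by constexpr factory functions rather than being default-constructed. *cookedNextChar* is omitted because it depends on the preprocessing machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified ParseState: input, position, and messages.
struct ParseState {
  std::string_view input;
  std::size_t pos{0};
  std::vector<std::string> messages;
};

struct Success {};  // hypothetical empty result type for ok and cut

struct OkParser {  // always succeeds without advancing
  using resultType = Success;
  constexpr OkParser() {}
  std::optional<Success> Parse(ParseState *) const { return Success{}; }
};

struct CutParser {  // always fails, silently
  using resultType = Success;
  constexpr CutParser() {}
  std::optional<Success> Parse(ParseState *) const { return std::nullopt; }
};

template <typename A> struct PureParser {  // succeeds, returning a fixed value
  using resultType = A;
  A value;
  std::optional<A> Parse(ParseState *) const { return value; }
};

template <typename A> struct FailParser {  // fails, recording a message
  using resultType = A;
  const char *message;
  std::optional<A> Parse(ParseState *state) const {
    state->messages.emplace_back(message);
    return std::nullopt;
  }
};

struct GuardParser {  // succeeds if and only if the predicate was true
  using resultType = bool;
  bool predicate;
  std::optional<bool> Parse(ParseState *) const {
    if (predicate) {
      return true;
    }
    return std::nullopt;
  }
};

struct RawNextCharParser {  // next raw character; fails at end of input
  using resultType = char;
  constexpr RawNextCharParser() {}
  std::optional<char> Parse(ParseState *state) const {
    if (state->pos >= state->input.size()) {
      return std::nullopt;
    }
    return state->input[state->pos++];
  }
};

// Parsers that carry a value are built by constexpr factory functions here.
template <typename A> constexpr PureParser<A> pure(A x) { return {x}; }
template <typename A> constexpr FailParser<A> fail(const char *msg) { return {msg}; }
constexpr GuardParser guard(bool pred) { return {pred}; }

constexpr OkParser ok{};
constexpr CutParser cut{};
constexpr RawNextCharParser rawNextChar{};
```

The difference between *fail* and *cut* is visible in the state. Both return an empty optional, but only *fail* adds an entry to the message list.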

### Combinators
These functions and operators combine parsers to generate new parsers.

* "!p" succeeds if p fails, and fails if p succeeds.
* "p >> q" fails if p does; otherwise it runs q and returns q's value if q
succeeds.
* "p / q" fails if p does; otherwise it runs q and returns *p's* value if q
succeeds.
* "p || q" succeeds if p does; otherwise it runs q. The two parsers must
have the same resultType, and the value returned by the first succeeding
parser is the value of the combination.
* "lookAhead(p)" succeeds if p does, but doesn't modify any state.
* "attempt(p)" succeeds if p does, safely preserving state on failure.
* "many(p)" recognizes a greedy sequence of zero or more nonempty successes
of *p*, and returns a std::list<> of their values. It always succeeds.
* "some(p)" recognizes a greedy sequence of one or more successes of *p*.
It fails if *p* immediately fails.
* "skipMany(p)" is the same as "many(p)", but it discards the results.
* "maybe(p)" tries to match *p*, returning a "std::optional<T>" value.
It always succeeds.
* "defaulted(p)" matches *p*, and when *p* fails it returns a
default-constructed instance of *p*'s resultType. It always succeeds.
* "nonemptySeparated(p, q)" repeatedly matches "p q p q p q ... p",
returning a std::list<> of only the values of the p's. It fails if
*p* immediately fails.
* "extension(p)" parses *p* if strict standard compliance is disabled,
with a warning if nonstandard usage warnings are enabled.
* "deprecated(p)" parses *p* if strict standard compliance is disabled,
with a warning if deprecated usage warnings are enabled.
* "inContext(..., p)" runs *p* within an error message context.
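
The following is a hedged illustration of the repetition combinators: a sketch of "many(p)" and "maybe(p)" in the same style. The names *ManyParser*, *MaybeParser* and *DigitParser* are hypothetical. Here "nonempty" is read as meaning that a success must consume input, which also guarantees that the loop terminates. A failed or empty attempt is rolled back by restoring a saved copy of the state.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <string_view>
#include <utility>

struct ParseState {  // hypothetical, simplified
  std::string_view input;
  std::size_t pos{0};
};

struct DigitParser {  // hypothetical leaf parser: one decimal digit
  using resultType = char;
  constexpr DigitParser() {}
  std::optional<char> Parse(ParseState *state) const {
    if (state->pos < state->input.size() && state->input[state->pos] >= '0' &&
        state->input[state->pos] <= '9') {
      return state->input[state->pos++];
    }
    return std::nullopt;
  }
};

// many(p): greedy, zero or more successes that consume input; always succeeds.
template <typename PA> struct ManyParser {
  using resultType = std::list<typename PA::resultType>;
  PA parser;
  std::optional<resultType> Parse(ParseState *state) const {
    resultType values;
    while (true) {
      ParseState saved{*state};
      auto x{parser.Parse(state)};
      if (!x || state->pos == saved.pos) {  // failure or empty success ends it
        *state = saved;
        break;
      }
      values.push_back(std::move(*x));
    }
    return values;
  }
};

// maybe(p): always succeeds, yielding an optional that is empty if p failed.
template <typename PA> struct MaybeParser {
  using resultType = std::optional<typename PA::resultType>;
  PA parser;
  std::optional<resultType> Parse(ParseState *state) const {
    ParseState saved{*state};
    if (auto x{parser.Parse(state)}) {
      return std::optional<resultType>{std::in_place, std::move(*x)};
    }
    *state = saved;
    return std::optional<resultType>{std::in_place};  // success, no value
  }
};

template <typename PA> constexpr ManyParser<PA> many(PA p) { return {p}; }
template <typename PA> constexpr MaybeParser<PA> maybe(PA p) { return {p}; }
```

On "123x", "many(DigitParser{})" returns a list of three digits and stops before the "x". Running it again there succeeds with an empty list. The nested optional returned by *maybe* separates "the parser succeeded" (outer) from "*p* matched" (inner).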

Note that "a >> b >> c / d / e" matches a sequence of five parsers,
but returns only the result that was obtained by matching c. (Because
"/" binds more tightly than ">>" in C++, the expression groups as
"(a >> b) >> ((c / d) / e)".)
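
The sequencing and alternative operators, and the grouping described in the note, can be sketched with ordinary C++ operator overloading. The class names *Then*, *Before*, *Or* and *Char* are hypothetical. In *Or*, restoring the state before trying q is a choice made for this sketch; the text above does not state it as a property of "p || q".

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <type_traits>

struct ParseState {  // hypothetical, simplified
  std::string_view input;
  std::size_t pos{0};
};

template <char C> struct Char {  // hypothetical leaf: matches C, returns it
  using resultType = char;
  constexpr Char() {}
  std::optional<char> Parse(ParseState *state) const {
    if (state->pos < state->input.size() && state->input[state->pos] == C) {
      ++state->pos;
      return C;
    }
    return std::nullopt;
  }
};

template <typename PA, typename PB> struct Then {  // p >> q: q's value
  using resultType = typename PB::resultType;
  PA p;
  PB q;
  std::optional<resultType> Parse(ParseState *state) const {
    if (!p.Parse(state)) {
      return std::nullopt;
    }
    return q.Parse(state);
  }
};

template <typename PA, typename PB> struct Before {  // p / q: p's value
  using resultType = typename PA::resultType;
  PA p;
  PB q;
  std::optional<resultType> Parse(ParseState *state) const {
    auto result{p.Parse(state)};
    if (!result || !q.Parse(state)) {
      return std::nullopt;
    }
    return result;
  }
};

template <typename PA, typename PB> struct Or {  // p || q: first success
  static_assert(std::is_same_v<typename PA::resultType, typename PB::resultType>,
                "alternatives must have the same resultType");
  using resultType = typename PA::resultType;
  PA p;
  PB q;
  std::optional<resultType> Parse(ParseState *state) const {
    ParseState saved{*state};
    if (auto result{p.Parse(state)}) {
      return result;
    }
    *state = saved;  // this sketch rewinds before trying q
    return q.Parse(state);
  }
};

// Operators accept only types that declare a resultType (i.e., parsers).
template <typename PA, typename PB, typename = typename PA::resultType,
          typename = typename PB::resultType>
constexpr Then<PA, PB> operator>>(PA p, PB q) { return {p, q}; }
template <typename PA, typename PB, typename = typename PA::resultType,
          typename = typename PB::resultType>
constexpr Before<PA, PB> operator/(PA p, PB q) { return {p, q}; }
template <typename PA, typename PB, typename = typename PA::resultType,
          typename = typename PB::resultType>
constexpr Or<PA, PB> operator||(PA p, PB q) { return {p, q}; }

// "/" binds more tightly than ">>", so this is (a >> b) >> ((c / d) / e).
constexpr auto abcde{Char<'a'>{} >> Char<'b'>{} >> Char<'c'>{} / Char<'d'>{} / Char<'e'>{}};
constexpr auto xOrY{Char<'x'>{} || Char<'y'>{}};
```

Parsing "abcde" with *abcde* consumes all five characters and yields 'c'. The static_assert in *Or* enforces the same-resultType rule at compile time.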

### Applicatives
The following *applicative* combinators combine parsers and modify or
collect the values that they return.

* "construct<T>{}(p1, p2, ...)" matches zero or more parsers in succession,
collecting their results and then passing them with move semantics to a
constructor for the type *T* if they all succeed.
* "applyFunction(f, p1, p2, ...)" matches one or more parsers in succession,
collecting their results and passing them as rvalue reference arguments to
the function *f*, returning its result.
* "applyLambda([](&&x){}, p1, p2, ...)" is the same thing, but for lambdas
and other function objects.
* "applyMem(mf, p1, p2, ...)" is the same thing, but invokes a member
function of the result of the first parser for updates in place.
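
Below is a sketch of "construct<T>{}(p1, p2, ...)" that uses C++17 fold expressions. *Construct*, *Digit* and *DigitPair* are hypothetical names, and f18's actual implementation may differ. The parsers run in order, and the first failure stops the sequence. On success, the collected values are moved into a brace-initialized *T*.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <tuple>
#include <utility>

struct ParseState {  // hypothetical, simplified
  std::string_view input;
  std::size_t pos{0};
};

struct Digit {  // hypothetical leaf parser: one decimal digit as an int
  using resultType = int;
  constexpr Digit() {}
  std::optional<int> Parse(ParseState *state) const {
    if (state->pos < state->input.size() && state->input[state->pos] >= '0' &&
        state->input[state->pos] <= '9') {
      return state->input[state->pos++] - '0';
    }
    return std::nullopt;
  }
};

// Runs each parser in order; if all succeed, moves their values into T{...}.
template <typename T, typename... PA> struct Construct {
  using resultType = T;
  std::tuple<PA...> parsers;
  std::optional<T> Parse(ParseState *state) const {
    return ParseAll(state, std::index_sequence_for<PA...>{});
  }
  template <std::size_t... J>
  std::optional<T> ParseAll(ParseState *state, std::index_sequence<J...>) const {
    std::tuple<std::optional<typename PA::resultType>...> results;
    // The && fold evaluates left to right and stops at the first failure.
    bool succeeded{((std::get<J>(results) = std::get<J>(parsers).Parse(state))
                        .has_value() && ...)};
    if (!succeeded) {
      return std::nullopt;
    }
    return T{std::move(*std::get<J>(results))...};
  }
};

// Usage mirrors the document's notation: construct<T>{}(p1, p2, ...).
template <typename T> struct construct {
  template <typename... PA>
  constexpr Construct<T, PA...> operator()(PA... p) const {
    return {std::tuple<PA...>{p...}};
  }
};

struct DigitPair {  // hypothetical parse-tree node
  int first;
  int second;
};
```

With input "42", the result is a *DigitPair* that holds 4 and 2. With input "4", the second *Digit* fails and no node is built.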

### Non-Advancing State Inquiries and Updates
These are non-advancing state inquiry and update parsers:

* *getColumn* returns the 1-based column position.
* *inCharLiteral* succeeds only while running under "withinCharLiteral(p)".
* *inFortran* succeeds unless in a preprocessing directive.
* *inFixedForm* succeeds in fixed-form source.
* *setInFixedForm* sets the fixed-form flag, returning its prior value.
* *columns* returns the 1-based column number after which source is clipped.
* "setColumns(c)" sets the column limit and returns its prior value.
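
These inquiry and update parsers consume no input. They only read or modify *ParseState*. The sketch below models *getColumn*, *inFixedForm* and "setColumns(c)" over a hypothetical *ParseState* that carries a fixed-form flag and a column limit. The default limit of 132 is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical ParseState carrying the flags these inquiries read.
struct ParseState {
  std::string_view input;
  std::size_t pos{0};
  bool fixedForm{false};
  int columns{132};  // assumed default: source is clipped after this column

  int Column() const {  // 1-based column of pos within its line
    std::size_t newline{input.substr(0, pos).rfind('\n')};
    std::size_t lineStart{newline == std::string_view::npos ? 0 : newline + 1};
    return static_cast<int>(pos - lineStart) + 1;
  }
};

struct GetColumn {  // reports the column; never advances
  using resultType = int;
  constexpr GetColumn() {}
  std::optional<int> Parse(ParseState *state) const { return state->Column(); }
};

struct InFixedForm {  // succeeds only in fixed-form source
  using resultType = bool;
  constexpr InFixedForm() {}
  std::optional<bool> Parse(ParseState *state) const {
    if (state->fixedForm) {
      return true;
    }
    return std::nullopt;
  }
};

struct SetColumns {  // updates the limit and returns the prior value
  using resultType = int;
  int newColumns;
  std::optional<int> Parse(ParseState *state) const {
    int prior{state->columns};
    state->columns = newColumns;
    return prior;
  }
};

constexpr GetColumn getColumn{};
constexpr InFixedForm inFixedForm{};
constexpr SetColumns setColumns(int c) { return {c}; }
```

Because "setColumns(c)" returns the prior value, a caller can save that value and restore it later.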

### Monadic Combination
When parsing depends on the result values of earlier parses, the
"monadic bind" combinator is available.
Please try to avoid using it, as it makes automatic analysis of the
grammar difficult.
It has the syntax "p >>= f", and it constructs a parser that matches p,
yielding some value x on success, then matches the parser returned from
the function call "f(x)".
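
The following sketch shows the kind of context-dependent parse that "p >>= f" is meant for. A count digit is followed by exactly that many letters, so the second parser depends on the value the first one produced. *Bind*, *Letters* and *lettersFor* are hypothetical names, and the sketch assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

struct ParseState {  // hypothetical, simplified
  std::string_view input;
  std::size_t pos{0};
};

struct Digit {  // hypothetical leaf parser: one decimal digit as an int
  using resultType = int;
  constexpr Digit() {}
  std::optional<int> Parse(ParseState *state) const {
    if (state->pos < state->input.size() && state->input[state->pos] >= '0' &&
        state->input[state->pos] <= '9') {
      return state->input[state->pos++] - '0';
    }
    return std::nullopt;
  }
};

// Hypothetical parser chosen at parse time: exactly n letters.
struct Letters {
  using resultType = std::string;
  int n;
  std::optional<std::string> Parse(ParseState *state) const {
    std::string text;
    for (int j{0}; j < n; ++j) {
      if (state->pos >= state->input.size()) {
        return std::nullopt;
      }
      char ch{state->input[state->pos]};
      if (!((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z'))) {
        return std::nullopt;
      }
      text += ch;
      ++state->pos;
    }
    return text;
  }
};

// p >>= f: run p; on success with value x, run the parser returned by f(x).
template <typename PA, typename F> struct Bind {
  using resultType =
      typename std::invoke_result_t<F, typename PA::resultType>::resultType;
  PA p;
  F f;
  std::optional<resultType> Parse(ParseState *state) const {
    if (auto x{p.Parse(state)}) {
      return f(std::move(*x)).Parse(state);
    }
    return std::nullopt;
  }
};

template <typename PA, typename F, typename = typename PA::resultType>
constexpr Bind<PA, F> operator>>=(PA p, F f) { return {p, f}; }

constexpr Letters lettersFor(int n) { return {n}; }

// A count digit followed by that many letters, e.g. "3abc".
constexpr auto counted{Digit{} >>= lettersFor};
```

On "3abcd", *counted* returns "abc" and stops before the "d". On "5ab", it fails because only two letters follow the count. The parser to run after the digit is known only at parse time, which is why automatic analysis of such a grammar is harder.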

### Token Parsers
These are the basic parsers on which the actual Fortran grammar is built.
All of the following parsers consume characters acquired from
*cookedNextChar*.

* *spaces* always succeeds after consuming any spaces or tabs.
* *digit* matches one cooked decimal digit (0-9).
* *letter* matches one cooked letter (A-Z).
* "CharMatch<'c'>{}" matches one specific cooked character.
* "..."_tok matches the content of the string, skipping spaces before and
after, and with multiple spaces accepted for any internal space.
(Note that the _tok suffix is optional when the parser appears before
the combinator ">>" or after "/".)
* "parenthesized(p)" is shorthand for "(" >> p / ")".
* "bracketed(p)" is shorthand for "[" >> p / "]".
* "withinCharLiteral(p)" applies the parser *p*, tokenizing for
CHARACTER/Hollerith literals.
* "nonEmptyListOf(p)" matches a comma-separated list of one or more
instances of *p*.
* "optionalListOf(p)" is the same thing, but can be empty, and always succeeds.
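
A possible shape for the "..."_tok user-defined literal is sketched below. The details are hypothetical. *TokenString* and *SkipSpaces* are illustrative names, and matching runs over raw characters rather than cooked ones. The rule that an internal space in the pattern requires at least one blank in the input, while accepting several, is an assumption: the text above only says that multiple spaces are accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

struct ParseState {  // hypothetical, simplified
  std::string_view input;
  std::size_t pos{0};
};

struct Success {};  // hypothetical empty result type

// Skips blanks and tabs (the role played by "spaces" above).
inline void SkipSpaces(ParseState *state) {
  while (state->pos < state->input.size() &&
         (state->input[state->pos] == ' ' || state->input[state->pos] == '\t')) {
    ++state->pos;
  }
}

struct TokenString {
  using resultType = Success;
  std::string_view text;
  std::optional<Success> Parse(ParseState *state) const {
    SkipSpaces(state);  // spaces before the token
    for (char ch : text) {
      if (ch == ' ') {
        // Assumption: an internal space needs at least one blank in the
        // input, and any number of additional blanks is accepted.
        std::size_t start{state->pos};
        SkipSpaces(state);
        if (state->pos == start) {
          return std::nullopt;
        }
      } else if (state->pos < state->input.size() &&
                 state->input[state->pos] == ch) {
        ++state->pos;
      } else {
        return std::nullopt;
      }
    }
    SkipSpaces(state);  // spaces after the token
    return Success{};
  }
};

constexpr TokenString operator""_tok(const char *str, std::size_t n) {
  return {std::string_view{str, n}};
}
```

Under that assumption, "GO TO"_tok accepts "  GO   TO  " and leaves the position at the start of the following token. It would reject "GOTO". Whether that behavior is correct depends on the real tokenizer's rules.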

### Debugging Parser
Last, the parser "..."_debug emits the string to the standard error stream
and succeeds. It is useful for tracing while debugging a parser, but it
should not be committed in production code.
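
Finally, here is a sketch of a "..."_debug literal in the same hypothetical style. *DebugMessage* is an illustrative name. It writes its text to standard error with *std::fprintf* from <cstdio> and succeeds without consuming input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <optional>
#include <string_view>

struct ParseState {  // hypothetical, simplified
  std::string_view input;
  std::size_t pos{0};
};

struct Success {};  // hypothetical empty result type

struct DebugMessage {
  using resultType = Success;
  std::string_view text;
  std::optional<Success> Parse(ParseState *) const {
    // <cstdio> rather than <iostream>, since the style guide discourages iostream.
    std::fprintf(stderr, "%.*s\n", static_cast<int>(text.size()), text.data());
    return Success{};  // always succeeds and consumes nothing
  }
};

constexpr DebugMessage operator""_debug(const char *str, std::size_t n) {
  return {std::string_view{str, n}};
}
```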

@@ -1,127 +0,0 @@
The Fortran language recognizer here is an LL recursive descent parser
composed from a "parser combinator" library that defines a few fundamental
parsers and a few ways to compose them into more powerful parsers.

For our purposes here, a *parser* is any object that can attempt to recognize
an instance of some syntax from an input stream. It may succeed or fail.
On success, it may return some semantic value to its caller.

In C++ terms, a parser is any instance of a class that
  (1) has a constexpr default constructor,
  (2) defines a resultType typedef, and
  (3) provides a member or static function

        std::optional<resultType> Parse(ParseState *) const;
        static std::optional<resultType> Parse(ParseState *);

      that accepts a pointer to a ParseState as its argument and returns
      a std::optional<resultType> as a result, with the presence or absence
      of a value in the std::optional<> signifying success or failure
      respectively.

The resultType of a parser is typically the class type of some particular
node type in the parse tree.

ParseState is a class that encapsulates a position in the source stream,
collects messages, and holds a few state flags that can affect tokenization
(e.g., are we in a character literal?). Instances of ParseState are
independent and complete -- they are cheap to duplicate when necessary to
implement backtracking.

The constexpr default constructor of a parser is important. The functions
(below) that operate on instances of parsers are themselves all constexpr.
This use of compile-time expressions allows the entirety of a recursive
descent parser for a language to be constructed at compilation time through
the use of templates.

These objects and functions are (or return) the fundamental parsers:

  ok              always succeeds without advancing
  pure(x)         always succeeds without advancing, returning some value x
  fail<T>(msg)    always fails with the given message; optionally typed
  cut             always fails, with no message
  guard(pred)     succeeds if the predicate expression evaluates to true
  rawNextChar     returns the next raw character; fails at EOF
  cookedNextChar  returns the next character after preprocessing, skipping
                  Fortran line continuations and comments; fails at EOF

These functions and operators generate new parsers from combinations of
other parsers:

  !p                       ok if p fails, cut if p succeeds
  p >> q                   match p, then q, returning q's value
  p / q                    match p, then q, returning p's value
  p || q                   match p if it succeeds, else match q; p and q must be same type
  lookAhead(p)             succeeds iff p does, but doesn't modify state
  attempt(p)               succeeds iff p does, safely preserving state on failure
  many(p)                  a greedy sequence of zero or more nonempty successes of p;
                           returns std::list<> of values
  some(p)                  a greedy sequence of one or more successes of p
  skipMany(p)              same as many(p), but discards result (performance optimizer)
  maybe(p)                 try to match p, returning optional<T>
  defaulted(p)             matches p, or else returns a default-constructed instance
                           of p's resultType
  nonemptySeparated(p, q)  repeatedly match p q p q p q ... p, returning
                           the values of the p's
  extension(p)             parses p if strict standard compliance is disabled,
                           with a warning if nonstandard usage warnings are enabled
  deprecated(p)            parses p if strict standard compliance is disabled,
                           with a warning if deprecated usage warnings are enabled
  inContext("...", p)      run p within an error message context

Note that "a >> b >> c / d / e" matches a sequence of five parsers,
but returns only the result that was obtained by matching c.

The following "applicative" combinators modify or combine the values returned
by parsers:

  construct<T>{}(p1, p2, ...)
      matches zero or more parsers in succession, collecting their
      results and then passing them with move semantics to a
      constructor for the type T if they all succeed
  applyFunction(f, p1, p2, ...)
      matches one or more parsers in succession, collecting their
      results and passing them as rvalue reference arguments to
      some function, returning its result
  applyLambda([](&&x){}, p1, p2, ...)
      is the same thing, but for lambdas and other function objects
  applyMem(mf, p1, p2, ...)
      is the same thing, but invokes a member function of the
      result of the first parser

These are non-advancing state inquiry and update parsers:

  getColumn       returns 1-based column position
  inCharLiteral   succeeds under withinCharLiteral
  inFortran       succeeds unless in a preprocessing directive
  inFixedForm     succeeds in fixed-form source
  setInFixedForm  sets the fixed-form flag, returns prior value
  columns         returns the 1-based column number after which source is clipped
  setColumns(c)   sets "columns", returns prior value

When parsing depends on the result values of earlier parses, the
"monadic bind" combinator is available (but please try to avoid using it,
as it makes automatic analysis of the grammar difficult):

  p >>= f         match p, yielding some value x on success, then match the
                  parser returned from the function call f(x)

Last, we have these basic parsers on which the actual grammar of the Fortran
is built. All of the following parsers consume characters acquired from
"cookedNextChar".

  spaces                always succeeds after consuming any spaces or tabs
  digit                 matches one cooked decimal digit (0-9)
  letter                matches one cooked letter (A-Z)
  CharMatch<'c'>{}      matches one specific cooked character
  "..."_tok             match contents, skipping spaces before and after, and
                        with multiple spaces accepted for any internal space
  "..." >> p            the tok suffix is optional on a string before >> and after /
  parenthesized(p)      shorthand for "(" >> p / ")"
  bracketed(p)          shorthand for "[" >> p / "]"

  withinCharLiteral(p)  apply p, tokenizing for CHARACTER/Hollerith literals
  nonEmptyListOf(p)     matches a comma-separated list of one or more p's
  optionalListOf(p)     ditto, but can be empty

  "..."_debug           emit the string and succeed, for parser debugging