forked from OSchip/llvm-project
[flang] Convert parser combinator documentation file to Markdown.
Original-commit: flang-compiler/f18@263865c97a
parent 94c26b688e
commit 1e69ed0c1b
|
@ -20,6 +20,8 @@ in foo.cc.)
|
|||
1. In the source file "foo.cc", put the #include of "foo.h" first.
|
||||
Then #include other project headers in alphabetic order; then C++ standard
|
||||
headers, also alphabetically; then C and system headers.
|
||||
1. Don't include the standard iostream header. If you need it for debugging,
|
||||
remove the inclusion before committing.
|
||||
### Naming
|
||||
1. C++ names that correspond to STL names should look like those STL names
|
||||
(e.g., *clear()* and *size()* member functions in a class that implements
|
||||
|
@ -40,7 +42,7 @@ especially when you can declare them directly in a for()/while()/if()
|
|||
condition. Otherwise, prefer complete English words to abbreviations
|
||||
when creating names.
|
||||
### Commentary
|
||||
1. Use // for all comments except for short notes within statements.
|
||||
1. Use // for all comments except for short notes within expressions.
|
||||
1. When // follows code on a line, precede it with two spaces.
|
||||
1. Comments should matter. Assume that the reader knows current C++ at least as
|
||||
well as you do and avoid distracting her by calling out usage of new
|
||||
|
|
|
@ -0,0 +1,145 @@
|
|||
## Concept
|
||||
The Fortran language recognizer here can be classified as an LL recursive
|
||||
descent parser. It is composed from a *parser combinator* library that
|
||||
defines a few fundamental parsers and a few ways to compose them into more
|
||||
powerful parsers.
|
||||
|
||||
For our purposes here, a *parser* is any object that can attempt to recognize
|
||||
an instance of some syntax from an input stream. It may succeed or fail.
|
||||
On success, it may return some semantic value to its caller.
|
||||
|
||||
In C++ terms, a parser is any instance of a class that
|
||||
1. has a *constexpr* default constructor,
|
||||
1. defines a resultType type, and
|
||||
1. provides a member or static function that accepts a pointer to a
|
||||
ParseState as its argument and returns a std::optional<resultType> as a
|
||||
result, with the presence or absence of a value in the std::optional<>
|
||||
signifying success or failure, respectively.
|
||||
|
||||
> std::optional<resultType> Parse(ParseState *) const;
|
||||
|
||||
The resultType of a parser is typically the class type of some particular
|
||||
node type in the parse tree.
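
As an illustration of the three requirements above, here is a minimal sketch of a conforming parser. The `ParseState` shown is a hypothetical, much-reduced stand-in (only a text cursor, with no messages or flags), and `NextChar` and `Run` are names invented for this example; none of this is the library's actual code.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for the real ParseState: only a cursor over the text.
struct ParseState {
  std::string_view text;
  std::size_t at{0};
};

// A toy parser that meets the three requirements.
struct NextChar {
  using resultType = char;  // (2) the result type
  constexpr NextChar() {}   // (1) constexpr default constructor
  // (3) Parse: a value signifies success, std::nullopt signifies failure.
  constexpr std::optional<resultType> Parse(ParseState *state) const {
    if (state->at >= state->text.size()) {
      return std::nullopt;  // end of input
    }
    return state->text[state->at++];
  }
};

// Hypothetical helper: runs a parser over a fresh state.
template <typename PARSER>
constexpr std::optional<typename PARSER::resultType> Run(
    PARSER parser, std::string_view text) {
  ParseState state{text};
  return parser.Parse(&state);
}

// Everything is constexpr, so the checks can run at compilation time.
static_assert(Run(NextChar{}, "abc") == 'a');
static_assert(!Run(NextChar{}, "").has_value());
```

Because both the constructor and `Parse` are constexpr in this sketch, the parser can be exercised in a `static_assert`; this needs C++17 or later.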
|
||||
|
||||
*ParseState* is a class that encapsulates a position in the source stream,
|
||||
collects messages, and holds a few state flags that determine tokenization
|
||||
(e.g., are we in a character literal?). Instances of *ParseState* are
|
||||
independent and complete -- they are cheap to duplicate whenever necessary to
|
||||
implement backtracking.
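
The following sketch suggests why cheap copies of the state make backtracking simple: an attempt-style wrapper saves a copy, runs the inner parser, and restores the copy on failure. `ParseState` is again a hypothetical reduced stand-in, and `AB`, `Attempt`, and `PositionAfter` are illustrative names, not the library's implementation.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for ParseState: just the text and a cursor.
struct ParseState {
  std::string_view text;
  std::size_t at{0};
};

// Toy parser for "ab" that consumes 'a' before it can notice a mismatch.
struct AB {
  using resultType = char;
  static constexpr bool Eat(ParseState *state, char want) {
    if (state->at < state->text.size() && state->text[state->at] == want) {
      ++state->at;
      return true;
    }
    return false;
  }
  constexpr std::optional<char> Parse(ParseState *state) const {
    if (Eat(state, 'a') && Eat(state, 'b')) {
      return 'b';
    }
    return std::nullopt;  // note: 'a' may already have been consumed
  }
};

// Backtracking wrapper: duplicate the state, restore it if p fails.
template <typename P> struct Attempt {
  using resultType = typename P::resultType;
  P p;
  constexpr std::optional<resultType> Parse(ParseState *state) const {
    ParseState backup = *state;  // the cheap duplicate
    std::optional<resultType> result = p.Parse(state);
    if (!result) {
      *state = backup;
    }
    return result;
  }
};

// Hypothetical helper: reports where the cursor ends up after parsing.
template <typename PARSER>
constexpr std::size_t PositionAfter(PARSER parser, std::string_view text) {
  ParseState state{text};
  parser.Parse(&state);
  return state.at;
}

static_assert(PositionAfter(AB{}, "ax") == 1);           // 'a' was consumed
static_assert(PositionAfter(Attempt<AB>{}, "ax") == 0);  // state restored
static_assert(PositionAfter(Attempt<AB>{}, "ab") == 2);  // success advances
```

In this toy, only the wrapper pays for the copy, so parsers that never need to back up do not incur it.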
|
||||
|
||||
The constexpr default constructor of a parser is important. The functions
|
||||
(below) that operate on instances of parsers are themselves all constexpr.
|
||||
This use of compile-time expressions allows the entirety of a recursive
|
||||
descent parser for a language to be constructed at compilation time through
|
||||
the use of templates.
|
||||
|
||||
### Fundamental Predefined Parsers
|
||||
These objects and functions are (or return) the fundamental parsers:
|
||||
|
||||
* *ok* is a trivial parser that always succeeds without advancing.
|
||||
* "pure(x)" returns a trivial parser that always succeeds without advancing,
|
||||
returning some value *x*.
|
||||
* "fail<T>(msg)" denotes a trivial parser that always fails, emitting the
|
||||
given message. The template parameter is the type of the value that
|
||||
the parser never returns.
|
||||
* *cut* is a trivial parser that always fails silently.
|
||||
* "guard(pred)" returns a parser that succeeds if and only if the predicate
|
||||
expression evaluates to true.
|
||||
* *rawNextChar* returns the next raw character, and fails at EOF.
|
||||
* *cookedNextChar* returns the next character after preprocessing, skipping
|
||||
  Fortran line continuations and comments; it also fails at EOF.
|
||||
|
||||
### Combinators
|
||||
These functions and operators combine parsers to generate new parsers.
|
||||
|
||||
* "!p" succeeds if p fails, and fails if p succeeds.
|
||||
* "p >> q" fails if p does, otherwise running q and returning its value when
|
||||
it succeeds.
|
||||
* "p / q" fails if p does, otherwise running q and returning *p's* value
|
||||
if q succeeds.
|
||||
* "p || q" succeeds if p does, otherwise running q. The two parsers must
|
||||
have the same type, and the value returned by the first succeeding parser
|
||||
is the value of the combination.
|
||||
* "lookAhead(p)" succeeds if p does, but doesn't modify any state.
|
||||
* "attempt(p)" succeeds if p does, safely preserving state on failure.
|
||||
* "many(p)" recognizes a greedy sequence of zero or more nonempty successes
|
||||
of *p*, and returns std::list<> of their values. It always succeeds.
|
||||
* "some(p)" recognizes a greedy sequence of one or more successes of *p*.
|
||||
It fails if p immediately fails.
|
||||
* "skipMany(p)" is the same as "many(p)", but it discards the results.
|
||||
* "maybe(p)" tries to match *p*, returning an "std::optional<T>" value.
|
||||
It always succeeds.
|
||||
* "defaulted(p)" matches *p*, and when *p* fails it returns a
|
||||
default-constructed instance of *p*'s resultType. It always succeeds.
|
||||
* "nonemptySeparated(p, q)" repeatedly matches "p q p q p q ... p",
|
||||
returning a std::list<> of only the values of the p's. It fails if
|
||||
*p* immediately fails.
|
||||
* "extension(p)" parses *p* if strict standard compliance is disabled,
|
||||
or with a warning if nonstandard usage warnings are enabled.
|
||||
* "deprecated(p)" parses *p* if strict standard compliance is disabled,
|
||||
with a warning if deprecated usage warnings are enabled.
|
||||
* "inContext(..., p)" runs *p* within an error message context.
|
||||
|
||||
Note that "a >> b >> c / d / e" matches a sequence of five parsers,
|
||||
but returns only the result that was obtained by matching c.
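
The note above follows from ordinary C++ operator precedence: "/" binds more tightly than ">>", so the expression groups as "(a >> b) >> ((c / d) / e)". The sketch below models just these two combinators to show the effect. `ParseState`, `Ch`, `Then`, `Before`, and `Run` are hypothetical toy names, and unlike a real implementation this toy makes no attempt to report messages or restore state on failure.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for ParseState: just the text and a cursor.
struct ParseState {
  std::string_view text;
  std::size_t at{0};
};

// Toy parser matching one specific character.
template <char C> struct Ch {
  using resultType = char;
  constexpr std::optional<char> Parse(ParseState *state) const {
    if (state->at < state->text.size() && state->text[state->at] == C) {
      ++state->at;
      return C;
    }
    return std::nullopt;
  }
};

// "p >> q": both must match; the value of q is returned.
template <typename P, typename Q> struct Then {
  using resultType = typename Q::resultType;
  P p;
  Q q;
  constexpr std::optional<resultType> Parse(ParseState *state) const {
    if (!p.Parse(state)) {
      return std::nullopt;
    }
    return q.Parse(state);
  }
};

// "p / q": both must match; the value of p is returned.
template <typename P, typename Q> struct Before {
  using resultType = typename P::resultType;
  P p;
  Q q;
  constexpr std::optional<resultType> Parse(ParseState *state) const {
    std::optional<resultType> result = p.Parse(state);
    if (result && !q.Parse(state)) {
      return std::nullopt;
    }
    return result;
  }
};

// The operators apply only to types that define resultType.
template <typename P, typename Q, typename = typename P::resultType,
    typename = typename Q::resultType>
constexpr Then<P, Q> operator>>(P p, Q q) { return {p, q}; }

template <typename P, typename Q, typename = typename P::resultType,
    typename = typename Q::resultType>
constexpr Before<P, Q> operator/(P p, Q q) { return {p, q}; }

// Hypothetical helper: runs a parser over a fresh state.
template <typename PARSER>
constexpr std::optional<typename PARSER::resultType> Run(
    PARSER parser, std::string_view text) {
  ParseState state{text};
  return parser.Parse(&state);
}

constexpr auto abcde =
    Ch<'a'>{} >> Ch<'b'>{} >> Ch<'c'>{} / Ch<'d'>{} / Ch<'e'>{};
static_assert(Run(abcde, "abcde") == 'c');      // only c's value survives
static_assert(!Run(abcde, "abcdx").has_value());  // all five must match
```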
|
||||
|
||||
### Applicatives
|
||||
The following *applicative* combinators combine parsers and modify or
|
||||
collect the values that they return.
|
||||
|
||||
* "construct<T>{}(p1, p2, ...)" matches zero or more parsers in succession,
|
||||
collecting their results and then passing them with move semantics to a
|
||||
constructor for the type *T* if they all succeed.
|
||||
* "applyFunction(f, p1, p2, ...)" matches one or more parsers in succession,
|
||||
collecting their results and passing them as rvalue reference arguments to
|
||||
some function, returning its result.
|
||||
* "applyLambda([](auto &&x){}, p1, p2, ...)" is the same thing, but for lambdas
|
||||
and other function objects.
|
||||
* "applyMem(mf, p1, p2, ...)" is the same thing, but invokes a member
|
||||
function of the result of the first parser for updates in place.
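
To make the idea of "construct" concrete, here is a reduced sketch fixed at exactly two parsers: it runs them in succession and, if both succeed, moves their results into a constructor for *T*. The names `ParseState`, `NextChar`, `CharPair`, `Construct2`, and `Run` are all hypothetical; the real combinator is variadic and is not reproduced here.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical stand-in for ParseState: just the text and a cursor.
struct ParseState {
  std::string_view text;
  std::size_t at{0};
};

// Toy parser returning the next character; fails at end of input.
struct NextChar {
  using resultType = char;
  constexpr std::optional<char> Parse(ParseState *state) const {
    if (state->at >= state->text.size()) {
      return std::nullopt;
    }
    return state->text[state->at++];
  }
};

// Hypothetical parse-tree node built from two results.
struct CharPair {
  char first;
  char second;
};

// Two-parser sketch of the "construct" idea.
template <typename T, typename P1, typename P2> struct Construct2 {
  using resultType = T;
  P1 p1;
  P2 p2;
  constexpr std::optional<T> Parse(ParseState *state) const {
    auto a = p1.Parse(state);
    if (!a) {
      return std::nullopt;
    }
    auto b = p2.Parse(state);
    if (!b) {
      return std::nullopt;
    }
    // Both succeeded: pass the results on with move semantics.
    return T{std::move(*a), std::move(*b)};
  }
};

// Hypothetical helper: runs a parser over a fresh state.
template <typename PARSER>
constexpr std::optional<typename PARSER::resultType> Run(
    PARSER parser, std::string_view text) {
  ParseState state{text};
  return parser.Parse(&state);
}

using PairParser = Construct2<CharPair, NextChar, NextChar>;
static_assert(Run(PairParser{}, "xy")->first == 'x');
static_assert(Run(PairParser{}, "xy")->second == 'y');
static_assert(!Run(PairParser{}, "x").has_value());  // second parser fails
```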
|
||||
|
||||
### Non-Advancing State Inquiries and Updates
|
||||
These are non-advancing state inquiry and update parsers:
|
||||
|
||||
* *getColumn* returns the 1-based column position.
|
||||
* *inCharLiteral* succeeds under withinCharLiteral.
|
||||
* *inFortran* succeeds unless in a preprocessing directive.
|
||||
* *inFixedForm* succeeds in fixed-form source.
|
||||
* *setInFixedForm* sets the fixed-form flag, returning its prior value.
|
||||
* *columns* returns the 1-based column number after which source is clipped.
|
||||
* "setColumns(c)" sets the column limit and returns its prior value.
|
||||
|
||||
### Monadic Combination
|
||||
When parsing depends on the result values of earlier parses, the
|
||||
"monadic bind" combinator is available.
|
||||
Please try to avoid using it, as it makes automatic analysis of the
|
||||
grammar difficult.
|
||||
It has the syntax "p >>= f", and it constructs a parser that matches p,
|
||||
yielding some value x on success, then matches the parser returned from
|
||||
the function call "f(x)".
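
For illustration only (the advice above to avoid this combinator still stands), the sketch below shows the shape of a bind: the second parser is chosen from the first parser's value. Here the toy grammar accepts any character followed by the same character again. `ParseState`, `NextChar`, `Exactly`, `Bind`, `SameAgain`, and `Run` are hypothetical names, not the library's code.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for ParseState: just the text and a cursor.
struct ParseState {
  std::string_view text;
  std::size_t at{0};
};

// Toy parser returning the next character; fails at end of input.
struct NextChar {
  using resultType = char;
  constexpr std::optional<char> Parse(ParseState *state) const {
    if (state->at >= state->text.size()) {
      return std::nullopt;
    }
    return state->text[state->at++];
  }
};

// Toy parser matching one character that is chosen when it is built.
struct Exactly {
  using resultType = char;
  char want;
  constexpr std::optional<char> Parse(ParseState *state) const {
    if (state->at < state->text.size() && state->text[state->at] == want) {
      ++state->at;
      return want;
    }
    return std::nullopt;
  }
};

// "p >>= f": match p, then match the parser returned by f(x).
template <typename P, typename F> struct Bind {
  using resultType = typename std::invoke_result_t<F,
      typename P::resultType>::resultType;
  P p;
  F f;
  constexpr std::optional<resultType> Parse(ParseState *state) const {
    auto x = p.Parse(state);
    if (!x) {
      return std::nullopt;
    }
    return f(*x).Parse(state);
  }
};

template <typename P, typename F, typename = typename P::resultType>
constexpr Bind<P, F> operator>>=(P p, F f) { return {p, f}; }

// The function handed to bind: builds a parser from an earlier result.
constexpr Exactly SameAgain(char c) { return Exactly{c}; }

// Hypothetical helper: runs a parser over a fresh state.
template <typename PARSER>
constexpr std::optional<typename PARSER::resultType> Run(
    PARSER parser, std::string_view text) {
  ParseState state{text};
  return parser.Parse(&state);
}

constexpr auto doubled = NextChar{} >>= SameAgain;
static_assert(Run(doubled, "aa") == 'a');
static_assert(!Run(doubled, "ab").has_value());
```

The second parser does not exist until the first one has run, which hints at why a grammar written this way is hard to analyze statically.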
|
||||
|
||||
### Token Parsers
|
||||
Last, we have these basic parsers on which the actual grammar of Fortran
|
||||
is built. All of the following parsers consume characters acquired from
|
||||
*cookedNextChar*.
|
||||
|
||||
* *spaces* always succeeds after consuming any spaces or tabs.
|
||||
* *digit* matches one cooked decimal digit (0-9).
|
||||
* *letter* matches one cooked letter (A-Z).
|
||||
* "CharMatch<'c'>{}" matches one specific cooked character.
|
||||
* "..."_tok matches the content of the string, skipping spaces before and
|
||||
after, and with multiple spaces accepted for any internal space.
|
||||
(Note that the _tok suffix is optional when the parser appears before
|
||||
the combinator ">>" or after "/".)
|
||||
* "parenthesized(p)" is shorthand for "(" >> p / ")".
|
||||
* "bracketed(p)" is shorthand for "[" >> p / "]".
|
||||
* "withinCharLiteral(p)" applies the parser *p*, tokenizing for
|
||||
CHARACTER/Hollerith literals.
|
||||
* "nonEmptyListOf(p)" matches a comma-separated list of one or more
|
||||
instances of *p*.
|
||||
* "optionalListOf(p)" is the same thing, but can be empty, and always succeeds.
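
The space handling described for "..."_tok can be sketched as follows. This toy rests on stated assumptions: an internal space in the token is taken to require one or more spaces in the input, matching is case-sensitive, and tabs are ignored. The real token parser may differ on every one of these points. `ParseState`, `Success`, `Tok`, `SkipSpaces`, and `EndOfMatch` are hypothetical names.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for ParseState: just the text and a cursor.
struct ParseState {
  std::string_view text;
  std::size_t at{0};
};

struct Success {};  // hypothetical valueless result

constexpr std::size_t SkipSpaces(std::string_view text, std::size_t at) {
  while (at < text.size() && text[at] == ' ') {
    ++at;
  }
  return at;
}

// Toy token matcher; the state is updated only on success.
struct Tok {
  using resultType = Success;
  std::string_view word;
  constexpr std::optional<Success> Parse(ParseState *state) const {
    std::size_t at{SkipSpaces(state->text, state->at)};  // leading spaces
    for (std::size_t j{0}; j < word.size(); ++j) {
      if (word[j] == ' ') {
        // Assumption: an internal space needs at least one input space.
        std::size_t next{SkipSpaces(state->text, at)};
        if (next == at) {
          return std::nullopt;
        }
        at = next;
      } else if (at < state->text.size() && state->text[at] == word[j]) {
        ++at;
      } else {
        return std::nullopt;
      }
    }
    state->at = SkipSpaces(state->text, at);  // trailing spaces
    return Success{};
  }
};

constexpr Tok operator""_tok(const char *str, std::size_t length) {
  return Tok{std::string_view{str, length}};
}

// Hypothetical helper: cursor position after a successful match.
template <typename PARSER>
constexpr std::optional<std::size_t> EndOfMatch(
    PARSER parser, std::string_view text) {
  ParseState state{text};
  if (!parser.Parse(&state)) {
    return std::nullopt;
  }
  return state.at;
}

// Leading, internal, and trailing spaces are all absorbed.
static_assert(EndOfMatch("END IF"_tok, "  END   IF  X") == 12u);
static_assert(!EndOfMatch("END IF"_tok, "END DO").has_value());
```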
|
||||
|
||||
### Debugging Parser
|
||||
Last, the parser "..."_debug emits the string to standard error and succeeds.
|
||||
It is useful for tracing while debugging a parser but should obviously not
|
||||
be committed for production code.
|
|
@ -1,127 +0,0 @@
|
|||
The Fortran language recognizer here is an LL recursive descent parser
|
||||
composed from a "parser combinator" library that defines a few fundamental
|
||||
parsers and a few ways to compose them into more powerful parsers.
|
||||
|
||||
For our purposes here, a *parser* is any object that can attempt to recognize
|
||||
an instance of some syntax from an input stream. It may succeed or fail.
|
||||
On success, it may return some semantic value to its caller.
|
||||
|
||||
In C++ terms, a parser is any instance of a class that
|
||||
(1) has a constexpr default constructor,
|
||||
(2) defines a resultType typedef, and
|
||||
(3) provides a member or static function
|
||||
|
||||
std::optional<resultType> Parse(ParseState *) const;
|
||||
static std::optional<resultType> Parse(ParseState *);
|
||||
|
||||
that accepts a pointer to a ParseState as its argument and returns
|
||||
a std::optional<resultType> as a result, with the presence or absence
|
||||
of a value in the std::optional<> signifying success or failure
|
||||
respectively.
|
||||
|
||||
The resultType of a parser is typically the class type of some particular
|
||||
node type in the parse tree.
|
||||
|
||||
ParseState is a class that encapsulates a position in the source stream,
|
||||
collects messages, and holds a few state flags that can affect tokenization
|
||||
(e.g., are we in a character literal?). Instances of ParseState are
|
||||
independent and complete -- they are cheap to duplicate when necessary to
|
||||
implement backtracking.
|
||||
|
||||
The constexpr default constructor of a parser is important. The functions
|
||||
(below) that operate on instances of parsers are themselves all constexpr.
|
||||
This use of compile-time expressions allows the entirety of a recursive
|
||||
descent parser for a language to be constructed at compilation time through
|
||||
the use of templates.
|
||||
|
||||
These objects and functions are (or return) the fundamental parsers:
|
||||
|
||||
ok always succeeds without advancing
|
||||
pure(x) always succeeds without advancing, returning some value x
|
||||
fail<T>(msg) always fails with the given message; optionally typed
|
||||
cut always fails, with no message
|
||||
guard(pred) succeeds if the predicate expression evaluates to true
|
||||
rawNextChar returns the next raw character; fails at EOF
|
||||
cookedNextChar returns the next character after preprocessing, skipping
|
||||
Fortran line continuations and comments; fails at EOF
|
||||
|
||||
These functions and operators generate new parsers from combinations of
|
||||
other parsers:
|
||||
|
||||
!p ok if p fails, cut if p succeeds
|
||||
p >> q match p, then q, returning q's value
|
||||
p / q match p, then q, returning p's value
|
||||
p || q match p if it succeeds, else match q; p and q must be same type
|
||||
lookAhead(p) succeeds iff p does, but doesn't modify state
|
||||
attempt(p) succeeds iff p does, safely preserving state on failure
|
||||
many(p) a greedy sequence of zero or more nonempty successes of p;
|
||||
returns std::list<> of values
|
||||
some(p) a greedy sequence of one or more successes of p
|
||||
skipMany(p) same as many(p), but discards result (performance optimizer)
|
||||
maybe(p) try to match p, returning optional<T>
|
||||
defaulted(p) matches p, or else returns a default-constructed instance
|
||||
of p's resultType
|
||||
nonemptySeparated(p, q) repeatedly match p q p q p q ... p, returning
|
||||
the values of the p's
|
||||
extension(p) parses p if strict standard compliance is disabled,
|
||||
with a warning if nonstandard usage warnings are enabled
|
||||
deprecated(p) parses p if strict standard compliance is disabled,
|
||||
with a warning if deprecated usage warnings are enabled
|
||||
inContext("...", p) run p within an error message context
|
||||
|
||||
Note that "a >> b >> c / d / e" matches a sequence of five parsers,
|
||||
but returns only the result that was obtained by matching c.
|
||||
|
||||
The following "applicative" combinators modify or combine the values returned
|
||||
by parsers:
|
||||
|
||||
construct<T>{}(p1, p2, ...)
|
||||
matches zero or more parsers in succession, collecting their
|
||||
results and then passing them with move semantics to a
|
||||
constructor for the type T if they all succeed
|
||||
applyFunction(f, p1, p2, ...)
|
||||
matches one or more parsers in succession, collecting their
|
||||
results and passing them as rvalue reference arguments to
|
||||
some function, returning its result
|
||||
applyLambda([](&&x){}, p1, p2, ...)
|
||||
is the same thing, but for lambdas and other function objects
|
||||
applyMem(mf, p1, p2, ...)
|
||||
is the same thing, but invokes a member function of the
|
||||
result of the first parser
|
||||
|
||||
These are non-advancing state inquiry and update parsers:
|
||||
|
||||
getColumn returns 1-based column position
|
||||
inCharLiteral succeeds under withinCharLiteral
|
||||
inFortran succeeds unless in a preprocessing directive
|
||||
inFixedForm succeeds in fixed-form source
|
||||
setInFixedForm sets the fixed-form flag, returns prior value
|
||||
columns returns the 1-based column number after which source is clipped
|
||||
setColumns(c) sets "columns", returns prior value
|
||||
|
||||
When parsing depends on the result values of earlier parses, the
|
||||
"monadic bind" combinator is available (but please try to avoid using it,
|
||||
as it makes automatic analysis of the grammar difficult):
|
||||
|
||||
p >>= f match p, yielding some value x on success, then match the
|
||||
parser returned from the function call f(x)
|
||||
|
||||
Last, we have these basic parsers on which the actual grammar of the Fortran
|
||||
is built. All of the following parsers consume characters acquired from
|
||||
"cookedNextChar".
|
||||
|
||||
spaces always succeeds after consuming any spaces or tabs
|
||||
digit matches one cooked decimal digit (0-9)
|
||||
letter matches one cooked letter (A-Z)
|
||||
CharMatch<'c'>{} matches one specific cooked character
|
||||
"..."_tok match contents, skipping spaces before and after, and
|
||||
with multiple spaces accepted for any internal space
|
||||
"..." >> p the tok suffix is optional on a string before >> and after /
|
||||
parenthesized(p) shorthand for "(" >> p / ")"
|
||||
bracketed(p) shorthand for "[" >> p / "]"
|
||||
|
||||
withinCharLiteral(p) apply p, tokenizing for CHARACTER/Hollerith literals
|
||||
nonEmptyListOf(p) matches a comma-separated list of one or more p's
|
||||
optionalListOf(p) ditto, but can be empty
|
||||
|
||||
"..."_debug emit the string and succeed, for parser debugging
|