[flang] Add parser-combinators.txt documentation file.
Original-commit: flang-compiler/f18@c4634a44b9

The Fortran language recognizer here is an LL recursive descent parser
composed from a "parser combinator" library that defines a few fundamental
parsers and a few ways to compose them into more powerful parsers.

For our purposes here, a *parser* is any object that can attempt to recognize
an instance of some syntax from an input stream. It may succeed or fail.
On success, it may return some semantic value to its caller.

In C++ terms, a parser is any instance of a class that
(1) has a constexpr default constructor,
(2) defines a resultType typedef, and
(3) provides a member or static function

  std::optional<resultType> Parse(ParseState *) const;
  static std::optional<resultType> Parse(ParseState *);

that accepts a pointer to a ParseState as its argument and returns
a std::optional<resultType> as a result, with the presence or absence
of a value in the std::optional<> signifying success or failure,
respectively.

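As a rough illustration of requirements (1) to (3), here is a toy parser class. The ParseState below is a hypothetical stand-in that holds only a string and an offset (the real class also collects messages and flags), and DigitValue is an invented name, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for ParseState: just a source string and a position.
struct ParseState {
  std::string_view input;
  std::size_t pos{0};
};

// Hypothetical parser satisfying (1) a constexpr default constructor,
// (2) a resultType typedef, and (3) a const Parse(ParseState *) member.
struct DigitValue {
  using resultType = int;
  constexpr DigitValue() {}
  std::optional<resultType> Parse(ParseState *state) const {
    if (state->pos < state->input.size()) {
      char ch{state->input[state->pos]};
      if (ch >= '0' && ch <= '9') {
        ++state->pos;     // success: consume the digit
        return ch - '0';  // ...and return its numeric value
      }
    }
    return std::nullopt;  // failure: an empty optional, nothing consumed
  }
};

// Because the constructor is constexpr, the parser can be a compile-time constant.
constexpr DigitValue digitValue;
```

On input "7x" this parser returns 7 and advances by one character; a second call then fails on 'x' without moving.
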
The resultType of a parser is typically the class type of some particular
node type in the parse tree.

ParseState is a class that encapsulates a position in the source stream,
collects messages, and holds a few state flags that can affect tokenization
(e.g., are we in a character literal?). Instances of ParseState are
independent and complete; they are cheap to duplicate when necessary to
implement backtracking.

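Backtracking by duplication can be sketched as follows: to try an alternative, copy the state, parse on the copy, and adopt the copy only on success. The fields of this ParseState and the helpers MatchString and MatchEither are assumptions made for the example. This toy also copies its message vector, which a real implementation might organize differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical ParseState: position, collected messages, one tokenization flag.
struct ParseState {
  std::string_view input;
  std::size_t pos{0};
  std::vector<std::string> messages;
  bool inCharLiteral{false};
};

// Hypothetical helper: match a literal string, recording a message on failure.
bool MatchString(ParseState &state, std::string_view s) {
  if (state.input.substr(state.pos, s.size()) == s) {
    state.pos += s.size();
    return true;
  }
  state.messages.push_back("expected '" + std::string{s} + "'");
  return false;
}

// Hypothetical helper: try 'a' on an independent copy of the state and
// commit the copy only if 'a' succeeds; otherwise try 'b' on the original.
bool MatchEither(ParseState &state, std::string_view a, std::string_view b) {
  ParseState backup{state};
  if (MatchString(backup, a)) {
    state = std::move(backup);
    return true;
  }
  return MatchString(state, b);  // the failed attempt left 'state' untouched
}
```

Because the failed attempt ran on a copy, its position change and its message are simply discarded with that copy.
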
The constexpr default constructor of a parser is important. The functions
(below) that operate on instances of parsers are themselves all constexpr.
This use of compile-time expressions allows the entirety of a recursive
descent parser for a language to be constructed at compilation time through
the use of templates.

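A sketch of why constexpr matters: if every parser class has constexpr constructors and the combining operators are constexpr functions, a composite parser can itself be a constexpr object. Char, Then, and this particular operator>> are simplified stand-ins invented for the example; in this sketch the composite is built from copies of its operands by a constexpr constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

struct ParseState {
  std::string_view input;
  std::size_t pos{0};
};

// Hypothetical parser for one specific character, fixed at compile time.
template <char C> struct Char {
  using resultType = char;
  constexpr Char() {}
  std::optional<char> Parse(ParseState *state) const {
    if (state->pos < state->input.size() && state->input[state->pos] == C) {
      return state->input[state->pos++];
    }
    return std::nullopt;
  }
};

// Hypothetical sequencing combinator: run A, then B, keep B's value.
template <typename A, typename B> struct Then {
  using resultType = typename B::resultType;
  constexpr Then(A a, B b) : a_{a}, b_{b} {}
  std::optional<resultType> Parse(ParseState *state) const {
    if (a_.Parse(state)) {
      return b_.Parse(state);
    }
    return std::nullopt;
  }
  A a_;
  B b_;
};

// constexpr combining function; only enabled for types with a resultType.
template <typename A, typename B, typename = typename A::resultType,
    typename = typename B::resultType>
constexpr Then<A, B> operator>>(A a, B b) {
  return Then<A, B>{a, b};
}

// The whole composite parser is built during compilation.
constexpr auto arrow{Char<'-'>{} >> Char<'>'>{}};
```
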
These objects and functions are (or return) the fundamental parsers:

  ok                    always succeeds without advancing
  pure(x)               always succeeds without advancing, returning some value x
  fail<T>(msg)          always fails with the given message; optionally typed
  cut                   always fails, with no message
  guard(pred)           succeeds if the predicate expression evaluates to true
  rawNextChar           returns the next raw character; fails at EOF
  cookedNextChar        returns the next character after preprocessing, skipping
                        Fortran line continuations and comments; fails at EOF

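Several of these primitives are simple enough to sketch directly. The types below (Success, OkParser, PureParser, FailParser, RawNextChar) and the message representation are assumptions for illustration only; cut, guard, and cookedNextChar are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical state: position plus collected messages.
struct ParseState {
  std::string_view input;
  std::size_t pos{0};
  std::vector<std::string> messages;
};

struct Success {};  // hypothetical value for parsers with nothing to return

// ok: always succeeds without advancing.
struct OkParser {
  using resultType = Success;
  constexpr OkParser() {}
  std::optional<Success> Parse(ParseState *) const { return Success{}; }
};
constexpr OkParser ok;

// pure(x): always succeeds without advancing, returning a copy of x.
template <typename T> struct PureParser {
  using resultType = T;
  constexpr explicit PureParser(T x) : value_{x} {}
  std::optional<T> Parse(ParseState *) const { return value_; }
  T value_;
};
template <typename T> constexpr PureParser<T> pure(T x) {
  return PureParser<T>{x};
}

// fail<T>(msg): always fails, leaving the message behind in the state.
template <typename T> struct FailParser {
  using resultType = T;
  constexpr explicit FailParser(const char *msg) : msg_{msg} {}
  std::optional<T> Parse(ParseState *state) const {
    state->messages.emplace_back(msg_);
    return std::nullopt;
  }
  const char *msg_;
};
template <typename T> constexpr FailParser<T> fail(const char *msg) {
  return FailParser<T>{msg};
}

// rawNextChar: the next character, failing at end of input.
struct RawNextChar {
  using resultType = char;
  constexpr RawNextChar() {}
  std::optional<char> Parse(ParseState *state) const {
    if (state->pos >= state->input.size()) {
      return std::nullopt;
    }
    return state->input[state->pos++];
  }
};
constexpr RawNextChar rawNextChar;
```
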
These functions and operators generate new parsers from combinations of
other parsers:

  !p                    ok if p fails, cut if p succeeds
  p >> q                match p, then q, returning q's value
  p / q                 match p, then q, returning p's value
  p || q                match p if it succeeds, else match q; p and q must
                        have the same result type
  lookAhead(p)          succeeds iff p does, but doesn't modify state
  attempt(p)            succeeds iff p does, safely preserving state on failure
  many(p)               a greedy sequence of zero or more nonempty successes
                        of p; returns std::list<> of values
  some(p)               a greedy sequence of one or more successes of p
  skipMany(p)           same as many(p), but discards the result
                        (performance optimization)
  maybe(p)              try to match p, returning optional<T>, where T is
                        p's resultType
  defaulted(p)          matches p, or else returns a default-constructed
                        instance of p's resultType
  nonemptySeparated(p, q)
                        repeatedly match p q p q p q ... p, returning
                        the values of the p's
  extension(p)          parses p if strict standard compliance is disabled,
                        with a warning if nonstandard usage warnings are enabled
  deprecated(p)         parses p if strict standard compliance is disabled,
                        with a warning if deprecated usage warnings are enabled
  inContext("...", p)   run p within an error message context

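Two of these combinators, sketched under simplifying assumptions: Alternative stands in for p || q and Many for many(p). The text above does not say whether the real || restores state after a failed first alternative; this sketch does restore it. Many stops at the first failure or at a success that consumes nothing, which is one reading of "nonempty successes". AnyOf is an invented helper.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <string_view>
#include <type_traits>
#include <utility>

struct ParseState {
  std::string_view input;
  std::size_t pos{0};
};

// Hypothetical helper: matches any one character from a fixed set.
struct AnyOf {
  using resultType = char;
  constexpr explicit AnyOf(const char *set) : set_{set} {}
  std::optional<char> Parse(ParseState *state) const {
    if (state->pos < state->input.size() &&
        set_.find(state->input[state->pos]) != std::string_view::npos) {
      return state->input[state->pos++];
    }
    return std::nullopt;
  }
  std::string_view set_;
};

// p || q: try p; if it fails, restore the state and try q.
template <typename P, typename Q> struct Alternative {
  static_assert(std::is_same_v<typename P::resultType, typename Q::resultType>,
      "p and q must have the same result type");
  using resultType = typename P::resultType;
  constexpr Alternative(P p, Q q) : p_{p}, q_{q} {}
  std::optional<resultType> Parse(ParseState *state) const {
    ParseState backup{*state};
    if (auto x = p_.Parse(state)) {
      return x;
    }
    *state = backup;  // undo anything p consumed before failing
    return q_.Parse(state);
  }
  P p_;
  Q q_;
};

template <typename P, typename Q, typename = typename P::resultType,
    typename = typename Q::resultType>
constexpr Alternative<P, Q> operator||(P p, Q q) {
  return Alternative<P, Q>{p, q};
}

// many(p): greedy zero-or-more; stops at a failure or an empty success.
template <typename P> struct Many {
  using resultType = std::list<typename P::resultType>;
  constexpr explicit Many(P p) : p_{p} {}
  std::optional<resultType> Parse(ParseState *state) const {
    resultType values;
    while (true) {
      ParseState backup{*state};
      auto x = p_.Parse(state);
      if (!x || state->pos == backup.pos) {
        *state = backup;  // discard the failed or non-advancing attempt
        break;
      }
      values.push_back(std::move(*x));
    }
    return values;  // zero matches still counts as success
  }
  P p_;
};

template <typename P> constexpr Many<P> many(P p) { return Many<P>{p}; }
```
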
Note that "a >> b >> c / d / e" matches a sequence of five parsers,
but returns only the result that was obtained by matching c. This is
because / binds more tightly than >> in C++, so the expression groups
as a >> b >> ((c / d) / e).

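The grouping can be checked with a small sketch in which >> keeps the right operand's value and / keeps the left operand's. The Ch, KeepRight, and KeepLeft types are invented for the example; only the operator precedence is standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

struct ParseState {
  std::string_view input;
  std::size_t pos{0};
};

// Hypothetical parser: one specific character, returned on success.
struct Ch {
  using resultType = char;
  constexpr explicit Ch(char c) : c_{c} {}
  std::optional<char> Parse(ParseState *state) const {
    if (state->pos < state->input.size() && state->input[state->pos] == c_) {
      return state->input[state->pos++];
    }
    return std::nullopt;
  }
  char c_;
};

// p >> q: match both, keep q's value.
template <typename P, typename Q> struct KeepRight {
  using resultType = typename Q::resultType;
  constexpr KeepRight(P p, Q q) : p_{p}, q_{q} {}
  std::optional<resultType> Parse(ParseState *s) const {
    if (!p_.Parse(s)) {
      return std::nullopt;
    }
    return q_.Parse(s);
  }
  P p_;
  Q q_;
};

// p / q: match both, keep p's value.
template <typename P, typename Q> struct KeepLeft {
  using resultType = typename P::resultType;
  constexpr KeepLeft(P p, Q q) : p_{p}, q_{q} {}
  std::optional<resultType> Parse(ParseState *s) const {
    auto x = p_.Parse(s);
    if (!x || !q_.Parse(s)) {
      return std::nullopt;
    }
    return x;
  }
  P p_;
  Q q_;
};

template <typename P, typename Q, typename = typename P::resultType,
    typename = typename Q::resultType>
constexpr KeepRight<P, Q> operator>>(P p, Q q) {
  return KeepRight<P, Q>{p, q};
}
template <typename P, typename Q, typename = typename P::resultType,
    typename = typename Q::resultType>
constexpr KeepLeft<P, Q> operator/(P p, Q q) {
  return KeepLeft<P, Q>{p, q};
}

// Groups as a >> b >> ((c / d) / e), so the value matched by c is returned.
constexpr auto fiveInARow{Ch{'a'} >> Ch{'b'} >> Ch{'c'} / Ch{'d'} / Ch{'e'}};
```
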
The following "applicative" combinators modify or combine the values returned
by parsers:

  construct<T>{}(p1, p2, ...)
                        matches zero or more parsers in succession, collecting
                        their results and then passing them with move semantics
                        to a constructor for the type T if they all succeed
  applyFunction(f, p1, p2, ...)
                        matches one or more parsers in succession, collecting
                        their results and passing them as rvalue reference
                        arguments to some function, returning its result
  applyLambda([](&&x){}, p1, p2, ...)
                        is the same thing, but for lambdas and other function
                        objects
  applyMem(mf, p1, p2, ...)
                        is the same thing, but invokes a member function of the
                        result of the first parser

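A simplified sketch of construct<T>{}(p1, p2, ...): the operand parsers run in order, their results are held in a tuple of optionals, and only if all succeed are the values moved into T's constructor. The names ConstructParser, Word, Digit, and TypeAndKind are hypothetical, and the input "REAL8" is toy syntax rather than Fortran.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <tuple>
#include <utility>

struct ParseState {
  std::string_view input;
  std::size_t pos{0};
};

// Hypothetical helper: one or more letters, returned as a string.
struct Word {
  using resultType = std::string;
  constexpr Word() {}
  std::optional<std::string> Parse(ParseState *s) const {
    std::size_t start{s->pos};
    while (s->pos < s->input.size() &&
        std::isalpha(static_cast<unsigned char>(s->input[s->pos]))) {
      ++s->pos;
    }
    if (s->pos == start) {
      return std::nullopt;
    }
    return std::string{s->input.substr(start, s->pos - start)};
  }
};

// Hypothetical helper: one decimal digit, returned as an int.
struct Digit {
  using resultType = int;
  constexpr Digit() {}
  std::optional<int> Parse(ParseState *s) const {
    if (s->pos < s->input.size() &&
        std::isdigit(static_cast<unsigned char>(s->input[s->pos]))) {
      return s->input[s->pos++] - '0';
    }
    return std::nullopt;
  }
};

template <typename T, typename... Ps> struct ConstructParser {
  using resultType = T;
  constexpr explicit ConstructParser(Ps... ps) : parsers_{ps...} {}
  std::optional<T> Parse(ParseState *s) const {
    return ParseAll(s, std::index_sequence_for<Ps...>{});
  }
  template <std::size_t... I>
  std::optional<T> ParseAll(ParseState *s, std::index_sequence<I...>) const {
    std::tuple<std::optional<typename Ps::resultType>...> results;
    bool ok{true};
    // Left to right; && short-circuits, so parsing stops at the first failure.
    ((ok = ok &&
          (std::get<I>(results) = std::get<I>(parsers_).Parse(s)).has_value()),
        ...);
    if (!ok) {
      return std::nullopt;
    }
    return T{std::move(*std::get<I>(results))...};  // move the values into T
  }
  std::tuple<Ps...> parsers_;
};

template <typename T> struct construct {
  template <typename... Ps>
  constexpr ConstructParser<T, Ps...> operator()(Ps... ps) const {
    return ConstructParser<T, Ps...>{ps...};
  }
};

// Hypothetical parse-tree node whose constructor takes rvalue references.
struct TypeAndKind {
  TypeAndKind(std::string &&t, int &&k) : type{std::move(t)}, kind{k} {}
  std::string type;
  int kind;
};
```
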
These are non-advancing state inquiry and update parsers:

  getColumn             returns 1-based column position
  inCharLiteral         succeeds when running under withinCharLiteral
  inFortran             succeeds unless in a preprocessing directive
  inFixedForm           succeeds in fixed-form source
  setInFixedForm        sets the fixed-form flag, returns prior value
  columns               returns the 1-based column number after which source
                        is clipped
  setColumns(c)         sets "columns", returns prior value

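These parsers read or update the state without consuming input. Below is a sketch of getColumn and setColumns(c) over a toy ParseState; the field names, the Column() helper, and the default clipping column of 72 are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical ParseState: a position plus a clipping column.
struct ParseState {
  std::string_view input;
  std::size_t pos{0};
  int columns{72};  // assumed default, as for fixed-form source

  // 1-based column of the current position within its line.
  int Column() const {
    std::size_t newline{input.substr(0, pos).rfind('\n')};
    return static_cast<int>(
        newline == std::string_view::npos ? pos + 1 : pos - newline);
  }
};

// getColumn: reports the column; consumes nothing.
struct GetColumn {
  using resultType = int;
  constexpr GetColumn() {}
  std::optional<int> Parse(ParseState *s) const { return s->Column(); }
};
constexpr GetColumn getColumn;

// setColumns(c): updates the clipping column and returns the prior value.
struct SetColumns {
  using resultType = int;
  constexpr explicit SetColumns(int c) : c_{c} {}
  std::optional<int> Parse(ParseState *s) const {
    int prior{s->columns};
    s->columns = c_;
    return prior;
  }
  int c_;
};
constexpr SetColumns setColumns(int c) { return SetColumns{c}; }
```
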
When parsing depends on the result values of earlier parses, the
"monadic bind" combinator is available (but please try to avoid using it,
as it makes automatic analysis of the grammar difficult):

  p >>= f               match p, yielding some value x on success, then match
                        the parser returned from the function call f(x)

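A sketch of p >>= f, using a Hollerith-style count as a motivating case: the number read first decides how many characters to take next. Bind, Digit, and HollerithBody are hypothetical names, and this is not a claim about how Hollerith constants are actually parsed here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

struct ParseState {
  std::string_view input;
  std::size_t pos{0};
};

// Hypothetical helper: one decimal digit, returned as an int.
struct Digit {
  using resultType = int;
  constexpr Digit() {}
  std::optional<int> Parse(ParseState *s) const {
    if (s->pos < s->input.size() && s->input[s->pos] >= '0' &&
        s->input[s->pos] <= '9') {
      return s->input[s->pos++] - '0';
    }
    return std::nullopt;
  }
};

// Hypothetical helper: an 'H' followed by exactly n characters.
struct HollerithBody {
  using resultType = std::string;
  constexpr explicit HollerithBody(int n) : n_{n} {}
  std::optional<std::string> Parse(ParseState *s) const {
    if (n_ <= 0 || s->pos >= s->input.size() || s->input[s->pos] != 'H') {
      return std::nullopt;
    }
    std::size_t start{s->pos + 1};
    std::size_t count{static_cast<std::size_t>(n_)};
    if (s->input.size() - start < count) {
      return std::nullopt;  // not enough characters left
    }
    s->pos = start + count;
    return std::string{s->input.substr(start, count)};
  }
  int n_;
};

// p >>= f: run p, pass its value to f, then run the parser f returns.
template <typename P, typename F> struct Bind {
  using NextParser = std::invoke_result_t<const F &, typename P::resultType>;
  using resultType = typename NextParser::resultType;
  constexpr Bind(P p, F f) : p_{p}, f_{f} {}
  std::optional<resultType> Parse(ParseState *s) const {
    if (auto x = p_.Parse(s)) {
      return f_(std::move(*x)).Parse(s);
    }
    return std::nullopt;
  }
  P p_;
  F f_;
};

template <typename P, typename F, typename = typename P::resultType>
constexpr Bind<P, F> operator>>=(P p, F f) {
  return Bind<P, F>{p, f};
}
```

The type of the second parser is only known by calling f, which is why this style is harder to analyze statically than the fixed compositions above.
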
Last, we have these basic parsers on which the actual grammar of the Fortran
language is built. All of the following parsers consume characters acquired
from "cookedNextChar".

  spaces                always succeeds after consuming any spaces or tabs
  digit                 matches one cooked decimal digit (0-9)
  letter                matches one cooked letter (A-Z)
  CharMatch<'c'>{}      matches one specific cooked character
  "..."_tok             match contents, skipping spaces before and after, and
                        with multiple spaces accepted for any internal space
  "..." >> p            the _tok suffix may be omitted on a string literal
                        that appears before >> or after /
  parenthesized(p)      shorthand for "(" >> p / ")"
  bracketed(p)          shorthand for "[" >> p / "]"

  withinCharLiteral(p)  apply p, tokenizing for CHARACTER/Hollerith literals
  nonEmptyListOf(p)     matches a comma-separated list of one or more p's
  optionalListOf(p)     ditto, but can be empty

  "..."_debug           emit the string and succeed, for parser debugging
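A sketch of the "..."_tok idea as a C++ user-defined literal, operating on plain rather than cooked input: leading and trailing spaces are skipped, and each internal space in the pattern accepts one or more spaces or tabs. TokenString, SkipSpaces, and Success are invented for the example, and reading "multiple spaces accepted" as "at least one" is this sketch's own interpretation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

struct ParseState {
  std::string_view input;
  std::size_t pos{0};
};

struct Success {};  // hypothetical value for parsers with nothing to return

// Like "spaces": consume any spaces or tabs.
inline void SkipSpaces(ParseState *s) {
  while (s->pos < s->input.size() &&
      (s->input[s->pos] == ' ' || s->input[s->pos] == '\t')) {
    ++s->pos;
  }
}

// Hypothetical token-string parser produced by the _tok literal below.
struct TokenString {
  using resultType = Success;
  constexpr TokenString(const char *str, std::size_t n) : str_{str, n} {}
  std::optional<Success> Parse(ParseState *s) const {
    SkipSpaces(s);  // spaces before the token
    for (char ch : str_) {
      if (ch == ' ') {
        // An internal space in the pattern accepts one or more spaces.
        std::size_t before{s->pos};
        SkipSpaces(s);
        if (s->pos == before) {
          return std::nullopt;
        }
      } else if (s->pos < s->input.size() && s->input[s->pos] == ch) {
        ++s->pos;
      } else {
        return std::nullopt;
      }
    }
    SkipSpaces(s);  // spaces after the token
    return Success{};
  }
  std::string_view str_;
};

constexpr TokenString operator""_tok(const char *str, std::size_t n) {
  return TokenString{str, n};
}
```
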