[flang] Add parser-combinators.txt documentation file.

Original-commit: flang-compiler/f18@c4634a44b9
2018-01-29 15:39:42 -08:00 · 2018-01-29 15:39:42 -08:00 · e4e52073c2
parent 6ae0a5aca6
commit e4e52073c2
1 changed files with 127 additions and 0 deletions
--- a/flang/parser-combinators.txt
+++ b/flang/parser-combinators.txt
@@ -0,0 +1,127 @@
+The Fortran language recognizer here is an LL recursive descent parser
+composed from a "parser combinator" library that defines a few fundamental
+parsers and a few ways to compose them into more powerful parsers.
+
+For our purposes here, a *parser* is any object that can attempt to recognize
+an instance of some syntax from an input stream.  It may succeed or fail.
+On success, it may return some semantic value to its caller.
+
+In C++ terms, a parser is any instance of a class that
+  (1) has a constexpr default constructor,
+  (2) defines a resultType typedef, and
+  (3) provides either a const member function or a static function
+
+        std::optional<resultType> Parse(ParseState *) const;
+        static std::optional<resultType> Parse(ParseState *);
+
+      that accepts a pointer to a ParseState as its argument and returns
+      a std::optional<resultType> as a result, with the presence or absence
+      of a value in the std::optional<> signifying success or failure
+      respectively.
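
The three requirements above can be sketched as follows. This is a minimal illustration rather than the code from this commit: the `ParseState` here is a hypothetical stand-in holding only a buffer and an offset, and `DigitValue` is an invented example parser.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for the real ParseState: only a cursor over a buffer.
struct ParseState {
  std::string_view text;
  std::size_t at{0};
};

// Invented example parser: recognizes one decimal digit, returns its value.
struct DigitValue {
  using resultType = int;            // (2) the semantic value type
  constexpr DigitValue() = default;  // (1) constexpr default constructor
  // (3) a static Parse(); a const member function would satisfy the rule too.
  static std::optional<resultType> Parse(ParseState *state) {
    if (state->at < state->text.size()) {
      const char ch{state->text[state->at]};
      if (ch >= '0' && ch <= '9') {
        ++state->at;      // consume input only on success
        return ch - '0';  // presence of a value signals success
      }
    }
    return std::nullopt;  // absence of a value signals failure
  }
};

// Usage: the first call consumes '7'; the second fails on 'x' and leaves
// the cursor where it was.
inline void DigitValueExample() {
  ParseState state{"7x"};
  assert(DigitValue::Parse(&state) == 7);
  assert(!DigitValue::Parse(&state).has_value());
  assert(state.at == 1);
}
```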
+
+The resultType of a parser is typically the class type of some particular
+node type in the parse tree.
+
+ParseState is a class that encapsulates a position in the source stream,
+collects messages, and holds a few state flags that can affect tokenization
+(e.g., are we in a character literal?).  Instances of ParseState are
+independent and complete -- they are cheap to duplicate when necessary to
+implement backtracking.
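
Why cheap copies matter for backtracking can be seen in a small sketch. Everything here is an assumption made for illustration: the reduced `ParseState`, the `Success` placeholder, the `LetterPair` parser, and the `ParseOrRestore` helper are hypothetical names, not part of the library described above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical, much-reduced ParseState: everything needed to resume
// parsing lives in the object itself, so a copy is a complete snapshot.
struct ParseState {
  std::string_view text;
  std::size_t at{0};
};

struct Success {};  // placeholder result for parsers with no semantic value

// Matches "ab".  On input "ac" it fails after having consumed 'a'.
struct LetterPair {
  using resultType = Success;
  constexpr LetterPair() = default;
  static std::optional<Success> Parse(ParseState *state) {
    for (const char *want{"ab"}; *want != '\0'; ++want) {
      if (state->at >= state->text.size() ||
          state->text[state->at] != *want) {
        return std::nullopt;
      }
      ++state->at;
    }
    return Success{};
  }
};

// Runs a parser against the live state and restores the snapshot on failure.
template<typename PARSER>
std::optional<typename PARSER::resultType> ParseOrRestore(
    const PARSER &parser, ParseState *state) {
  const ParseState snapshot = *state;  // cheap: a view and an offset
  auto result = parser.Parse(state);
  if (!result.has_value()) {
    *state = snapshot;  // backtrack
  }
  return result;
}

inline void BacktrackExample() {
  ParseState direct{"ac"};
  assert(!LetterPair::Parse(&direct).has_value());
  assert(direct.at == 1);  // partial progress leaked out
  ParseState guarded{"ac"};
  assert(!ParseOrRestore(LetterPair{}, &guarded).has_value());
  assert(guarded.at == 0);  // snapshot restored
}
```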
+
+The constexpr default constructor of a parser is important.  The functions
+(below) that operate on instances of parsers are themselves all constexpr.
+This use of compile-time expressions allows the entirety of a recursive
+descent parser for a language to be constructed at compilation time through
+the use of templates.
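
The following sketch shows the idea of compile-time composition, using alternation as the example. The `Ch` and `Alternative` classes and the reduced `ParseState` are hypothetical illustrations, not the library's definitions; the point is only that the operator is constexpr, so the composite parser `sign` is a compile-time constant.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <type_traits>

struct ParseState {  // hypothetical stand-in: a cursor over a buffer
  std::string_view text;
  std::size_t at{0};
};

// Leaf parser: matches one specific character.
template<char C> struct Ch {
  using resultType = char;
  constexpr Ch() = default;
  std::optional<char> Parse(ParseState *state) const {
    if (state->at < state->text.size() && state->text[state->at] == C) {
      ++state->at;
      return C;
    }
    return std::nullopt;
  }
};

// Alternation: try the first parser, else rewind and try the second.
template<typename PA, typename PB> class Alternative {
  static_assert(
      std::is_same_v<typename PA::resultType, typename PB::resultType>);

public:
  using resultType = typename PA::resultType;
  constexpr Alternative() = default;
  constexpr Alternative(const PA &pa, const PB &pb) : pa_(pa), pb_(pb) {}
  std::optional<resultType> Parse(ParseState *state) const {
    const ParseState snapshot = *state;
    if (auto result = pa_.Parse(state)) {
      return result;
    }
    *state = snapshot;
    return pb_.Parse(state);
  }

private:
  PA pa_;
  PB pb_;
};

// Composing two parsers only builds a new object; nothing is parsed here.
template<typename PA, typename PB, typename = typename PA::resultType,
    typename = typename PB::resultType>
constexpr Alternative<PA, PB> operator||(const PA &pa, const PB &pb) {
  return Alternative<PA, PB>(pa, pb);
}

// The composite parser is a compile-time constant.
constexpr auto sign = Ch<'+'>{} || Ch<'-'>{};
static_assert(
    std::is_same_v<decltype(sign), const Alternative<Ch<'+'>, Ch<'-'>>>);

inline void SignExample() {
  ParseState state{"-1"};
  assert(sign.Parse(&state) == '-');
  assert(state.at == 1);
}
```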
+
+These objects and functions are (or return) the fundamental parsers:
+
+  ok           always succeeds without advancing
+  pure(x)      always succeeds without advancing, returning some value x
+  fail<T>(msg)  always fails with the given message; optionally typed
+  cut          always fails, with no message
+  guard(pred)  succeeds if the predicate expression evaluates to true
+  rawNextChar  returns the next raw character; fails at EOF
+  cookedNextChar returns the next character after preprocessing, skipping
+                 Fortran line continuations and comments; fails at EOF
+
+These functions and operators generate new parsers from combinations of
+other parsers:
+
+  !p           ok if p fails, cut if p succeeds
+  p >> q       match p, then q, returning q's value
+  p / q        match p, then q, returning p's value
+  p || q       match p if it succeeds, else match q; p and q must have
+                 the same resultType
+  lookAhead(p) succeeds iff p does, but doesn't modify state
+  attempt(p)   succeeds iff p does, safely preserving state on failure
+  many(p)      a greedy sequence of zero or more nonempty successes of p;
+                 returns std::list<> of values
+  some(p)      a greedy sequence of one or more successes of p
+  skipMany(p)  same as many(p), but discards the results (a performance
+                 optimization)
+  maybe(p)     try to match p, returning optional<T>
+  defaulted(p) matches p, or else returns a default-constructed instance
+                 of p's resultType
+  nonemptySeparated(p, q) repeatedly match p q p q p q ... p, returning
+                            the values of the p's
+  extension(p) parses p if strict standard compliance is disabled,
+                 with a warning if nonstandard usage warnings are enabled
+  deprecated(p) parses p if strict standard compliance is disabled,
+                 with a warning if deprecated usage warnings are enabled
+  inContext("...", p)  run p within an error message context
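
The "nonempty successes" wording in the description of many(p) guards against a parser that succeeds without consuming input, which would otherwise loop forever. One way this could look is sketched below. The classes are hypothetical, and it is an assumption of this sketch that the value of an empty success is dropped rather than collected.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <string_view>
#include <utility>

struct ParseState {  // hypothetical stand-in: a cursor over a buffer
  std::string_view text;
  std::size_t at{0};
};

template<char C> struct Ch {  // matches one specific character
  using resultType = char;
  constexpr Ch() = default;
  std::optional<char> Parse(ParseState *state) const {
    if (state->at < state->text.size() && state->text[state->at] == C) {
      ++state->at;
      return C;
    }
    return std::nullopt;
  }
};

// Always succeeds without advancing, in the spirit of "ok".
struct Nothing {
  using resultType = char;
  constexpr Nothing() = default;
  std::optional<char> Parse(ParseState *) const { return ' '; }
};

template<typename PARSER> class Many {
public:
  using resultType = std::list<typename PARSER::resultType>;
  constexpr Many() = default;
  constexpr explicit Many(const PARSER &parser) : parser_(parser) {}
  std::optional<resultType> Parse(ParseState *state) const {
    resultType values;
    while (true) {
      const ParseState snapshot = *state;
      auto value = parser_.Parse(state);
      if (!value.has_value()) {
        *state = snapshot;  // undo any partial progress of the failed try
        break;
      }
      if (state->at == snapshot.at) {
        break;  // an empty success would repeat forever, so stop here
      }
      values.push_back(std::move(*value));
    }
    return values;  // zero matches is still a success
  }

private:
  PARSER parser_;
};

template<typename PARSER> constexpr Many<PARSER> many(const PARSER &parser) {
  return Many<PARSER>(parser);
}

inline void ManyExample() {
  ParseState state{"aaab"};
  auto run = many(Ch<'a'>{}).Parse(&state);
  assert(run.has_value() && run->size() == 3);
  assert(state.at == 3);
  auto none = many(Nothing{}).Parse(&state);  // terminates
  assert(none.has_value() && none->empty());
}
```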
+
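+Note that "a >> b >> c / d / e" matches a sequence of five parsers,
+but returns only the result that was obtained by matching c.  This follows
+from C++ operator precedence: "/" binds more tightly than ">>", so the
+expression groups as "(a >> b) >> ((c / d) / e)".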
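
The grouping of that expression can be checked with a small sketch. The `Then` and `First` classes below are hypothetical stand-ins for whatever the library uses to implement `>>` and `/`; the static_assert records how C++ groups the operators, and the usage function confirms that the value of the middle parser is the one returned.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <type_traits>

struct ParseState {  // hypothetical stand-in: a cursor over a buffer
  std::string_view text;
  std::size_t at{0};
};

template<char C> struct Ch {  // matches one specific character
  using resultType = char;
  constexpr Ch() = default;
  std::optional<char> Parse(ParseState *state) const {
    if (state->at < state->text.size() && state->text[state->at] == C) {
      ++state->at;
      return C;
    }
    return std::nullopt;
  }
};

// p >> q: match both, keep the second value.
template<typename PA, typename PB> class Then {
public:
  using resultType = typename PB::resultType;
  constexpr Then() = default;
  constexpr Then(const PA &pa, const PB &pb) : pa_(pa), pb_(pb) {}
  std::optional<resultType> Parse(ParseState *state) const {
    if (pa_.Parse(state).has_value()) {
      return pb_.Parse(state);
    }
    return std::nullopt;
  }

private:
  PA pa_;
  PB pb_;
};

// p / q: match both, keep the first value.
template<typename PA, typename PB> class First {
public:
  using resultType = typename PA::resultType;
  constexpr First() = default;
  constexpr First(const PA &pa, const PB &pb) : pa_(pa), pb_(pb) {}
  std::optional<resultType> Parse(ParseState *state) const {
    auto value = pa_.Parse(state);
    if (value.has_value() && pb_.Parse(state).has_value()) {
      return value;
    }
    return std::nullopt;
  }

private:
  PA pa_;
  PB pb_;
};

template<typename PA, typename PB, typename = typename PA::resultType,
    typename = typename PB::resultType>
constexpr Then<PA, PB> operator>>(const PA &pa, const PB &pb) {
  return Then<PA, PB>(pa, pb);
}

template<typename PA, typename PB, typename = typename PA::resultType,
    typename = typename PB::resultType>
constexpr First<PA, PB> operator/(const PA &pa, const PB &pb) {
  return First<PA, PB>(pa, pb);
}

constexpr auto abcde = Ch<'a'>{} >> Ch<'b'>{} >> Ch<'c'>{} / Ch<'d'>{} / Ch<'e'>{};

// "/" binds tighter than ">>": (a >> b) >> ((c / d) / e)
static_assert(std::is_same_v<decltype(abcde),
    const Then<Then<Ch<'a'>, Ch<'b'>>,
        First<First<Ch<'c'>, Ch<'d'>>, Ch<'e'>>>>);

inline void PrecedenceExample() {
  ParseState state{"abcde"};
  assert(abcde.Parse(&state) == 'c');  // only c's value survives
  assert(state.at == 5);               // but all five were matched
}
```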
+
+The following "applicative" combinators modify or combine the values returned
+by parsers:
+
+  construct<T>{}(p1, p2, ...)
+               matches zero or more parsers in succession, collecting their
+               results and then passing them with move semantics to a
+               constructor for the type T if they all succeed
+  applyFunction(f, p1, p2, ...)
+               matches one or more parsers in succession, collecting their
+               results and passing them as rvalue reference arguments to
+               some function, returning its result
+  applyLambda([](&&x){}, p1, p2, ...)
+               is the same thing, but for lambdas and other function objects
+  applyMem(mf, p1, p2, ...)
+               is the same thing, but invokes a member function of the
+               result of the first parser
+
+These are non-advancing state inquiry and update parsers:
+
+  getColumn    returns 1-based column position
+  inCharLiteral succeeds under withinCharLiteral
+  inFortran    succeeds unless in a preprocessing directive
+  inFixedForm  succeeds in fixed-form source
+  setInFixedForm  sets the fixed-form flag, returns prior value
+  columns      returns the 1-based column number after which source is clipped
+  setColumns(c) sets "columns", returns prior value
+
+When parsing depends on the result values of earlier parses, the
+"monadic bind" combinator is available (but please try to avoid using it,
+as it makes automatic analysis of the grammar difficult):
+
+  p >>= f      match p, yielding some value x on success, then match the
+                 parser returned from the function call f(x)
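
A Hollerith-style constant such as 3Habc is a case where the second parser depends on the value of the first. The sketch below illustrates the shape of a bind combinator under stated assumptions: `Count`, `HollerithBody`, `BodyOfLength`, `Bind`, and the reduced `ParseState` are all hypothetical names, the count is limited to one digit for brevity, and this is not how the Fortran grammar in this commit handles Hollerith.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

struct ParseState {  // hypothetical stand-in: a cursor over a buffer
  std::string_view text;
  std::size_t at{0};
};

// Parses one decimal digit, yielding its value as a character count.
struct Count {
  using resultType = int;
  constexpr Count() = default;
  std::optional<int> Parse(ParseState *state) const {
    if (state->at < state->text.size()) {
      const char ch{state->text[state->at]};
      if (ch >= '0' && ch <= '9') {
        ++state->at;
        return ch - '0';
      }
    }
    return std::nullopt;
  }
};

// Parses 'H' followed by exactly `length` characters, whatever they are.
class HollerithBody {
public:
  using resultType = std::string;
  constexpr HollerithBody() = default;
  constexpr explicit HollerithBody(int length) : length_{length} {}
  std::optional<std::string> Parse(ParseState *state) const {
    const std::size_t need{static_cast<std::size_t>(length_)};
    if (state->at >= state->text.size() || state->text[state->at] != 'H' ||
        state->text.size() - state->at - 1 < need) {
      return std::nullopt;
    }
    std::string body(state->text.substr(state->at + 1, need));
    state->at += 1 + need;
    return body;
  }

private:
  int length_{0};
};

// p >>= f: run p, then run the parser that f builds from p's value.
template<typename PARSER, typename FUNC> class Bind {
public:
  using resultType = typename std::invoke_result_t<FUNC,
      typename PARSER::resultType>::resultType;
  constexpr Bind(const PARSER &parser, FUNC func)
    : parser_(parser), func_(func) {}
  std::optional<resultType> Parse(ParseState *state) const {
    if (auto x = parser_.Parse(state)) {
      return func_(std::move(*x)).Parse(state);
    }
    return std::nullopt;
  }

private:
  PARSER parser_;
  FUNC func_;
};

template<typename PARSER, typename FUNC,
    typename = typename PARSER::resultType>
constexpr Bind<PARSER, FUNC> operator>>=(const PARSER &parser, FUNC func) {
  return Bind<PARSER, FUNC>(parser, func);
}

constexpr HollerithBody BodyOfLength(int length) {
  return HollerithBody(length);
}

constexpr auto hollerith = Count{} >>= BodyOfLength;

inline void BindExample() {
  ParseState state{"3Habcd"};
  auto body = hollerith.Parse(&state);
  assert(body.has_value() && *body == "abc");
  assert(state.at == 5);  // the trailing 'd' is left unconsumed
}
```

The second parser is not known until the first has run, which is why a grammar written this way is harder to analyze statically than one built only from the fixed combinators.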
+
+Last, we have these basic parsers on which the actual grammar of Fortran
+is built.  All of the following parsers consume characters acquired from
+"cookedNextChar".
+
+  spaces       always succeeds after consuming any spaces or tabs
+  digit        matches one cooked decimal digit (0-9)
+  letter       matches one cooked letter (A-Z)
+  CharMatch<'c'>{} matches one specific cooked character
+  "..."_tok    matches contents, skipping spaces before and after, and
+                 with multiple spaces accepted for any internal space
+  "..." >> p   the _tok suffix is optional on a string before >> and after /
+  parenthesized(p)  shorthand for "(" >> p / ")"
+  bracketed(p) shorthand for "[" >> p / "]"
+
+  withinCharLiteral(p) apply p, tokenizing for CHARACTER/Hollerith literals
+  nonEmptyListOf(p) matches a comma-separated list of one or more p's
+  optionalListOf(p) like nonEmptyListOf(p), but the list may be empty
+
+  "..."_debug  emits the string and succeeds, for parser debugging