[flang] Convert parser combinator documentation file to Markdown.

Original-commit: flang-compiler/f18@263865c97a
2018-02-05 16:53:38 -08:00 · 2018-02-05 16:53:38 -08:00 · 1e69ed0c1b
parent 94c26b688e
commit 1e69ed0c1b
3 changed files with 148 additions and 128 deletions
--- a/flang/C++style.md
+++ b/flang/C++style.md
@ -20,6 +20,8 @@ in foo.cc.)
 1. In the source file "foo.cc", put the #include of "foo.h" first.
 Then #include other project headers in alphabetic order; then C++ standard
 headers, also alphabetically; then C and system headers.
 1. Don't include the standard iostream header.  If you need it for debugging,
 remove the inclusion before committing.
 ### Naming
 1. C++ names that correspond to STL names should look like those STL names
 (e.g., *clear()* and *size()* member functions in a class that implements
@ -40,7 +42,7 @@ especially when you can declare them directly in a for()/while()/if()
 condition.  Otherwise, prefer complete English words to abbreviations
 when creating names.
 ### Commentary
-1. Use // for all comments except for short notes within statements.
+1. Use // for all comments except for short notes within expressions.
 1. When // follows code on a line, precede it with two spaces.
 1. Comments should matter.  Assume that the reader knows current C++ at least as
 well as you do and avoid distracting her by calling out usage of new
--- a/flang/ParserCombinators.md
+++ b/flang/ParserCombinators.md
@ -0,0 +1,145 @@
 ## Concept
 The Fortran language recognizer here can be classified as an LL recursive
 descent parser.  It is composed from a *parser combinator* library that
 defines a few fundamental parsers and a few ways to compose them into more
 powerful parsers.
 For our purposes here, a *parser* is any object that can attempt to recognize
 an instance of some syntax from an input stream.  It may succeed or fail.
 On success, it may return some semantic value to its caller.
 In C++ terms, a parser is any instance of a class that
 1. has a *constexpr* default constructor,
 1. defines a resultType type, and
 1. provides a member or static function that accepts a pointer to a
 ParseState as its argument and returns a std::optional<resultType> as a
 result, with the presence or absence of a value in the std::optional<>
 signifying success or failure, respectively.
 > std::optional<resultType> Parse(ParseState *) const;
 The resultType of a parser is typically the class type of some particular
 node type in the parse tree.
 *ParseState* is a class that encapsulates a position in the source stream,
 collects messages, and holds a few state flags that determive tokenization
 (e.g., are we in a character literal?).  Instances of *ParseState* are
 independent and complete -- they are cheap to duplicate whenever necessary to
 implement backtracking.
 The constexpr default constructor of a parser is important.  The functions
 (below) that operate on instances of parsers are themselves all constexpr.
 This use of compile-time expressions allows the entirety of a recursive
 descent parser for a language to be constructed at compilation time through
 the use of templates.
 ### Fundamental Predefined Parsers
 These objects and functions are (or return) the fundamental parsers:
 * *ok* is a trivial parser that always succeeds without advancing.
 * "pure(x)" returns a trivial parser that always succeeds without advancing,
  returning some value *x*.
 * "fail<T>(msg)" denotes a trivial parser that always fails, emitting the
  given message.  The template parameter is the type of the value that
  the parser never returns.
 * *cut* is a trivial parser that always fails silently.
 * "guard(pred)" returns a parser that succeeds if and only if the predicate
  expression evaluates to true.
 * *rawNextChar* returns the next raw character, and fails at EOF.
 * *cookedNextChar* returns the next character after preprocessing, skipping
  Fortran line continuations and comments; it also fails at EOF
 ### Combinators
 These functions and operators combine parsers to generate new parsers.
 * "!p" succeeds if p fails, and fails if p succeeds.
 * "p >> q" fails if p does, otherwise running q and returning its value when
  it succeeds.
 * "p / q" fails if p does, otherwise running q and returning *p's* value
  if q succeeds.
 * "p || q" succeeds if p does, otherwise running q.  The two parsers must
  have the same type, and the value returned by the first succeeding parser
  is the value of the combination.
 * "lookAhead(p)" succeeds if p does, but doesn't modify any state.
 * "attempt(p)" succeeds if p does, safely preserving state on failure.
 * "many(p)" recognizes a greedy sequence of zero or more nonempty successes
  of *p*, and returns std::list<> of their values.  It always succeeds.
 * "some(p)" recognized a greedy sequence of one or more successes of *p*.
  It fails if p immediately fails.
 * "skipMany(p)" is the same as "many(p)", but it discards the results.
 * "maybe(p)" tries to match *p*, returning an "std::optional<T>" value.
  It always succeeds.
 * "defaulted(p)" matches *p*, and when *p* fails it returns a
  default-constructed instance of *p*'s resultType.  It always succeeds.
 * "nonemptySeparated(p, q)" repeatedly matches "p q p q p q ... p",
  returning a std::list<> of only the values of the p's.  It fails if
  *p* immediately fails.
 * "extension(p)" parses *p* if strict standard compliance is disabled,
   or with a warning if nonstandard usage warnings are enabled.
 * "deprecated(p)" parses *p* if strict standard compliance is disabled,
  with a warning if deprecated usage warnings are enabled.
 * "inContext(..., p)" runs *p* within an error message context.
 Note that "a >> b >> c / d / e" matches a sequence of five parsers,
 but returns only the result that was obtained by matching c.
 ### Applicatives
 The following *applicative* combinators combine parsers and modify or
 collect the values that they return.
 * "construct<T>{}(p1, p2, ...)" matches zero or more parsers in succession,
  collecting their results and then passing them with move semantics to a
  constructor for the type *T* if they all succeed.
 * "applyFunction(f, p1, p2, ...)" matches one or more parsers in succession,
  collecting their results and passing them as rvalue reference arguments to
  some function, returning its result.
 * "applyLambda([](&&x){}, p1, p2, ...)" is the same thing, but for lambdas
  and other function objects.
 * "applyMem(mf, p1, p2, ...)" is the same thing, but invokes a member
  function of the result of the first parser for updates in place.
 ### Non-Advancing State Inquiries and Updates
 These are non-advancing state inquiry and update parsers:
 * *getColumn* returns the 1-based column position.
 * *inCharLiteral* succeeds under withinCharLiteral.
 * *inFortran* succeeds unless in a preprocessing directive.
 * *inFixedForm* succeeds in fixed-form source.
 * *setInFixedForm* sets the fixed-form flag, returning its prior value.
 * *columns* returns the 1-based column number after which source is clipped.
 * "setColumns(c)" sets the column limit and returns its prior value.
 ### Monadic Combination
 When parsing depends on the result values of earlier parses, the
 "monadic bind" combinator is available.
 Please try to avoid using it, as it makes automatic analysis of the
 grammar difficult.
 It has the syntax "p >>= f", and it constructs a parser that matches p,
 yielding some value x on success, then matches the parser returned from
 the function call "f(x)".
 ### Token Parsers
 Last, we have these basic parsers on which the actual grammar of the Fortran
 is built.  All of the following parsers consume characters acquired from
 *cookedNextChar*.
 * *spaces* always succeeds after consuming any spaces or tabs
 * *digit* matches one cooked decimal digit (0-9)
 * *letter* matches one cooked letter (A-Z)
 * "CharMatch<'c'>{}" matches one specific cooked character.
 * "..."_tok match the content of the string, skipping spaces before and
  after, and with multiple spaces accepted for any internal space.
  (Note that the _tok suffix is optional when the parser appears before
  the combinator ">>" or after "/".)
 * "parenthesized(p)" is shorthand for "(" >> p / ")".
 * "bracketed(p)" is shorthand for "[" >> p / "]".
 * "withinCharLiteral(p)" applies the parser *p*, tokenizing for
  CHARACTER/Hollerith literals.
 * "nonEmptyListOf(p)" matches a comma-separated list of one or more
  instances of *p*.
 * "optionalListOf(p)" is the same thing, but can be empty, and always succeeds.
 ### Debugging Parser
 Last, the parser "..."_debug emit the string to the standard error and succeeds.
 It is useful for tracing while debugging a parser but should obviously not
 be committed for production code.
--- a/flang/parser-combinators.txt
+++ b/flang/parser-combinators.txt
@ -1,127 +0,0 @@
 The Fortran language recognizer here is an LL recursive descent parser
 composed from a "parser combinator" library that defines a few fundamental
 parsers and a few ways to compose them into more powerful parsers.
 For our purposes here, a *parser* is any object that can attempt to recognize
 an instance of some syntax from an input stream.  It may succeed or fail.
 On success, it may return some semantic value to its caller.
 In C++ terms, a parser is any instance of a class that
  (1) has a constexpr default constructor,
  (2) defines a resultType typedef, and
  (3) provides a member or static function
        std::optional<resultType> Parse(ParseState *) const;
        static std::optional<resultType> Parse(ParseState *);
      that accepts a pointer to a ParseState as its argument and returns
      a std::optional<resultType> as a result, with the presence or absence
      of a value in the std::optional<> signifying success or failure
      respectively.
 The resultType of a parser is typically the class type of some particular
 node type in the parse tree.
 ParseState is a class that encapsulates a position in the source stream,
 collects messages, and holds a few state flags that can affect tokenization
 (e.g., are we in a character literal?).  Instances of ParseState are
 independent and complete -- they are cheap to duplicate when necessary to
 implement backtracking.
 The constexpr default constructor of a parser is important.  The functions
 (below) that operate on instances of parsers are themselves all constexpr.
 This use of compile-time expressions allows the entirety of a recursive
 descent parser for a language to be constructed at compilation time through
 the use of templates.
 These objects and functions are (or return) the fundamental parsers:
  ok           always succeeds without advancing
  pure(x)      always succeeds without advancing, returning some value x
  fail<T>(msg)  always fails with the given message; optionally typed
  cut          always fails, with no message
  guard(pred)  succeeds if the predicate expression evaluates to true
  rawNextChar  returns the next raw character; fails at EOF
  cookedNextChar returns the next character after preprocessing, skipping
                 Fortran line continuations and comments; fails at EOF
 These functions and operators generate new parsers from combinations of
 other parsers:
  !p           ok if p fails, cut if p succeeds
  p >> q       match p, then q, returning q's value
  p / q        match p, then q, returning p's value
  p || q       match p if it succeeds, else match q; p and q must be same type
  lookAhead(p) succeeds iff p does, but doesn't modify state
  attempt(p)   succeeds iff p does, safely preserving state on failure
  many(p)      a greedy sequence of zero or more nonempty successes of p;
                 returns std::list<> of values
  some(p)      a greedy sequence of one or more successes of p
  skipMany(p)  same as many(p), but discards result (performance optimizer)
  maybe(p)     try to match p, returning optional<T>
  defaulted(p) matches p, or else returns a default-constructed instance
                     of p's resultType
  nonemptySeparated(p, q) repeatedly match p q p q p q ... p, returning
                            the values of the p's
  extension(p) parses p if strict standard compliance is disabled,
                 with a warning if nonstandard usage warnings are enabled
  deprecated(p) parses p if strict standard compliance is disabled,
                 with a warning if deprecated usage warnings are enabled
  inContext("...", p)  run p within an error message context
 Note that "a >> b >> c / d / e" matches a sequence of five parsers,
 but returns only the result that was obtained by matching c.
 The following "applicative" combinators modify or combine the values returned
 by parsers:
  construct<T>{}(p1, p2, ...)
               matches zero or more parsers in succession, collecting their
               results and then passing them with move semantics to a
               constructor for the type T if they all succeed
  applyFunction(f, p1, p2, ...)
               matches one or more parsers in succession, collecting their
               results and passing them as rvalue reference arguments to
               some function, returning its result
  applyLambda([](&&x){}, p1, p2, ...)
               is the same thing, but for lambdas and other function objects
  applyMem(mf, p1, p2, ...)
               is the same thing, but invokes a member function of the
               result of the first parser
 These are non-advancing state inquiry and update parsers:
  getColumn    returns 1-based column position
  inCharLiteral succeeds under withinCharLiteral
  inFortran    succeeds unless in a preprocessing directive
  inFixedForm  succeeds in fixed-form source
  setInFixedForm  sets the fixed-form flag, returns prior value
  columns      returns the 1-based column number after which source is clipped
  setColumns(c) sets "columns", returns prior value
 When parsing depends on the result values of earlier parses, the
 "monadic bind" combinator is available (but please try to avoid using it,
 as it makes automatic analysis of the grammar difficult):
  p >>= f      match p, yielding some value x on success, then match the
                 parser returned from the function call f(x)
 Last, we have these basic parsers on which the actual grammar of the Fortran
 is built.  All of the following parsers consume characters acquired from
 "cookedNextChar".
  spaces       always succeeds after consuming any spaces or tabs
  digit        matches one cooked decimal digit (0-9)
  letter       matches one cooked letter (A-Z)
  CharMatch<'c'>{} matches one specific cooked character
  "..."_tok    match contents, skipping spaces before and after, and
                 with multiple spaces accepted for any internal space
  "..." >> p   the tok suffix is optional on a string before >> and after /
  parenthesized(p)  shorthand for "(" >> p / ")"
  bracketed(p) shorthand for "[" >> p / "]"
  withinCharLiteral(p) apply p, tokenizing for CHARACTER/Hollerith literals
  nonEmptyListOf(p) matches a comma-separated list of one or more p's
  optionalListOf(p) ditto, but can be empty
  "..."_debug  emit the string and succeed, for parser debugging