llvm-project/clang-tools-extra/pseudo
Sam McCall bd5cc6575b [pseudo] Start rules are `_ := start-symbol EOF`, improve recovery.
Previously we were calling glrRecover() ad-hoc at the end of input.
Two main problems with this:
 - glrRecover() on two separate code paths is inelegant
 - We may have to recover several times in succession (e.g. to exit from
   nested scopes), so we need a loop at end-of-file
Having an actual shift action for an EOF terminal allows us to handle
both concerns in the main shift/recover/reduce loop.
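
To make the idea concrete, here is a toy sketch, not the pseudoparser's actual code. It treats EOF as an ordinary terminal, so that leaving several unclosed scopes at end-of-file falls out of the main shift/recover loop. The `Tok` enum, `ToyResult`, and `parseToy` are hypothetical names invented for this illustration, and the "grammar" is just balanced braces rather than C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token kinds for a toy language of nested braces.
enum class Tok { LBrace, RBrace, Ident, EndOfFile };

struct ToyResult {
  bool Accepted;
  unsigned Recoveries;
};

// Toy recognizer for `_ := blocks EOF`. EOF is appended as a real token, so
// the same loop that shifts and recovers mid-file also handles end-of-file.
inline ToyResult parseToy(std::vector<Tok> Tokens) {
  Tokens.push_back(Tok::EndOfFile);
  unsigned Depth = 0;      // number of currently open scopes
  unsigned Recoveries = 0; // how many times we had to recover
  std::size_t I = 0;
  while (true) {
    Tok T = Tokens[I];
    bool CanShift = T == Tok::LBrace || T == Tok::Ident ||
                    (T == Tok::RBrace && Depth > 0) ||
                    (T == Tok::EndOfFile && Depth == 0);
    if (CanShift) {
      if (T == Tok::EndOfFile)
        return {true, Recoveries}; // shifting EOF completes the start rule
      if (T == Tok::LBrace)
        ++Depth;
      else if (T == Tok::RBrace)
        --Depth;
      ++I;
      continue;
    }
    ++Recoveries;
    if (T == Tok::EndOfFile)
      --Depth; // recover by closing the innermost open scope, then retry EOF
    else
      ++I; // stray '}' at top level: recover by skipping it
  }
}

// Usage: two unclosed scopes need two successive recoveries at end-of-file.
inline void exampleEofRecovery() {
  ToyResult R = parseToy({Tok::LBrace, Tok::LBrace, Tok::Ident});
  assert(R.Accepted && R.Recoveries == 2);
}
```

Each iteration either consumes a token or closes a scope, so the loop always terminates without a special end-of-file code path.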

This revealed a recovery design bug where recovery could enter a loop by
repeatedly choosing the same parent to identically recover from.
Addressed this by allowing each node to be used as a recovery base once.
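
A minimal sketch of the "use each node as a recovery base once" rule, assuming a simplified singly linked parse stack. `StackNode`, its `Recovered` flag, and both helper functions are hypothetical stand-ins for illustration. The real data structures and the real `glrRecover()` logic may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stack node: one parent, plus a flag recording
// whether this node has already served as a recovery base.
struct StackNode {
  StackNode *Parent = nullptr;
  bool Recovered = false;
};

// Walk from Head towards the root and return the nearest node that has not
// yet been used as a recovery base, marking it as used. Returns nullptr when
// every candidate has been used, which is what breaks the loop.
inline StackNode *pickRecoveryBase(StackNode *Head) {
  for (StackNode *N = Head; N != nullptr; N = N->Parent) {
    if (!N->Recovered) {
      N->Recovered = true;
      return N;
    }
  }
  return nullptr;
}

// Worst case: recovery never makes progress, so the same head is offered
// again and again. The flag bounds the number of attempts by the stack depth.
inline std::size_t countRecoveries(StackNode *Head) {
  std::size_t Count = 0;
  while (pickRecoveryBase(Head) != nullptr)
    ++Count;
  return Count;
}

// Usage: a three-node stack allows exactly three recovery attempts.
inline void exampleRecoverOnce() {
  StackNode Root, Mid, Leaf;
  Mid.Parent = &Root;
  Leaf.Parent = &Mid;
  assert(countRecoveries(&Leaf) == 3);
  assert(countRecoveries(&Leaf) == 0); // all bases already used
}
```

Without the flag, `pickRecoveryBase` would keep returning the same node and the loop in `countRecoveries` would never end, which mirrors the bug described above.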

Differential Revision: https://reviews.llvm.org/D130550
2022-08-19 16:49:37 +02:00
benchmarks [pseudo] Implement guard extension. 2022-07-05 15:55:15 +02:00
fuzzer [pseudo] Implement guard extension. 2022-07-05 15:55:15 +02:00
gen [clang-tools-extra] Fixed a number of typos 2022-08-01 15:32:25 +02:00
include [pseudo] Start rules are `_ := start-symbol EOF`, improve recovery. 2022-08-19 16:49:37 +02:00
lib [pseudo] Start rules are `_ := start-symbol EOF`, improve recovery. 2022-08-19 16:49:37 +02:00
test [pseudo] Start rules are `_ := start-symbol EOF`, improve recovery. 2022-08-19 16:49:37 +02:00
tool [pseudo] Add ambiguity & unparseability metrics to -print-statistics 2022-07-22 10:35:06 +02:00
unittests [pseudo] Start rules are `_ := start-symbol EOF`, improve recovery. 2022-08-19 16:49:37 +02:00
CMakeLists.txt [pseudo] A basic implementation of compiling cxx grammar at build time. 2022-05-25 11:26:06 +02:00
DesignNotes.md [pseudo] Design notes from discussion today. NFC 2022-05-18 00:08:47 +02:00
README.md Reapply "[pseudo] Move pseudoparser from clang to clang-tools-extra" 2022-03-16 01:10:55 +01:00

README.md

clang pseudoparser

This directory implements an approximate heuristic parser for C++, based on the clang lexer, the C++ grammar, and the GLR parsing algorithm.

It parses a file in isolation, without reading its included headers. The result is a strict syntactic tree whose structure follows the C++ grammar. There is no semantic analysis, apart from guesses to disambiguate the parse. Disambiguation can optionally be guided by an AST or a symbol index.
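
As an illustration of why guessing is needed when headers are not read, the statement `T(x);` below is a variable declaration under one set of surrounding declarations and a function call under another. The namespaces and names are made up for this sketch. In real code the deciding declarations would typically live in a header the pseudoparser never sees.

```cpp
#include <cassert>

namespace type_reading {
struct T {
  int v = 0;
};
inline int demo() {
  T(x); // T names a type: this declares a variable `x` of type T
  return x.v;
}
} // namespace type_reading

namespace value_reading {
inline void T(int &Counter) { ++Counter; }
inline int demo() {
  int x = 0;
  T(x); // T names a function: this is a call that increments x
  return x;
}
} // namespace value_reading

// Usage: the same tokens `T(x);` behave differently in each namespace.
inline void exampleAmbiguity() {
  assert(type_reading::demo() == 0);
  assert(value_reading::demo() == 1);
}
```

A compiler resolves this by looking up `T`. A parser that sees only the tokens `T ( x ) ;` has to keep both readings or pick one heuristically.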

For now, the best reference on intended scope is the design proposal, with further discussion on the RFC.

Dependencies between pseudoparser and clang

Dependencies are kept limited, partly because most of them would not make sense, and partly to avoid placing a burden on clang maintainers.

The pseudoparser reuses the clang lexer (clangLex and clangBasic libraries) but not the higher-level libraries (Parse, Sema, AST, Frontend...).

When the pseudoparser needs to be used together with an AST (e.g. to guide disambiguation), that integration belongs in a separate "bridge" library that depends on both.

Clang does not depend on the pseudoparser at all. If this seems useful in future it should be discussed by RFC.

Parity between pseudoparser and clang

The pseudoparser aims to understand real-world code, and particularly the languages and extensions supported by Clang.

However, we don't try to keep these in lockstep: there's no expectation that Clang parser changes are accompanied by pseudoparser changes or vice versa.