llvm-project

Commit Graph

Author	SHA1	Message	Date
Kazu Hirata	32aa35b504	Drop empty string literals from static_assert (NFC) Identified with modernize-unary-static-assert.	2022-09-03 11:17:47 -07:00
Sam McCall	56c54cf66b	[pseudo] Placeholder disambiguation strategy: always choose second. Mostly mechanics here. Interesting decisions: - apply disambiguation in-place instead of copying the forest (debatable, but even the final tree size is significant) - split decide/apply into different functions - this allows the hard part (decide) to be tested non-destructively and combined with HTML forest easily - add non-const accessors to forest to enable apply - unit tests but no lit tests: my plan is to test actual C++ disambiguation heuristics with lit, generic disambiguation mechanics without the C++ grammar Differential Revision: https://reviews.llvm.org/D132487	2022-08-26 13:16:09 +02:00
Haojian Wu	f7dc91ad56	[pseudo] Eliminate a false parse of structured binding declaration. Using the guard to implement part of the rule https://eel.is/c++draft/dcl.pre#6. ``` void foo() { // can be parsed as // - structured-binding declaration (a false parse) // - assignment expression array[index] = value; } ``` Differential Revision: https://reviews.llvm.org/D132260	2022-08-23 15:25:52 +02:00
Haojian Wu	edb8fb2659	[pseudo] Fix HeadsPartition not being initialized correctly. The bug was that if we recover from token 0, we make the Heads empty (Line646), which results in no recovery being applied. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D132388	2022-08-23 15:08:33 +02:00
Sam McCall	bd5cc6575b	[pseudo] Start rules are `_ := start-symbol EOF`, improve recovery. Previously we were calling glrRecover() ad-hoc at the end of input. Two main problems with this: - glrRecover() on two separate code paths is inelegant - We may have to recover several times in succession (e.g. to exit from nested scopes), so we need a loop at end-of-file Having an actual shift action for an EOF terminal allows us to handle both concerns in the main shift/recover/reduce loop. This revealed a recovery design bug where recovery could enter a loop by repeatedly choosing the same parent to identically recover from. Addressed this by allowing each node to be used as a recovery base once. Differential Revision: https://reviews.llvm.org/D130550	2022-08-19 16:49:37 +02:00
Haojian Wu	e32799d1d6	[pseudo] NFC, remove redundant ;	2022-08-19 15:55:19 +02:00
Sam McCall	605035bf45	[pseudo] Changes omitted from previous commit	2022-08-19 15:15:37 +02:00
Sam McCall	2cc7463c85	[pseudo] Perform unconstrained reduction prior to recovery. Our GLR uses lookahead: only perform reductions that might be consumed by the shift immediately following. However when shift fails and so reduce is followed by recovery instead, this restriction is incorrect and leads to missing heads. In turn this means certain recovery strategies can't be made to work. e.g. ``` ns := NAMESPACE { namespace-body } [recover=Skip] ns-body := namespace_opt ``` When `namespace { namespace {` is parsed, we can recover the inner `ns` (using the `Skip` strategy to ignore the missing `}`). However this `namespace` will not be reduced to a `namespace-body` as EOF is not in the follow-set, and so we are unable to recover the outer `ns`. This patch fixes this by tracking which heads were produced by constrained reduce, and discarding and rebuilding them before performing recovery. This is a prerequisite for the `Skip` strategy mentioned above, though there are some other limitations we need to address too. Reviewed By: hokein Differential Revision: https://reviews.llvm.org/D130523	2022-08-19 15:07:36 +02:00
Haojian Wu	6a9f79e102	[pseudo] Eliminate the type-name identifier ambiguities in the grammar. See https://reviews.llvm.org/D130626 for motivation. Identifiers in the grammar have different categories (type-name, template-name, namespace-name), and they require semantic information to resolve. This patch eliminates the "local" ambiguities in type-name and namespace-name, which gives us a performance boost of the parser: - eliminate all different type rules (class-name, enum-name, typedef-name), and fold them into a unified type-name, this removes the #1 type-name ambiguity, and gives us a big performance boost; - remove the namespace-alias rules, as they're hard and uninteresting; Note that we could eliminate more and gain more performance (like fold template-name, type-name, namespace together), but at the current stage, we'd like to keep all existing categories of the identifier (as they might assist in correlated disambiguation & keep the representation of important concepts uniform). \| file \|ambiguous nodes \| forest size \| glrParse performance \| \|SemaCodeComplete.cpp\| 11k -> 5.7K \| 10.4MB -> 7.9MB \| 7.1MB/s -> 9.98MB/s \| \| AST.cpp \| 1.3k -> 0.73K \| 0.99MB -> 0.77MB \| 6.7MB/s -> 8.4MB/s \| Differential Revision: https://reviews.llvm.org/D130747	2022-08-17 14:30:53 +02:00
Sam McCall	0b90e136ee	[pseudo] Style tweaks forgotten in D130337. NFC	2022-08-16 10:26:25 +02:00
Kazu Hirata	2b43bd0bd9	Remove unused forward declarations (NFC)	2022-08-13 12:55:47 -07:00
Haojian Wu	1828c75d5f	[pseudo] Apply the function-declarator to member functions. A followup patch of `d489b3807f`, but for member functions, this will eliminate a false parse of member declaration. Differential Revision: https://reviews.llvm.org/D131720	2022-08-12 13:49:01 +02:00
Haojian Wu	a1a1a78ac8	[pseudo] Eliminate an ambiguity for the empty member declaration. We happened to introduce a `member-declaration := ;` rule when inlining the `member-declaration := decl-specifier-seq_opt member-declarator-list_opt ;`. And with the `member-declaration := empty-declaration` rule, we had two parses of `;`. This patch is to restrict the grammar to eliminate the `member-declaration := ;` rule. Differential Revision: https://reviews.llvm.org/D131724	2022-08-12 13:46:26 +02:00
Haojian Wu	bf0e219d04	[pseudo] Use C++17 variant to simplify the DirectiveTree::Chunk class, NFC. Differential Revision: https://reviews.llvm.org/D131396	2022-08-11 14:27:38 +02:00
Haojian Wu	e935f7fd0c	[pseudo] Fix a bug in checking the duplicated grammar rules.	2022-08-11 13:16:01 +02:00
Haojian Wu	c2c5c39c40	[pseudo] Fix a suspicious usage of `sizeof(this)`. It should be `sizeof(*this)`.	2022-08-09 21:46:56 +02:00
Simon Pilgrim	d9e5462da6	[clang-pseudo] Forest.h - don't inherit from std::iterator. Now that we've updated to C++17, MSVC gives very verbose warnings about not creating classes that inherit from std::iterator - use llvm::iterator_facade_base instead. Fixes #57005	2022-08-09 10:18:53 +01:00
Gabriel Ravier	0ed2bd9311	[clang-tools-extra] Fixed a number of typos I went over the output of the following mess of a command: `(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z \| parallel --xargs -0 cat \| aspell list --mode=none --ignore-case \| grep -E '^[A-Za-z][a-z]*$' \| sort \| uniq -c \| sort -n \| grep -vE '.{25}' \| aspell pipe -W3 \| grep : \| cut -d' ' -f2 \| less)` and proceeded to spend a few days looking at it to find probable typos and fixed a few hundred of them in all of the llvm project (note, the ones I found are not anywhere near all of them, but it seems like a good start). Reviewed By: kadircet Differential Revision: https://reviews.llvm.org/D130826	2022-08-01 15:32:25 +02:00
Kazu Hirata	5bc0e7b73c	Convert for_each to range-based for loops (NFC)	2022-07-30 10:35:52 -07:00
Haojian Wu	6f6c40a875	[pseudo] Eliminate the false `::` nested-name-specifier ambiguity. The solution is to favor the longest possible nested-name-specifier, and drop other alternatives by using the guard, per C++ [basic.lookup.qual.general]. Motivated cases: ``` Foo::Foo() {}; // the constructor can be parsed as: // - Foo ::Foo(); // where the first Foo is return-type, and ::Foo is the function declarator // + Foo::Foo(); // where Foo::Foo is the function declarator ``` ``` void test() { // a very slow parsing case when there are many qualifiers! X::Y::Z; // The statement can be parsed as: // - X ::Y::Z; // ::Y::Z is the declarator // - X::Y ::Z; // ::Z is the declarator // + X::Y::Z; // a declaration without declarator (X::Y::Z is decl-specifier-seq) // + X::Y::Z; // a qualified-id expression } ``` Differential Revision: https://reviews.llvm.org/D130511	2022-07-28 11:01:15 +02:00
Sam McCall	7b70c2e75c	[pseudo] Fix initializer of string table. Apparently new string[/no size/]{"foo", "bar"} is a clang/gcc extension?	2022-07-27 11:04:12 +02:00
Sam McCall	afc4958f5a	[pseudo] Add dangling-else guard to missing if-statement variants	2022-07-27 09:08:34 +02:00
Sam McCall	89f284bc23	[pseudo] Remove dead header. This was an earlier draft of Language.h that got committed accidentally	2022-07-27 09:05:59 +02:00
Sam McCall	6bdb15fe84	[pseudo] Reorganize CXX.h enums - Place rules under rule::lhs::rhs__rhs__rhs - Change mangling of keywords to ALL_CAPS (needed to turn keywords that appear alone on RHS into valid identifiers) - Make enums implicitly convertible to underlying type (though still scoped, using alias tricks) In principle this lets us exhaustively write a switch over all rules of a NT: switch ((rule::declarator)N->rule()) { case rule::declarator::noptr_declarator: ... } In practice we don't do this anywhere yet as we're often switching over multiple nonterminal kinds at once. Differential Revision: https://reviews.llvm.org/D130414	2022-07-27 09:03:29 +02:00
Sam McCall	07b7ff9838	[pseudo] Allow opaque nodes to represent terminals This allows incomplete code such as `namespace foo {` to be modeled as a normal sequence with the missing } represented by an empty opaque node. Differential Revision: https://reviews.llvm.org/D130551	2022-07-26 13:56:26 +02:00
Sam McCall	b2b993a6ae	[pseudo] Eliminate multiple-specified-types ambiguities using guards Motivating case: `foo bar;` is not a declaration of nothing with `foo` and `bar` both types. This is a common and critical ambiguity, clangd/AST.cpp has 20% fewer ambiguous nodes (1674->1332) after this change. Differential Revision: https://reviews.llvm.org/D130337	2022-07-25 12:57:07 +02:00
Sam McCall	661e0b63f7	[pseudo] Fix minor errors in module grammar	2022-07-25 10:04:56 +02:00
Dmitri Gribenko	aba43035bd	Use llvm::sort instead of std::sort where possible llvm::sort is beneficial even when we use the iterator-based overload, since it can optionally shuffle the elements (to detect non-determinism). However llvm::sort is not usable everywhere, for example, in compiler-rt. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D130406	2022-07-23 15:19:05 +02:00
Sam McCall	d9d554a3f4	[pseudo] Add ambiguity & unparseability metrics to -print-statistics These can be used to quantify parsing improvements from a change. Differential Revision: https://reviews.llvm.org/D130199	2022-07-22 10:35:06 +02:00
Haojian Wu	2a88fb2ecb	[pseudo] Eliminate the dangling-else syntax ambiguity. - the grammar ambiguity is eliminated by a guard; - modify the guard function signatures: all parameters are now folded into a single object to avoid a long parameter list (as we will add more parameters in the near future); Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D130160	2022-07-22 09:13:09 +02:00
Haojian Wu	18cee95919	[pseudo] Tweak the cli option messages, NFC.	2022-07-22 08:53:24 +02:00
Sam McCall	d26ee284de	[pseudo] Fix link error after `3132e9cd7c`	2022-07-22 08:43:56 +02:00
Sam McCall	3132e9cd7c	[pseudo] Key guards by RuleID, add guards to literals (and 0). After this, NUMERIC_CONSTANT and strings should parse only one way. There are 8 types of literals, and 24 valid (literal, TokenKind) pairs. This means adding 8 new named guards (or 24, if we want to assert the token). It seems fairly clear to me at this point that the guard names are unnecessary indirection: the guards are in fact coupled to the rule signature. (Also add the zero guard I forgot in the previous patch.) Differential Revision: https://reviews.llvm.org/D130066	2022-07-21 22:42:31 +02:00
Haojian Wu	65c8e24622	[pseudo] Fix an invalid assertion on recoveryBrackets. The `Begin` is not the index of the left bracket, `Begin-1` is, otherwise the assertion will be triggered on case `Foo().call();`.	2022-07-21 14:02:11 +02:00
Haojian Wu	2955192df8	[pseudo] Make sure we rebuild pseudo_gen tool.	2022-07-21 10:09:21 +02:00
Sam McCall	c91ce94144	[pseudo] Add `clang-pseudo -html-forest=<output.html>`, an HTML forest browser It generates a standalone HTML file with all needed JS/CSS embedded. This allows navigating the tree both with a tree widget and in the code, inspecting nodes, and selecting ambiguous alternatives. Demo: https://htmlpreview.github.io/?https://gist.githubusercontent.com/sam-mccall/03882f7499d293196594e8a50599a503/raw/ASTSignals.cpp.html Differential Revision: https://reviews.llvm.org/D130004	2022-07-19 22:32:11 +02:00
Haojian Wu	d489b3807f	[pseudo] Implement a guard to determine function declarator. This eliminates some simple-declaration/function-definition false parses. - implement a function to determine whether a declarator ForestNode is a function declarator; - extend the standard declarator to two guarded function-declarator and non-function-declarator nonterminals; Differential Revision: https://reviews.llvm.org/D129222	2022-07-19 09:44:45 +02:00
Sam McCall	fa0c7639e9	[pseudo] Add guards for module contextual keywords	2022-07-18 22:38:41 +02:00
Haojian Wu	098488e09a	[pseudo] Be more precise when printing the error message, NFC	2022-07-18 13:23:18 +02:00
Utkarsh Saxena	70914aa631	Use pseudo parser for folding ranges. This first version only uses bracket matching. We plan to extend this to use DirectiveTree as well. Also includes changes to Token to allow retrieving corresponding token in token stream of original source file. Differential Revision: https://reviews.llvm.org/D129648	2022-07-18 11:35:34 +02:00
Haojian Wu	b94ea8b3eb	[pseudo] Add bracket recovery for function parameters.	2022-07-18 10:23:15 +02:00
Haojian Wu	76910d4a56	[pseudo] Share the underlying payload when stripping comments for a token stream. `stripComments(cook(...))` is a commonly written pattern. Without this patch, this has a use-after-free issue (cook returns a temporary TokenStream object which has its own payload, but the payload is not shared with the one returned by stripComments). Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D125311	2022-07-15 15:20:48 +02:00
Haojian Wu	2315358906	[pseudo] Generate an enum type for identifying grammar rules. The Rule enum type enables us to identify a grammar rule within C++'s type system. Differential Revision: https://reviews.llvm.org/D129359	2022-07-15 15:09:31 +02:00
Kazu Hirata	53daa177f8	[clang, clang-tools-extra] Use has_value instead of hasValue (NFC)	2022-07-12 22:47:41 -07:00
Haojian Wu	cd3aa338c7	[pseudo] NFC, fix the header guard for Language.h	2022-07-07 14:42:26 +02:00
Sam McCall	7d8e2742d9	[pseudo] Define recovery strategy as grammar extension. Differential Revision: https://reviews.llvm.org/D129158	2022-07-06 15:03:38 +02:00
Sam McCall	3121167488	[pseudo] Add error-recovery framework & brace-based recovery The idea is: - a parse failure is detected when all heads die when trying to shift the next token - we can recover by choosing a nonterminal we're partway through parsing, and determining where it ends through nonlocal means (e.g. matching brackets) - we can find candidates by walking up the stack from the (ex-)heads - the token range is defined using heuristics attached to grammar rules - the unparsed region is represented in the forest by an Opaque node This patch has the core GLR functionality. It does not allow recovery heuristics to be attached as extensions to the grammar, but rather infers a brace-based heuristic. Expected followups: - make recovery heuristics grammar extensions (depends on D127448) - add recovery to our grammar for bracketed constructs and sequence nodes - change the structure of our augmented `_ := start` rules to eliminate some special-cases in glrParse. - (if I can work out how): avoid some spurious recovery cases described in comments (Previously mistakenly committed as `a0f4c10ae2`) Differential Revision: https://reviews.llvm.org/D128486	2022-07-05 20:49:41 +02:00
Haojian Wu	9ab67cc8bf	[pseudo] Implement guard extension. - Extend the GLR parser to allow conditional reduction based on the guard functions; - Implement two simple guards (contextual-override/final) for cxx.bnf; - layering: clangPseudoCXX depends on clangPseudo (as the guard function needs to access the TokenStream); Differential Revision: https://reviews.llvm.org/D127448	2022-07-05 15:55:15 +02:00
Haojian Wu	d263447311	[pseudo] Fix the build for the benchmark tool.	2022-07-05 15:42:41 +02:00
Haojian Wu	70c0d92930	[pseudo] Use the prebuilt cxx grammar for the lit tests, NFC. Differential Revision: https://reviews.llvm.org/D129074	2022-07-05 15:17:18 +02:00

157 Commits