The idea is:
- a parse failure is detected when all heads die when trying to shift the next token
- we can recover by choosing a nonterminal we're partway through parsing, and
determining where it ends through nonlocal means (e.g. matching brackets)
- we can find candidates by walking up the stack from the (ex-)heads
- the token range is defined using heuristics attached to grammar rules
- the unparsed region is represented in the forest by an Opaque node
This patch has the core GLR functionality.
It does not allow recovery heuristics to be attached as extensions to
the grammar, but rather infers a brace-based heuristic.
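For illustration, here is a minimal self-contained sketch of the kind of
brace-based heuristic this refers to (not the actual clang-pseudo code; the
names are made up for the example): given the token stream and the index of
the `{` that opens the region being recovered, skip to the matching `}` and
treat everything up to it as the unparsed range.

```cpp
#include <cstddef>
#include <vector>

enum class TokKind { OpenBrace, CloseBrace, Other };

// Returns the index one past the matching '}', or Tokens.size() if unmatched.
// Precondition: Tokens[OpenBraceIdx] is the '{' that opened the region.
std::size_t findRecoveryEnd(const std::vector<TokKind> &Tokens,
                            std::size_t OpenBraceIdx) {
  int Depth = 0;
  for (std::size_t I = OpenBraceIdx; I < Tokens.size(); ++I) {
    if (Tokens[I] == TokKind::OpenBrace)
      ++Depth;
    else if (Tokens[I] == TokKind::CloseBrace && --Depth == 0)
      return I + 1; // the matching close brace ends the recovered region
  }
  return Tokens.size(); // unmatched: the recovered region runs to end of stream
}
```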
Expected followups:
- make recovery heuristics grammar extensions (depends on D127448)
- add recovery to our grammar for bracketed constructs and sequence nodes
- change the structure of our augmented `_ := start` rules to eliminate some
special-cases in glrParse.
- (if I can work out how): avoid some spurious recovery cases described in comments
(Previously mistakenly committed as a0f4c10ae2)
Differential Revision: https://reviews.llvm.org/D128486
The actions table is very compact but the binary search to find the
correct action is relatively expensive.
A hashtable is faster but pretty large (64 bits per value, plus empty
slots, and lookup is constant time but not trivial due to collisions).
The structure in this patch uses 1.25 bits per entry (whether present or absent)
plus the size of the values, and lookup is trivial.
The Shift table is 119KB = 27KB values + 92KB keys.
The Goto table is 86KB = 30KB values + 57KB keys.
(Goto has a smaller keyspace, as #nonterminals < #terminals, but more entries.)
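The summary above doesn't spell out the layout, but ~1.25 bits per key slot is
consistent with a presence bitmap whose 64-bit words each carry a small
precomputed rank, so lookup is a bit test plus a popcount into a dense value
array. A rough sketch (names and integer widths are illustrative, not the
actual LRTable code):

```cpp
#include <bit>
#include <cstdint>
#include <optional>
#include <vector>

struct CompressedTable {
  struct Word {
    uint64_t Bits = 0;       // presence bits for 64 consecutive keys
    uint16_t RankBefore = 0; // number of present keys in all earlier words
  };
  std::vector<Word> Words;      // 80 bits per 64 key slots = 1.25 bits/slot
  std::vector<uint16_t> Values; // one value per present key, in key order

  std::optional<uint16_t> lookup(uint32_t Key) const {
    const Word &W = Words[Key / 64];
    uint64_t Mask = uint64_t(1) << (Key % 64);
    if (!(W.Bits & Mask))
      return std::nullopt; // key absent
    // Rank of this key among present keys = earlier words + earlier bits here.
    unsigned Index = W.RankBefore + std::popcount(W.Bits & (Mask - 1));
    return Values[Index];
  }
};
```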
This patch improves glrParse speed by 28%: 4.69 => 5.99 MB/s
Overall the table grows by 60%: 142 => 228KB.
By comparison, DenseMap<unsigned, StateID> is "only" 16% faster (5.43 MB/s),
and results in a 285% larger table (547 KB) vs the baseline.
Differential Revision: https://reviews.llvm.org/D128485
The idea is:
- a parse failure is detected when all heads die when trying to shift
the next token
- we can recover by choosing a nonterminal we're partway through parsing,
and determining where it ends through nonlocal means (e.g. matching brackets)
- we can find candidates by walking up the stack from the (ex-)heads
- the token range is defined using heuristics attached to grammar rules
- the unparsed region is represented in the forest by an Opaque node
This patch has the core GLR functionality.
It does not allow recovery heuristics to be attached as extensions to
the grammar, but rather infers a brace-based heuristic.
Expected followups:
- make recovery heuristics grammar extensions (depends on D127448)
- add recovery to our grammar for bracketed constructs and sequence nodes
- change the structure of our augmented `_ := start` rules to eliminate
some special-cases in glrParse.
- (if I can work out how): avoid some spurious recovery cases described
in comments
- grammar changes to eliminate the hard distinction between init-list
and designated-init-list shown in the recovery-init-list.cpp testcase
Differential Revision: https://reviews.llvm.org/D128486
Previously, the action table stored a reduce action for each lookahead
token it should allow. These tokens are followSet(action.rule.target).
In practice, the follow sets are large, so we spend a lot of time binary
searching through these essentially-duplicate entries to check whether our
lookahead token is among them.
However, the number of reduces for a given state is very small, so we're
much better off linearly scanning them and performing a fast check for each.
D128318 was an attempt at this, storing a bitmap for each reduce.
However it's even more compact just to use the follow sets directly, as
there are fewer nonterminals than (state, rule) pairs. It's also faster.
This specialized approach means unbundling Reduce from other actions in
LRTable, so it's no longer useful to support it in Action. I suspect
Action will soon go away, as we store each kind of action separately.
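A simplified sketch of the resulting shape (hypothetical names, not the
actual LRTable interface): reduces live in short per-state lists, and the
lookahead check is a single bit test against the follow set of the rule's
target nonterminal.

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

constexpr unsigned NumTerminals = 256; // assumed bound, for the sketch only

struct Rule {
  uint16_t Target; // the nonterminal on the rule's left-hand side
};

struct ReduceTable {
  std::vector<std::bitset<NumTerminals>> FollowSets; // indexed by nonterminal
  std::vector<std::vector<Rule>> ReducesByState;     // short per-state lists

  // Linear scan over the (few) reduces in this state; each lookahead check
  // is one bit test rather than a binary search over duplicated tokens.
  std::vector<Rule> reducesOn(unsigned State, unsigned Lookahead) const {
    std::vector<Rule> Out;
    for (const Rule &R : ReducesByState[State])
      if (FollowSets[R.Target].test(Lookahead))
        Out.push_back(R);
    return Out;
  }
};
```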
This improves glrParse speed by 42% (3.30 -> 4.69 MB/s).
It also reduces LR table size by 59% (343 -> 142kB).
Differential Revision: https://reviews.llvm.org/D128472
IMO this model is simpler to understand (borrowed from the LR0 patch D127357).
It also makes error recovery easier to implement, as we have a simple list of
head nodes lying around to recover from when needed.
(It's not quite as nice as LR0 in this respect though).
It's slightly slower (2.24 -> 2.12 MB/s on my machine = 5%) but nowhere near
as bad as LR0.
However:
- I think we'd have to eat a little performance loss anyway to implement
error recovery.
- this frees up some complexity budget for optimizations like fastpath push/pop
(this + fastpath is already faster than HEAD)
- I haven't changed the data structure here and it's now pretty dumb, so we
can make it faster
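For reference, a bare-bones sketch of the explicit-heads model described
above (the types and helpers are stand-ins, not the clang-pseudo API): the
parser owns a flat list of active heads, advances all of them per token, and
the old list is still at hand if every head dies.

```cpp
#include <utility>
#include <vector>

struct Token {};   // lexed token (details omitted)
struct GSSNode {}; // graph-structured-stack node (details omitted)

// Hypothetical helpers standing in for the real shift/reduce/recovery logic.
std::vector<GSSNode *> shiftAll(const std::vector<GSSNode *> &Heads,
                                const Token &Tok);
void recover(const std::vector<GSSNode *> &DeadHeads, const Token &Tok);

void parse(const std::vector<Token> &Tokens, GSSNode *Start) {
  std::vector<GSSNode *> Heads = {Start};
  for (const Token &Tok : Tokens) {
    // (Reductions over all the heads would also happen here.)
    std::vector<GSSNode *> Next = shiftAll(Heads, Tok);
    if (Next.empty())
      return recover(Heads, Tok); // all heads died: the old list is right here
    Heads = std::move(Next);
  }
}
```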
Differential Revision: https://reviews.llvm.org/D128297
Add annotation handling ([key=value]) to the BNF grammar parser; annotations
will be used for conditional reduction and error recovery.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D126536
As pointed out in the previous review, having a dedicated accept action
doesn't seem to be necessary. This patch implements the same behavior
without an accept action, which saves some code complexity.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D125677
There are more states than symbols.
This means that partitioning the action list by state first leaves us with a
smaller range to binary search over. This improves find() a lot and glrParse()
by 7%.
The tradeoff: storing more, smaller ranges increases the size of the offsets
array, so overall grammar memory is +1% (337 -> 340KB).
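A simplified sketch of this layout (illustrative names, not the actual
LRTable code): actions are sorted by (state, symbol), and an offsets array
maps each state to its slice, so the binary search only covers that state's
few symbols.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

struct ActionTable {
  std::vector<uint32_t> StateOffsets; // size = #states + 1
  std::vector<uint16_t> Symbols;      // sorted within each state's slice
  std::vector<uint16_t> Actions;      // parallel to Symbols

  std::optional<uint16_t> find(unsigned State, uint16_t Symbol) const {
    auto Begin = Symbols.begin() + StateOffsets[State];
    auto End = Symbols.begin() + StateOffsets[State + 1];
    auto It = std::lower_bound(Begin, End, Symbol); // small per-state range
    if (It == End || *It != Symbol)
      return std::nullopt;
    return Actions[It - Symbols.begin()];
  }
};
```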
Before:
glrParse 188795975 ns 188778003 ns 77 bytes_per_second=1.98068M/s
After:
glrParse 175936203 ns 175916873 ns 81 bytes_per_second=2.12548M/s
Differential Revision: https://reviews.llvm.org/D127006
The clangBasic dependency is eliminated by replacing our usage of
tok::getPunctuatorSpelling etc. with direct use of the *.def file.
Implicit dependencies on clang-tablegen-targets are removed, as we manage to
avoid any transitive tablegen deps.
After these changes, `ninja clean; ninja pseudo-gen` runs 169 actions only
(basically Support and Demangle).
Differential Revision: https://reviews.llvm.org/D126731
The main idea is to compile the cxx grammar at build time and construct the
core pieces (Grammar, LRTable) of the pseudoparser from the compiled data
sources.
This is a tiny implementation, which is a good start:
- defines what the public API should look like;
- integrates the cxx grammar compilation workflow with the CMake system;
- only nonterminal symbols of the C++ grammar are compiled; everything else
still does the real compilation work at runtime, and we can opt in more bits
in the future;
- splits the monolithic clangPseudo library for better layering.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D125667