llvm-project

Commit Graph

Author	SHA1	Message	Date
Haojian Wu	94552f0216	[pseudo] Build inc files when cxx.bnf changes. Add the cxx.bnf file as a dependency of custom gen commands, so that the inc files can be rebuilt when cxx.bnf changes.	2022-06-01 13:48:09 +02:00
Sam McCall	9d991da60d	[pseudo] Respect LLVM_USE_HOST_TOOLS This is the intended way to request that build-time tools be built in a distinct configuration. This is set implicitly by LLVM_OPTIMIZED_TABLEGEN, which may be surprising, but if undesired this should be fixed elsewhere. Should fix crbug.com/1330304	2022-05-31 20:47:57 +02:00
Haojian Wu	a5ddd4a238	[pseudo] Remove an unnecessary nullable check diagnostic in the bnf grammar, NFC. This diagnostic has been handled in eliminateOptional.	2022-05-30 09:04:47 +02:00
Shoaib Meenai	4baae166ce	[pseudo] Fix pseudo-gen usage when cross-compiling Use the LLVM build system's cross-compilation support for the tool, so that the build works for both host and cross-compilation scenarios. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D126397	2022-05-25 11:08:21 -07:00
Nico Weber	788463e72a	[pseudo-gen] Add -o flag, make --grammar required Virtually all LLVM tools accept a `-o` flag, so add one. This will make it possible to possibly add a --write-if-changed flag later. It also makes it so that the file isn't partially written if the tool doesn't run successfully. Marking --grammar as `Required` allows removing some manual verification code for it. Differential Revision: https://reviews.llvm.org/D126373	2022-05-25 09:11:42 -04:00
Haojian Wu	f1df6515e3	[pseudo] Add missing dependency, fix shared library build.	2022-05-25 12:38:23 +02:00
Haojian Wu	cd2292ef82	[pseudo] A basic implementation of compiling cxx grammar at build time. The main idea is to compile the cxx grammar at build time, and construct the core pieces (Grammar, LRTable) of the pseudoparse based on the compiled data sources. This is a tiny implementation, which is good for start: - defines how the public API should look like; - integrates the cxx grammar compilation workflow with the cmake system. - only nonterminal symbols of the C++ grammar are compiled, anything else are still doing the real compilation work at runtime, we can opt-in more bits in the future; - splits the monolithic clangPsuedo library for better layering; Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D125667	2022-05-25 11:26:06 +02:00
Sam McCall	0360b9f159	[pseudo] (trivial) bracket-matching Error-tolerant bracket matching enables our error-tolerant parsing strategies. The implementation here is not yet error tolerant: this patch sets up the APIs and plumbing, and describes the planned approach. Differential Revision: https://reviews.llvm.org/D125911	2022-05-24 15:13:36 +02:00
Sam McCall	cd387e43bf	[pseudo] Squash some warnings. NFC Explicitly sizing Kind enum suggests that too-large values are allowed, and that putting it in a bitfield is dangerous. GCC doesn't like condition ? integer : enum.	2022-05-19 08:20:12 +02:00
Sam McCall	79ca4ed3e7	[pseudo] Design notes from discussion today. NFC	2022-05-18 00:08:47 +02:00
Sam McCall	e8e00e342c	[pseudo] benchmark cleanups. NFC - add missing benchmark for lex/preprocess steps - name benchmarks after the function they're benchmarking, when appropriate - remove unergonomic "run" prefixes from benchmark names - give a useful error message if --grammar or --source are missing - Use realistic example of how to run, run all benchmarks by default. (for someone who doesn't know the commands, this is the most useful action) - Improve typos/wording in comment - clean up unused vars - avoid "parseable stream" name, which isn't a great name & not one I expected to escape from ClangPseudoMain Differential Revision: https://reviews.llvm.org/D125312	2022-05-17 20:22:42 +02:00
Dmitri Gribenko	9c6a2f2966	Fix an unused variable warning in no-asserts build mode	2022-05-17 15:27:44 +02:00
Haojian Wu	86bc6399a0	[pseudo] Add the missing ; terminal for module-declaration rule.	2022-05-17 15:14:46 +02:00
Haojian Wu	1a65c491be	[pseudo] Support parsing variant target symbols. With this patch, we're able to parse smaller chunks of C++ code (statement, declaration), rather than translation-unit. The start symbol is listed in the grammar in a form of `_ := statement`, each start symbol has a dedicated state (`_ := • statement`). We create and track all these separate states in the LRTable. When we start parsing, we lookup the corresponding state to start the parser. LR parsing table changes with this patch: - number of states: 1467 -> 1471 - number of actions: 82891 -> 83578 - size of the table (bytes): 334248 -> 336996 Differential Revision: https://reviews.llvm.org/D125006	2022-05-16 10:38:16 +02:00
Haojian Wu	be895d5768	[pseudo] Add benchmarks for pseudoparser. Running on SemaDecl.cpp with the cxx.bnf grammar: ``` -------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------- runParseBNFGrammar 649389 ns 649365 ns 1013 runBuildLR 34591903 ns 34591380 ns 20 runPreprocessTokens 11418744 ns 11418703 ns 61 bytes_per_second=63.8971M/s runGLRParse 282996863 ns 282988726 ns 2 bytes_per_second=2.57827M/s runParseOverall 294969719 ns 294951870 ns 2 bytes_per_second=2.4737M/s ``` Differential Revision: https://reviews.llvm.org/D125226	2022-05-10 14:13:46 +02:00
Sam McCall	e571e1a6c3	Reland "[FuzzMutate] Split out FuzzerCLI library that doesn't depend on IR." This reverts commit `a1bb952e83`. I'd somehow missed updating llvm-yaml-parser-fuzzer, now fixed.	2022-05-07 13:49:54 +02:00
Aaron Ballman	a1bb952e83	Revert "[FuzzMutate] Split out FuzzerCLI library that doesn't depend on IR." This reverts commit `1c5e85b3da`. It broke a lot of bots with a link error: https://lab.llvm.org/buildbot/#/builders/171/builds/14222 https://lab.llvm.org/buildbot/#/builders/188/builds/13748 https://lab.llvm.org/buildbot/#/builders/109/builds/38127	2022-05-07 07:29:57 -04:00
Sam McCall	1c5e85b3da	[FuzzMutate] Split out FuzzerCLI library that doesn't depend on IR. All llvm-project fuzzers use this library to parse command-line arguments. Many of them don't deal with LLVM IR or modules in any way. Bundling those functions in one library forces build dependencies that don't need to be there. Among other things, this means check-clang-pseudo no longer depends on most of LLVM. Differential Revision: https://reviews.llvm.org/D125081	2022-05-07 12:11:51 +02:00
Sam McCall	7dc3c6190e	[pseudo] Strip directives from a token stream This includes only the taken branch of conditional sections. The API allows for producing a stream for a particular PP branch, which will be used later for the secondary GLR parses of not-taken branches. Differential Revision: https://reviews.llvm.org/D123243	2022-05-06 12:15:08 +02:00
Sam McCall	1616bd9ef4	[pseudo] Add fuzzer for the pseudoparser. As confirmation, running this locally found 2 crashes: - trivial: crashes on file with no tokens - lexer: hits an assertion failure on bytes: 0x5c,0xa,0x5c,0x1,0x65,0x5c,0xa Differential Revision: https://reviews.llvm.org/D125037	2022-05-06 09:22:28 +02:00
Sam McCall	232cc446ff	[pseudo] Only expand UCNs for raw_identifiers It turns out clang::expandUCNs only works on tokens that contain valid UCNs and no other random escapes, and clang only uses it on raw_identifiers. Currently we can hit an assertion by creating tokens with stray non-valid-UCN backslashes in them. Fortunately, expanding UCNs in raw_identifiers is actually all we need. Most tokens (keywords, punctuation) can't have them. UCNs in literals can be treated as escape sequences like \n even this isn't the standard's interpretation. This more or less matches how clang works. (See https://isocpp.org/files/papers/P2194R0.pdf which points out that the standard's description of how UCNs work is misaligned with real implementations) Differential Revision: https://reviews.llvm.org/D125049	2022-05-06 08:53:31 +02:00
Weverything	0e86cddf98	[psuedo] Fix for unused warning by moving code into debug macro.	2022-05-03 16:07:59 -07:00
Haojian Wu	c4546091ed	[pseudo] Use a real language option in the parser. Differential Revision: https://reviews.llvm.org/D124831	2022-05-03 22:24:56 +02:00
Haojian Wu	ed1b32791d	[pseudo] Print the GSS::Node details when the unittest fails, NFC.	2022-05-03 22:06:10 +02:00
Haojian Wu	9f38da258e	[pseudo] Implement the GLR parsing algorithm. This patch implements a standard GLR parsing algorithm, the core piece of the pseudoparser. - it parses preprocessed C++ code, currently it supports correct code only and parse them as a translation-unit; - it produces a forest which stores all possible trees in an efficient manner (only a single node being build for per (SymbolID, Token Range)); no disambiguation yet; Reland with a fix for g++'s -fpermissive error on previous declaration `GSS& GSS;`. Differential Revision: https://reviews.llvm.org/D121150	2022-05-03 20:25:23 +02:00
Haojian Wu	860eabb395	Revert "[pseudo] Implement the GLR parsing algorithm." This breaks some buildbots (on the declaration GSS& GSS), will fix it later. This reverts commit `eac22d0754`.	2022-05-03 15:54:10 +02:00
Sam McCall	eac22d0754	[pseudo] Implement the GLR parsing algorithm. This patch implements a standard GLR parsing algorithm, the core piece of the pseudoparser. - it parses preprocessed C++ code, currently it supports correct code only and parse them as a translation-unit; - it produces a forest which stores all possible trees in an efficient manner (only a single node being build for per (SymbolID, Token Range)); no disambiguation yet; Differential Revision: https://reviews.llvm.org/D121150	2022-05-03 15:42:07 +02:00
Haojian Wu	b18abde8ad	[pseudo] Simplify the forest dump, NFC. The code was written to handle nullable grammar, and we disallow nullable grammar, so it is not necessary to keep it around. Differential Revision: https://reviews.llvm.org/D124827	2022-05-03 14:14:57 +02:00
Haojian Wu	910fb5d7e0	[pseudo] NFC, fix some code-style naming violations.	2022-04-26 10:50:50 +02:00
Christopher Di Bella	e9a902c7f7	Revert "Revert "Revert "[clang][pp] adds '#pragma include_instead'""" > Includes regression test for problem noted by @hans. > This reverts commit `973de71`. > > Differential Revision: https://reviews.llvm.org/D106898 Feature implemented as-is is fairly expensive and hasn't been used by libc++. A potential reimplementation is possible if libc++ become interested in this feature again. Differential Revision: https://reviews.llvm.org/D123885	2022-04-22 16:37:20 +00:00
Sam McCall	60502ed11a	[pseudo] Remove unused clangTesting dep. NFC	2022-04-12 16:17:43 +02:00
Sam McCall	5749a261c5	[pseudo] Include missing `count` in test deps. We don't use this for testing, but one of the lit python modules requires it :-\ After this, check-clang-pseudo passes with a clean build tree.	2022-04-07 00:15:18 +02:00
Sam McCall	c03d6257c5	[pseudo] Rename DirectiveMap -> DirectiveTree. NFC Addressing comment from previous review https://reviews.llvm.org/D121165?id=413636#inline-1160757	2022-04-06 21:36:57 +02:00
Sam McCall	af89e4792d	[pseudo] Add crude heuristics to choose taken preprocessor branches. In files where different preprocessing paths are possible, our goal is to choose a preprocessed token sequence which we can parse that pins down as much of the grammatical structure as possible. This forms the "primary parse", and the not-taken branches get parsed later, and are constrained to be compatible with the primary parse. Concretely: int x = #ifdef // TAKEN 2 + 2 + 2 // determined during primary parse to be an expression #else 2 // constrained to be an expression during a secondary parse #endif ; Differential Revision: https://reviews.llvm.org/D121165	2022-04-06 17:22:35 +02:00
Sam McCall	72ae6cc3a6	[pseudo] respect CLANG_INCLUDE_TESTS	2022-04-04 15:30:11 +02:00
Haojian Wu	16eaa5240e	[pseudo] Fix the wrong rule ids in ForestTest.	2022-03-26 00:05:37 +01:00
Haojian Wu	41e69fb245	[pseudo] Add missing header guard for Forest.h	2022-03-25 23:51:19 +01:00
Sam McCall	57ee624d79	[cmake] Provide CURRENT_TOOLS_DIR centrally, replacing CLANG_TOOLS_DIR CLANG_TOOLS_DIR holds the current bin/ directory, maybe with a %(build_mode) placeholder. It is used to add the just-built binaries to $PATH for lit tests. In most cases it equals LLVM_TOOLS_DIR, which is used for the same purpose. But for a standalone build of clang, CLANG_TOOLS_DIR points at the build tree and LLVM_TOOLS_DIR points at the provided LLVM binaries. Currently CLANG_TOOLS_DIR is set in clang/test/, clang-tools-extra/test/, and other things always built with clang. This is a few cryptic lines of CMake in each place. Meanwhile LLVM_TOOLS_DIR is provided by configure_site_lit_cfg(). This patch moves CLANG_TOOLS_DIR to configure_site_lit_cfg() and renames it: - there's nothing clang-specific about the value - it will also replace LLD_TOOLS_DIR, LLDB_TOOLS_DIR etc (not in this patch) It also defines CURRENT_LIBS_DIR. While I removed the last usage of CLANG_LIBS_DIR in `e4cab4e24d`, there are LLD_LIBS_DIR usages etc that may be live, and I'd like to mechanically update them in a followup patch. Differential Revision: https://reviews.llvm.org/D121763	2022-03-25 20:22:01 +01:00
Sam McCall	72864d9bfe	[pseudo] Use box-drawing chars to prettify debug dumps. NFC	2022-03-25 14:17:38 +01:00
Haojian Wu	62d5f254cc	[pseudo] Introduce parse forest. Parse forest is the output of the GLR parser, it is a tree-like DAG which presents all possible parse trees without duplicating subparse structures. This is a patch split from https://reviews.llvm.org/D121150. Differential Revision: https://reviews.llvm.org/D122139	2022-03-24 14:47:17 +01:00
Haojian Wu	f383b88d82	[pseudo] Sort nonterminals based on their reduction order. Reductions need to be performed in a careful order in GLR parser, to make sure we gather all alternatives before creating an ambiguous forest node. This patch encodes the nonterminal order into the rule id, so that we can efficiently determine the ordering of reductions in GLR parser. This patch also abstracts to a TestGrammar, which is shared among tests. This is a part of the GLR parser, https://reviews.llvm.org/D121368, https://reviews.llvm.org/D121150 Differential Revision: https://reviews.llvm.org/D122303	2022-03-24 14:30:12 +01:00
Haojian Wu	1579090141	Reland "[pseudo] Split greatergreater token." It was reverted, because the test had a lifetime issue. Reland `f66d3758bd` with a fix.	2022-03-22 10:27:52 +01:00
Sam McCall	1f92f44ec9	[pseudo] fix typo'd test assertions	2022-03-21 14:05:21 +01:00
Zequan Wu	217f267efe	Revert "[pseudo] Split greatergreater token." This reverts commit `f66d3758bd`. It breaks windows bot.	2022-03-18 10:15:48 -07:00
Haojian Wu	30de15e100	[pseudo] Tweak some docs, NFC Consistently use the "nonterminal", "pseudoparser" terms.	2022-03-17 13:58:42 +01:00
Haojian Wu	f66d3758bd	[pseudo] Split greatergreater token. For a >> token (a right shift operator, or a nested template?), the clang lexer always returns a single greatergreater token, as a result, the grammar-based GLR parser never try to parse the nested template case. We derive a token stream by always splitting the >> token, so that the GLR parser is able to pursue both options during parsing (usually 1 path fails). Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D121678	2022-03-17 13:46:58 +01:00
Haojian Wu	5a624956ce	[pseudo] Fix some naming-style violations.	2022-03-17 09:47:24 +01:00
Haojian Wu	e5b1b9edb8	[pseudo] Cleanup the leftover header guards after the movement, NFC.	2022-03-16 16:25:18 +01:00
Sam McCall	89cd86bbc5	Reapply "[pseudo] Move pseudoparser from clang to clang-tools-extra" This reverts commit `049f4e4eab`. The problem was a stray dependency in CLANG_TEST_DEPS which caused cmake to fail if clang-pseudo wasn't built. This is now removed.	2022-03-16 01:10:55 +01:00
Sam McCall	049f4e4eab	Revert "[pseudo] Move pseudoparser from clang to clang-tools-extra" This reverts commit `b97856c4cf`. Breaks a bunch of bots: https://lab.llvm.org/buildbot/#/builders/193/builds/8513	2022-03-16 01:06:24 +01:00


51 Commits