Commit Graph

51 Commits

Author SHA1 Message Date
Haojian Wu 94552f0216 [pseudo] Build inc files when cxx.bnf changes.
Add the cxx.bnf file as a dependency of custom gen commands, so that the
inc files can be rebuilt when cxx.bnf changes.
2022-06-01 13:48:09 +02:00
Sam McCall 9d991da60d [pseudo] Respect LLVM_USE_HOST_TOOLS
This is the intended way to request that build-time tools be built in a
distinct configuration.

This is set implicitly by LLVM_OPTIMIZED_TABLEGEN, which may be
surprising, but if undesired this should be fixed elsewhere.

Should fix crbug.com/1330304
2022-05-31 20:47:57 +02:00
Haojian Wu a5ddd4a238 [pseudo] Remove an unnecessary nullable check diagnostic in the bnf
grammar, NFC.

This diagnostic has been handled in eliminateOptional.
2022-05-30 09:04:47 +02:00
Shoaib Meenai 4baae166ce [pseudo] Fix pseudo-gen usage when cross-compiling
Use the LLVM build system's cross-compilation support for the tool, so
that the build works for both host and cross-compilation scenarios.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D126397
2022-05-25 11:08:21 -07:00
Nico Weber 788463e72a [pseudo-gen] Add -o flag, make --grammar required
Virtually all LLVM tools accept a `-o` flag, so add one. This will make it
possible to possibly add a --write-if-changed flag later. It also makes it
so that the file isn't partially written if the tool oesn't run successfully.

Marking --grammar as `Required` allows removing some manual
verification code for it.

Differential Revision: https://reviews.llvm.org/D126373
2022-05-25 09:11:42 -04:00
Haojian Wu f1df6515e3 [pseudo] Add missing dependency, fix shared library build. 2022-05-25 12:38:23 +02:00
Haojian Wu cd2292ef82 [pseudo] A basic implementation of compiling cxx grammar at build time.
The main idea is to compile the cxx grammar at build time, and construct
the core pieces (Grammar, LRTable) of the pseudoparse based on the compiled
data sources.

This is a tiny implementation, which is good for start:

- defines how the public API should look like;
- integrates the cxx grammar compilation workflow with the cmake system.
- onlynonterminal symbols of the C++ grammar are compiled, anything
  else are still doing the real compilation work at runtime, we can opt-in more
  bits in the future;
- splits the monolithic clangPsuedo library for better layering;

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D125667
2022-05-25 11:26:06 +02:00
Sam McCall 0360b9f159 [pseudo] (trivial) bracket-matching
Error-tolerant bracket matching enables our error-tolerant parsing strategies.
The implementation here is *not* yet error tolerant: this patch sets up the APIs
and plumbing, and describes the planned approach.

Differential Revision: https://reviews.llvm.org/D125911
2022-05-24 15:13:36 +02:00
Sam McCall cd387e43bf [pseudo] Squash some warnings. NFC
Explicitly sizing Kind enum suggests that too-large values are allowed,
and that putting it in a bitfield is dangerous.

GCC doesn't like condition ? integer : enum.
2022-05-19 08:20:12 +02:00
Sam McCall 79ca4ed3e7 [pseudo] Design notes from discussion today. NFC 2022-05-18 00:08:47 +02:00
Sam McCall e8e00e342c [pseudo] benchmark cleanups. NFC
- add missing benchmark for lex/preprocess steps
- name benchmarks after the function they're benchmarking, when appropriate
- remove unergonomic "run" prefixes from benchmark names
- give a useful error message if --grammar or --source are missing
- Use realistic example of how to run, run all benchmarks by default.
  (for someone who doesn't know the commands, this is the most useful action)
- Improve typos/wording in comment
- clean up unused vars
- avoid "parseable stream" name, which isn't a great name & not one I expected
  to escape from ClangPseudoMain

Differential Revision: https://reviews.llvm.org/D125312
2022-05-17 20:22:42 +02:00
Dmitri Gribenko 9c6a2f2966 Fix an unused variable warning in no-asserts build mode 2022-05-17 15:27:44 +02:00
Haojian Wu 86bc6399a0 [pseudo] Add the missing ; terminal for module-declaration rule. 2022-05-17 15:14:46 +02:00
Haojian Wu 1a65c491be [pseudo] Support parsing variant target symbols.
With this patch, we're able to parse smaller chunks of C++ code (statement,
declaration), rather than translation-unit.

The start symbol is listed in the grammar in a form of `_ :=
statement`, each start symbol has a dedicated state (`_ := • statement`).
We create and track all these separate states in the LRTable. When we
start parsing, we lookup the corresponding state to start the parser.

LR pasing table changes with this patch:
- number of states: 1467 -> 1471
- number of actions: 82891 -> 83578
- size of the table (bytes): 334248 -> 336996

Differential Revision: https://reviews.llvm.org/D125006
2022-05-16 10:38:16 +02:00
Haojian Wu be895d5768 [pseudo] Add benchmarks for pseudoparser.
Running on SemaDecl.cpp with the cxx.bnf grammar:

```
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
runParseBNFGrammar      649389 ns       649365 ns         1013
runBuildLR            34591903 ns     34591380 ns           20
runPreprocessTokens   11418744 ns     11418703 ns           61 bytes_per_second=63.8971M/s
runGLRParse          282996863 ns    282988726 ns            2 bytes_per_second=2.57827M/s
runParseOverall      294969719 ns    294951870 ns            2 bytes_per_second=2.4737M/s
```

Differential Revision: https://reviews.llvm.org/D125226
2022-05-10 14:13:46 +02:00
Sam McCall e571e1a6c3 Reland "[FuzzMutate] Split out FuzzerCLI library that doesn't depend on IR."
This reverts commit a1bb952e83.

I'd somehow missed updating llvm-yaml-parser-fuzzer, now fixed.
2022-05-07 13:49:54 +02:00
Aaron Ballman a1bb952e83 Revert "[FuzzMutate] Split out FuzzerCLI library that doesn't depend on IR."
This reverts commit 1c5e85b3da.

It broke a lot of bots with a link error:
https://lab.llvm.org/buildbot/#/builders/171/builds/14222
https://lab.llvm.org/buildbot/#/builders/188/builds/13748
https://lab.llvm.org/buildbot/#/builders/109/builds/38127
2022-05-07 07:29:57 -04:00
Sam McCall 1c5e85b3da [FuzzMutate] Split out FuzzerCLI library that doesn't depend on IR.
All llvm-project fuzzers use this library to parse command-line arguments.
Many of them don't deal with LLVM IR or modules in any way. Bundling those
functions in one library forces build dependencies that don't need to be there.

Among other things, this means check-clang-pseudo no longer depends on most of
LLVM.

Differential Revision: https://reviews.llvm.org/D125081
2022-05-07 12:11:51 +02:00
Sam McCall 7dc3c6190e [pseudo] Strip directives from a token stream
This includes only the taken branch of conditional sections.
The API allows for producing a stream for a particular PP branch, which
will be used later for the secondary GLR parses of not-taken branches.

Differential Revision: https://reviews.llvm.org/D123243
2022-05-06 12:15:08 +02:00
Sam McCall 1616bd9ef4 [pseudo] Add fuzzer for the pseudoparser.
As confirmation, running this locally found 2 crashes:
 - trivial: crashes on file with no tokens
 - lexer: hits an assertion failure on bytes: 0x5c,0xa,0x5c,0x1,0x65,0x5c,0xa

Differential Revision: https://reviews.llvm.org/D125037
2022-05-06 09:22:28 +02:00
Sam McCall 232cc446ff [pseudo] Only expand UCNs for raw_identifiers
It turns out clang::expandUCNs only works on tokens that contain valid UCNs
and no other random escapes, and clang only uses it on raw_identifiers.

Currently we can hit an assertion by creating tokens with stray non-valid-UCN
backslashes in them.

Fortunately, expanding UCNs in raw_identifiers is actually all we need.
Most tokens (keywords, punctuation) can't have them. UCNs in literals can be
treated as escape sequences like \n even this isn't the standard's
interpretation. This more or less matches how clang works.
(See https://isocpp.org/files/papers/P2194R0.pdf which points out that the
standard's description of how UCNs work is misaligned with real implementations)

Differential Revision: https://reviews.llvm.org/D125049
2022-05-06 08:53:31 +02:00
Weverything 0e86cddf98 [psuedo] Fix for unused warning by moving code into debug macro. 2022-05-03 16:07:59 -07:00
Haojian Wu c4546091ed [pseudo] Use a real language option in the parser.
Differential Revision: https://reviews.llvm.org/D124831
2022-05-03 22:24:56 +02:00
Haojian Wu ed1b32791d [pseudo] Print the GSS::Node details when the unittest fails, NFC. 2022-05-03 22:06:10 +02:00
Haojian Wu 9f38da258e [pseudo] Implement the GLR parsing algorithm.
This patch implements a standard GLR parsing algorithm, the
core piece of the pseudoparser.

- it parses preprocessed C++ code, currently it supports correct code
  only and parse them as a translation-unit;
- it produces a forest which stores all possible trees in an efficient
  manner (only a single node being build for per (SymbolID, Token Range));
  no disambiguation yet;

Reland with a fix for g++'s -fpermissive error on previous declaration `GSS& GSS;`.

Differential Revision: https://reviews.llvm.org/D121150
2022-05-03 20:25:23 +02:00
Haojian Wu 860eabb395 Revert "[pseudo] Implement the GLR parsing algorithm."
This breaks some buildbots (on the declaration GSS& GSS), will fix it
later.

This reverts commit eac22d0754.
2022-05-03 15:54:10 +02:00
Sam McCall eac22d0754 [pseudo] Implement the GLR parsing algorithm.
This patch implements a standard GLR parsing algorithm, the
core piece of the pseudoparser.

- it parses preprocessed C++ code, currently it supports correct code
  only and parse them as a translation-unit;
- it produces a forest which stores all possible trees in an efficient
  manner (only a single node being build for per (SymbolID, Token Range));
  no disambiguation yet;

Differential Revision: https://reviews.llvm.org/D121150
2022-05-03 15:42:07 +02:00
Haojian Wu b18abde8ad [pseudo] Simplify the forest dump, NFC.
The code was written to handle nullable grammar, and we disallow
nullable grammar, so it is not necessary to keep it around.

Differential Revision: https://reviews.llvm.org/D124827
2022-05-03 14:14:57 +02:00
Haojian Wu 910fb5d7e0 [pseudo] NFC, fix some code-style naming violations. 2022-04-26 10:50:50 +02:00
Christopher Di Bella e9a902c7f7 Revert "Revert "Revert "[clang][pp] adds '#pragma include_instead'"""
> Includes regression test for problem noted by @hans.
> is reverts commit 973de71.
>
> Differential Revision: https://reviews.llvm.org/D106898

Feature implemented as-is is fairly expensive and hasn't been used by
libc++. A potential reimplementation is possible if libc++ become
interested in this feature again.

Differential Revision: https://reviews.llvm.org/D123885
2022-04-22 16:37:20 +00:00
Sam McCall 60502ed11a [pseudo] Remove unused clangTesting dep. NFC 2022-04-12 16:17:43 +02:00
Sam McCall 5749a261c5 [pseudo] Include missing `count` in test deps.
We don't use this for testing, but one of the lit python modules requires it :-\

After this, check-clang-pseudo passes with a clean build tree.
2022-04-07 00:15:18 +02:00
Sam McCall c03d6257c5 [pseudo] Rename DirectiveMap -> DirectiveTree. NFC
Addressing comment from previous review
https://reviews.llvm.org/D121165?id=413636#inline-1160757
2022-04-06 21:36:57 +02:00
Sam McCall af89e4792d [pseudo] Add crude heuristics to choose taken preprocessor branches.
In files where different preprocessing paths are possible, our goal is to
choose a preprocessed token sequence which we can parse that pins down as much
of the grammatical structure as possible.
This forms the "primary parse", and the not-taken branches get parsed later,
and are constrained to be compatible with the primary parse.

Concretely:
  int x =
    #ifdef // TAKEN
      2 + 2 + 2 // determined during primary parse to be an expression
    #else
      2 // constrained to be an expression during a secondary parse
    #endif
    ;

Differential Revision: https://reviews.llvm.org/D121165
2022-04-06 17:22:35 +02:00
Sam McCall 72ae6cc3a6 [pseudo] respect CLANG_INCLUDE_TESTS 2022-04-04 15:30:11 +02:00
Haojian Wu 16eaa5240e [pseudo] Fix the wrong rule ids in ForestTest. 2022-03-26 00:05:37 +01:00
Haojian Wu 41e69fb245 [pseudo] Add missing header guard for Forest.h 2022-03-25 23:51:19 +01:00
Sam McCall 57ee624d79 [cmake] Provide CURRENT_TOOLS_DIR centrally, replacing CLANG_TOOLS_DIR
CLANG_TOOLS_DIR holds the the current bin/ directory, maybe with a %(build_mode)
placeholder. It is used to add the just-built binaries to $PATH for lit tests.
In most cases it equals LLVM_TOOLS_DIR, which is used for the same purpose.
But for a standalone build of clang, CLANG_TOOLS_DIR points at the build tree
and LLVM_TOOLS_DIR points at the provided LLVM binaries.

Currently CLANG_TOOLS_DIR is set in clang/test/, clang-tools-extra/test/, and
other things always built with clang. This is a few cryptic lines of CMake in
each place. Meanwhile LLVM_TOOLS_DIR is provided by configure_site_lit_cfg().

This patch moves CLANG_TOOLS_DIR to configure_site_lit_cfg() and renames it:
 - there's nothing clang-specific about the value
 - it will also replace LLD_TOOLS_DIR, LLDB_TOOLS_DIR etc (not in this patch)

It also defines CURRENT_LIBS_DIR. While I removed the last usage of
CLANG_LIBS_DIR in e4cab4e24d, there are LLD_LIBS_DIR usages etc that
may be live, and I'd like to mechanically update them in a followup patch.

Differential Revision: https://reviews.llvm.org/D121763
2022-03-25 20:22:01 +01:00
Sam McCall 72864d9bfe [pseudo] Use box-drawing chars to prettify debug dumps. NFC 2022-03-25 14:17:38 +01:00
Haojian Wu 62d5f254cc [pseudo] Introduce parse forest.
Parse forest is the output of the GLR parser, it is a tree-like DAG
which presents all possible parse trees without duplicating subparse structures.

This is a patch split from https://reviews.llvm.org/D121150.

Differential Revision: https://reviews.llvm.org/D122139
2022-03-24 14:47:17 +01:00
Haojian Wu f383b88d82 [pseudo] Sort nonterminals based on their reduction order.
Reductions need to be performed in a careful order in GLR parser, to
make sure we gather all alternatives before creating an ambigous forest
node.

This patch encodes the nonterminal order into the rule id, so that we
can efficiently to determinal ordering of reductions in GLR parser.

This patch also abstracts to a TestGrammar, which is shared among tests.

This is a part of the GLR parser, https://reviews.llvm.org/D121368,
https://reviews.llvm.org/D121150

Differential Revision: https://reviews.llvm.org/D122303
2022-03-24 14:30:12 +01:00
Haojian Wu 1579090141 Reland "[pseudo] Split greatergreater token."
It was reverted, because the test had a lift-time issue.
Reland f66d3758bd with a fix.
2022-03-22 10:27:52 +01:00
Sam McCall 1f92f44ec9 [pseudo] fix typo'd test assertions 2022-03-21 14:05:21 +01:00
Zequan Wu 217f267efe Revert "[pseudo] Split greatergreater token."
This reverts commit f66d3758bd.

It breaks windows bot.
2022-03-18 10:15:48 -07:00
Haojian Wu 30de15e100 [pseudo] Tweak some docs, NFC
Consitently use the "nonterminal", "pseudoparser" terms.
2022-03-17 13:58:42 +01:00
Haojian Wu f66d3758bd [pseudo] Split greatergreater token.
For a >> token (a right shift operator, or a nested template?), the clang
lexer always returns a single greatergreater token, as a result,
the grammar-based GLR parser never try to parse the nested template
case.

We derive a token stream by always splitting the >> token, so that the
GLR parser is able to pursue both options during parsing (usually 1
path fails).

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D121678
2022-03-17 13:46:58 +01:00
Haojian Wu 5a624956ce [pseudo] Fix some naming-style violations. 2022-03-17 09:47:24 +01:00
Haojian Wu e5b1b9edb8 [pseudo] Cleanup the leftover header guards after the movement, NFC. 2022-03-16 16:25:18 +01:00
Sam McCall 89cd86bbc5 Reapply [pseudo] Move pseudoparser from clang to clang-tools-extra"
This reverts commit 049f4e4eab.

The problem was a stray dependency in CLANG_TEST_DEPS which caused cmake
to fail if clang-pseudo wasn't built. This is now removed.
2022-03-16 01:10:55 +01:00
Sam McCall 049f4e4eab Revert "[pseudo] Move pseudoparser from clang to clang-tools-extra"
This reverts commit b97856c4cf.

Breaks a bunch of bots:
https://lab.llvm.org/buildbot/#/builders/193/builds/8513
2022-03-16 01:06:24 +01:00