This is the intended way to request that build-time tools be built in a
distinct configuration.
This is set implicitly by LLVM_OPTIMIZED_TABLEGEN, which may be
surprising, but if undesired this should be fixed elsewhere.
Should fix crbug.com/1330304
Use the LLVM build system's cross-compilation support for the tool, so
that the build works for both host and cross-compilation scenarios.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D126397
Virtually all LLVM tools accept a `-o` flag, so add one. This will make it
possible to possibly add a --write-if-changed flag later. It also makes it
so that the file isn't partially written if the tool oesn't run successfully.
Marking --grammar as `Required` allows removing some manual
verification code for it.
Differential Revision: https://reviews.llvm.org/D126373
The main idea is to compile the cxx grammar at build time, and construct
the core pieces (Grammar, LRTable) of the pseudoparse based on the compiled
data sources.
This is a tiny implementation, which is good for start:
- defines how the public API should look like;
- integrates the cxx grammar compilation workflow with the cmake system.
- onlynonterminal symbols of the C++ grammar are compiled, anything
else are still doing the real compilation work at runtime, we can opt-in more
bits in the future;
- splits the monolithic clangPsuedo library for better layering;
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D125667
Error-tolerant bracket matching enables our error-tolerant parsing strategies.
The implementation here is *not* yet error tolerant: this patch sets up the APIs
and plumbing, and describes the planned approach.
Differential Revision: https://reviews.llvm.org/D125911
Explicitly sizing Kind enum suggests that too-large values are allowed,
and that putting it in a bitfield is dangerous.
GCC doesn't like condition ? integer : enum.
- add missing benchmark for lex/preprocess steps
- name benchmarks after the function they're benchmarking, when appropriate
- remove unergonomic "run" prefixes from benchmark names
- give a useful error message if --grammar or --source are missing
- Use realistic example of how to run, run all benchmarks by default.
(for someone who doesn't know the commands, this is the most useful action)
- Improve typos/wording in comment
- clean up unused vars
- avoid "parseable stream" name, which isn't a great name & not one I expected
to escape from ClangPseudoMain
Differential Revision: https://reviews.llvm.org/D125312
With this patch, we're able to parse smaller chunks of C++ code (statement,
declaration), rather than translation-unit.
The start symbol is listed in the grammar in a form of `_ :=
statement`, each start symbol has a dedicated state (`_ := • statement`).
We create and track all these separate states in the LRTable. When we
start parsing, we lookup the corresponding state to start the parser.
LR pasing table changes with this patch:
- number of states: 1467 -> 1471
- number of actions: 82891 -> 83578
- size of the table (bytes): 334248 -> 336996
Differential Revision: https://reviews.llvm.org/D125006
All llvm-project fuzzers use this library to parse command-line arguments.
Many of them don't deal with LLVM IR or modules in any way. Bundling those
functions in one library forces build dependencies that don't need to be there.
Among other things, this means check-clang-pseudo no longer depends on most of
LLVM.
Differential Revision: https://reviews.llvm.org/D125081
This includes only the taken branch of conditional sections.
The API allows for producing a stream for a particular PP branch, which
will be used later for the secondary GLR parses of not-taken branches.
Differential Revision: https://reviews.llvm.org/D123243
As confirmation, running this locally found 2 crashes:
- trivial: crashes on file with no tokens
- lexer: hits an assertion failure on bytes: 0x5c,0xa,0x5c,0x1,0x65,0x5c,0xa
Differential Revision: https://reviews.llvm.org/D125037
It turns out clang::expandUCNs only works on tokens that contain valid UCNs
and no other random escapes, and clang only uses it on raw_identifiers.
Currently we can hit an assertion by creating tokens with stray non-valid-UCN
backslashes in them.
Fortunately, expanding UCNs in raw_identifiers is actually all we need.
Most tokens (keywords, punctuation) can't have them. UCNs in literals can be
treated as escape sequences like \n even this isn't the standard's
interpretation. This more or less matches how clang works.
(See https://isocpp.org/files/papers/P2194R0.pdf which points out that the
standard's description of how UCNs work is misaligned with real implementations)
Differential Revision: https://reviews.llvm.org/D125049
This patch implements a standard GLR parsing algorithm, the
core piece of the pseudoparser.
- it parses preprocessed C++ code, currently it supports correct code
only and parse them as a translation-unit;
- it produces a forest which stores all possible trees in an efficient
manner (only a single node being build for per (SymbolID, Token Range));
no disambiguation yet;
Reland with a fix for g++'s -fpermissive error on previous declaration `GSS& GSS;`.
Differential Revision: https://reviews.llvm.org/D121150
This patch implements a standard GLR parsing algorithm, the
core piece of the pseudoparser.
- it parses preprocessed C++ code, currently it supports correct code
only and parse them as a translation-unit;
- it produces a forest which stores all possible trees in an efficient
manner (only a single node being build for per (SymbolID, Token Range));
no disambiguation yet;
Differential Revision: https://reviews.llvm.org/D121150
The code was written to handle nullable grammar, and we disallow
nullable grammar, so it is not necessary to keep it around.
Differential Revision: https://reviews.llvm.org/D124827
> Includes regression test for problem noted by @hans.
> is reverts commit 973de71.
>
> Differential Revision: https://reviews.llvm.org/D106898
Feature implemented as-is is fairly expensive and hasn't been used by
libc++. A potential reimplementation is possible if libc++ become
interested in this feature again.
Differential Revision: https://reviews.llvm.org/D123885
In files where different preprocessing paths are possible, our goal is to
choose a preprocessed token sequence which we can parse that pins down as much
of the grammatical structure as possible.
This forms the "primary parse", and the not-taken branches get parsed later,
and are constrained to be compatible with the primary parse.
Concretely:
int x =
#ifdef // TAKEN
2 + 2 + 2 // determined during primary parse to be an expression
#else
2 // constrained to be an expression during a secondary parse
#endif
;
Differential Revision: https://reviews.llvm.org/D121165
CLANG_TOOLS_DIR holds the the current bin/ directory, maybe with a %(build_mode)
placeholder. It is used to add the just-built binaries to $PATH for lit tests.
In most cases it equals LLVM_TOOLS_DIR, which is used for the same purpose.
But for a standalone build of clang, CLANG_TOOLS_DIR points at the build tree
and LLVM_TOOLS_DIR points at the provided LLVM binaries.
Currently CLANG_TOOLS_DIR is set in clang/test/, clang-tools-extra/test/, and
other things always built with clang. This is a few cryptic lines of CMake in
each place. Meanwhile LLVM_TOOLS_DIR is provided by configure_site_lit_cfg().
This patch moves CLANG_TOOLS_DIR to configure_site_lit_cfg() and renames it:
- there's nothing clang-specific about the value
- it will also replace LLD_TOOLS_DIR, LLDB_TOOLS_DIR etc (not in this patch)
It also defines CURRENT_LIBS_DIR. While I removed the last usage of
CLANG_LIBS_DIR in e4cab4e24d, there are LLD_LIBS_DIR usages etc that
may be live, and I'd like to mechanically update them in a followup patch.
Differential Revision: https://reviews.llvm.org/D121763
Parse forest is the output of the GLR parser, it is a tree-like DAG
which presents all possible parse trees without duplicating subparse structures.
This is a patch split from https://reviews.llvm.org/D121150.
Differential Revision: https://reviews.llvm.org/D122139
Reductions need to be performed in a careful order in GLR parser, to
make sure we gather all alternatives before creating an ambigous forest
node.
This patch encodes the nonterminal order into the rule id, so that we
can efficiently to determinal ordering of reductions in GLR parser.
This patch also abstracts to a TestGrammar, which is shared among tests.
This is a part of the GLR parser, https://reviews.llvm.org/D121368,
https://reviews.llvm.org/D121150
Differential Revision: https://reviews.llvm.org/D122303
For a >> token (a right shift operator, or a nested template?), the clang
lexer always returns a single greatergreater token, as a result,
the grammar-based GLR parser never try to parse the nested template
case.
We derive a token stream by always splitting the >> token, so that the
GLR parser is able to pursue both options during parsing (usually 1
path fails).
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D121678
This reverts commit 049f4e4eab.
The problem was a stray dependency in CLANG_TEST_DEPS which caused cmake
to fail if clang-pseudo wasn't built. This is now removed.