This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.
Reviewed-by: ymandel, xazax.hun
Differential Revision: https://reviews.llvm.org/D120149
This patch introduces a dense implementation of the LR parsing table, which is
used by LR parsers.
We build an SLR(1) parsing table from the LR(0) graph.
Statistics of the LR parsing table on the C++ spec grammar:
- number of states: 1449
- number of actions: 83069
- size of the table (bytes): 334928
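To illustrate what a dense table layout can look like, here is a minimal sketch. It is not the layout from this patch: it assumes actions are sorted by (state, symbol) and stored in flat arrays with one offset per state, so a lookup is a binary search inside one state's slice. All names (`DenseTable`, `Entry`, `Action`) and the 16-bit field widths are made up for the example, and keeping several actions per (state, symbol) pair is likewise an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical types for illustration; not the patch's actual data layout.
enum class ActionKind : uint8_t { Shift, Reduce, Accept };

struct Action {
  ActionKind Kind;
  uint16_t Value; // Target state for Shift, rule id for Reduce.
};

struct Entry {
  uint16_t State;
  uint16_t Symbol;
  Action Act;
};

class DenseTable {
public:
  // Entries may arrive in any order; every State must be < NumStates.
  DenseTable(std::vector<Entry> Entries, uint16_t NumStates) {
    std::stable_sort(Entries.begin(), Entries.end(),
                     [](const Entry &A, const Entry &B) {
                       return std::tie(A.State, A.Symbol) <
                              std::tie(B.State, B.Symbol);
                     });
    // Offsets[S] .. Offsets[S + 1] delimits the slice belonging to state S.
    Offsets.assign(static_cast<size_t>(NumStates) + 1, 0u);
    for (const Entry &E : Entries)
      ++Offsets[static_cast<size_t>(E.State) + 1];
    for (size_t I = 1; I < Offsets.size(); ++I)
      Offsets[I] += Offsets[I - 1];
    for (const Entry &E : Entries) {
      Symbols.push_back(E.Symbol);
      Actions.push_back(E.Act);
    }
  }

  // Returns every action stored for (State, Symbol); empty if there is none.
  std::vector<Action> find(uint16_t State, uint16_t Symbol) const {
    auto Begin = Symbols.begin() + Offsets[State];
    auto End = Symbols.begin() + Offsets[State + 1];
    auto Range = std::equal_range(Begin, End, Symbol);
    std::vector<Action> Result;
    for (auto It = Range.first; It != Range.second; ++It)
      Result.push_back(Actions[static_cast<size_t>(It - Symbols.begin())]);
    return Result;
  }

private:
  std::vector<uint32_t> Offsets; // One entry per state, plus a sentinel.
  std::vector<uint16_t> Symbols; // Sorted within each state's slice.
  std::vector<Action> Actions;   // Parallel to Symbols.
};
```

In this sketch the per-action cost is a symbol plus an action, and the per-state cost is one offset, which is the kind of trade-off a dense layout aims for.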
Differential Revision: https://reviews.llvm.org/D118196
In 529aa4b011, by setting the identifier info to nullptr, we started to subtly
interfere with the code at the beginning of the function
(529aa4b011/clang/lib/Format/UnwrappedLineParser.cpp, L991),
causing the preprocessor nesting to change in some cases. E.g., for the
added regression test, clang-format started incorrectly guessing the
language as C++.
This tries to address this by introducing an internal identifier info
element to use instead.
Reviewed By: curdeius, MyDeveloperDay
Differential Revision: https://reviews.llvm.org/D120315
Adds a new option InsertBraces to insert the optional braces after
if, else, for, while, and do in C++.
Differential Revision: https://reviews.llvm.org/D120217
This reverts commit e021987273.
This commit provokes failures in formatting tests of polly.
Cf. https://lab.llvm.org/buildbot/#/builders/205/builds/3320.
This is probably because `)` is now annotated as `CastRParen` instead of `Unknown` as before, and is therefore kept on the same line as the next token.
Fixes https://github.com/llvm/llvm-project/issues/53876.
This is a solution for standard C++ casts: const_cast, dynamic_cast, reinterpret_cast, static_cast.
A general approach handling all possible casts is not possible without semantic information.
Consider the code:
```
static_cast<T>(*function_pointer_variable)(arguments);
```
vs.
```
some_return_type<T> (*function_pointer_variable)(parameters);
// Later used as:
function_pointer_variable = &some_function;
return function_pointer_variable(args);
```
In the latter case, it's not a cast but a variable declaration of a pointer to function.
Without knowing what `some_return_type<T>` is (and clang-format does not know it), it's hard to distinguish between the two cases. Theoretically, one could check whether "parameters" are types (not a cast) and "arguments" are value/expressions (a cast), but that might be inefficient (needs lots of lookahead).
Reviewed By: MyDeveloperDay, HazardyKnusperkeks, owenpan
Differential Revision: https://reviews.llvm.org/D120140
Add `ObjCProtocolLoc` which behaves like `TypeLoc` but for
`ObjCProtocolDecl` references.
RecursiveASTVisitor now synthesizes `ObjCProtocolLoc` during traversal
and the `ObjCProtocolLoc` can be stored in a `DynTypedNode`.
In a follow-up patch, I'll update clangd to make use of this
to properly support protocol references for hover + goto definition.
Differential Revision: https://reviews.llvm.org/D119363
This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.
Reviewed-by: xazax.hun
Differential Revision: https://reviews.llvm.org/D119953
Fixes https://github.com/llvm/llvm-project/issues/24781.
Fixes https://github.com/llvm/llvm-project/issues/38160.
This patch splits `TT_RecordLBrace` for classes/enums/structs/unions (and other records, e.g. interfaces) and uses the brace type to avoid the error-prone scanning for record token.
The mentioned bugs were provoked by the scanning being too limited (and so not considering `const` or `constexpr`, or other qualifiers, on an anonymous struct variable declaration).
Moreover, the proposed solution is more efficient, as we parse tokens only once (the scanning was a form of parsing too).
Reviewed By: MyDeveloperDay, HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D119785
We can now configure the space between requires and the following paren,
separately for clauses and expressions.
Differential Revision: https://reviews.llvm.org/D113369
Detect requires expressions in more unusual contexts. This is far from
perfect, but currently we have no good metric to decide between a
requires expression and a trailing requires clause.
Differential Revision: https://reviews.llvm.org/D119138
Previously, Transformer would invoke the consumer once per file modified per
match, in addition to any errors encountered. The consumer is not aware of which
AtomicChanges come from any particular match. It is unclear which sets of edits
may be related or whether an error invalidates any previously emitted changes.
Modify the signature of the consumer to accept a set of changes. This keeps
related changes (i.e. all edits from a single match) together, and clarifies
that errors don't produce partial changes.
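The new contract can be sketched as follows. This is a self-contained illustration and not Transformer's actual signature: `Change` stands in for `AtomicChange`, and `MatchError`, `MatchResult`, `FileOutcome` and `reportMatch` are hypothetical names. The point shown is that the consumer is invoked exactly once per match, with either all changes of that match or an error, never a mix.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins; the real code uses AtomicChange and llvm::Error.
struct Change {
  std::string File;
  std::string Replacement;
};
struct MatchError {
  std::string Message;
};

// One consumer call per match: all of its changes, or a single error.
using MatchResult = std::variant<std::vector<Change>, MatchError>;
using Consumer = std::function<void(MatchResult)>;

// Outcome of producing the edit for one file touched by a match.
using FileOutcome = std::variant<Change, MatchError>;

void reportMatch(const std::vector<FileOutcome> &Outcomes,
                 const Consumer &Consume) {
  std::vector<Change> Changes;
  for (const FileOutcome &Outcome : Outcomes) {
    if (const MatchError *Err = std::get_if<MatchError>(&Outcome)) {
      // An error invalidates the whole match; no partial changes are emitted.
      Consume(*Err);
      return;
    }
    Changes.push_back(std::get<Change>(Outcome));
  }
  Consume(std::move(Changes));
}
```

With this shape, a consumer can apply or discard the edits of a match as a unit instead of guessing which earlier callbacks belonged together.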
Reviewed By: ymandel
Differential Revision: https://reviews.llvm.org/D119745
The minimizer strips out single-line comments (introduced by `//`). This sequence of characters can also appear in `#include` or `#import` directives where they play the role of path separators. We already avoid stripping this character sequence for `#include` but not for `#import` (which has the same semantics). This patch makes it so `#import <A//A.h>` is not affected by minimization. Previously, we would incorrectly reduce it into `#import <A`.
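The rule can be illustrated with a small standalone sketch. It is not the minimizer's code: `isIncludeLikeDirective` and `stripLineComment` are hypothetical helpers, and the sketch deliberately ignores string literals, block comments and other directives.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Returns true if Line is an #include or #import directive.
bool isIncludeLikeDirective(std::string_view Line) {
  size_t Pos = Line.find_first_not_of(" \t");
  if (Pos == std::string_view::npos || Line[Pos] != '#')
    return false;
  Pos = Line.find_first_not_of(" \t", Pos + 1);
  if (Pos == std::string_view::npos)
    return false;
  std::string_view Rest = Line.substr(Pos);
  for (std::string_view Keyword :
       {std::string_view("include"), std::string_view("import")}) {
    if (Rest.compare(0, Keyword.size(), Keyword) != 0)
      continue;
    // The keyword must end here, so "#important" does not qualify.
    if (Rest.size() == Keyword.size())
      return true;
    unsigned char Next = static_cast<unsigned char>(Rest[Keyword.size()]);
    return !(std::isalnum(Next) || Next == '_');
  }
  return false;
}

// Removes a trailing "//" comment, except on #include/#import lines, where
// "//" can be part of the header path.
std::string stripLineComment(std::string_view Line) {
  if (isIncludeLikeDirective(Line))
    return std::string(Line);
  size_t Comment = Line.find("//");
  if (Comment == std::string_view::npos)
    return std::string(Line);
  std::string_view Kept = Line.substr(0, Comment);
  size_t Last = Kept.find_last_not_of(" \t");
  if (Last == std::string_view::npos)
    return std::string();
  return std::string(Kept.substr(0, Last + 1));
}
```

Treating both directives through the same predicate is what keeps `#import <A//A.h>` intact while ordinary lines still lose their comments.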
Reviewed By: arphaman
Differential Revision: https://reviews.llvm.org/D119226
The minimizer tries to squash multi-line macro definitions into a single line. For that to work, contents of each line need to be separated by a space. Since we always strip leading whitespace on lines of a macro definition, the code currently tries to preserve exactly one space that appeared before the backslash.
This means the following code:
```
#define FOO(BAR) \
#BAR \
baz
```
gets minimized into:
```
#define FOO(BAR) #BAR baz
```
However, if there are no spaces before the backslash on line 2:
```
#define FOO(BAR) \
#BAR\
baz
```
no space can be preserved, leading to a (most likely) malformed macro definition:
```
#define FOO(BAR) #BARbaz
```
This patch makes sure we always put exactly one space at the end of each line ending with a backslash.
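The intended behavior can be sketched in isolation as follows. `squashMacro` is a hypothetical helper, not the minimizer's implementation: it strips blanks around each physical line and then emits exactly one space wherever a line ended with a backslash.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Joins the physical lines of one macro definition into a single line.
std::string squashMacro(std::string_view Definition) {
  const std::string_view Blank = " \t";
  std::string Out;
  size_t Pos = 0;
  while (true) {
    size_t Eol = Definition.find('\n', Pos);
    std::string_view Line = Definition.substr(
        Pos, Eol == std::string_view::npos ? std::string_view::npos
                                           : Eol - Pos);
    bool Continued = !Line.empty() && Line.back() == '\\';
    if (Continued)
      Line.remove_suffix(1);
    // Drop leading and trailing blanks; only the content is kept.
    size_t First = Line.find_first_not_of(Blank);
    if (First == std::string_view::npos)
      Line = std::string_view();
    else
      Line = Line.substr(First, Line.find_last_not_of(Blank) - First + 1);
    Out.append(Line);
    if (!Continued || Eol == std::string_view::npos)
      break;
    // Exactly one space, whether or not one preceded the backslash.
    Out.push_back(' ');
    Pos = Eol + 1;
  }
  return Out;
}
```

Both variants of the example above, with and without a space before the backslash, then produce `#define FOO(BAR) #BAR baz`.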
Reviewed By: arphaman
Differential Revision: https://reviews.llvm.org/D119231
Recently we observed high memory pressure caused by clang during some parallel builds.
We discovered that we have several projects that have a large number of #define directives
in their TUs (on the order of millions), which caused huge memory consumption in clang due
to a lot of allocations for MacroInfo. We would like to reduce clang's memory overhead
per #define so that these files consume less memory, which in turn
reduces the memory pressure on the system during highly parallel builds. This change achieves
that by removing the SmallVector in MacroInfo and instead storing the tokens in an array
allocated using the bump pointer allocator, after all tokens are lexed.
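The idea can be sketched outside of clang like this. `BumpArena`, `Token`, `MacroDef` and `finalizeMacro` are hypothetical stand-ins rather than the real `MacroInfo` or LLVM allocator code; the sketch only shows tokens being collected in a temporary buffer and then copied once into an exactly sized arena array.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a lexed token.
struct Token {
  unsigned Kind;
  unsigned Length;
};

constexpr size_t SlabSize = 4096;

// Minimal bump pointer arena: carves allocations out of large slabs and
// releases everything at once when the arena is destroyed.
// Assumes Align does not exceed the alignment guaranteed by operator new.
class BumpArena {
public:
  void *allocate(size_t Bytes, size_t Align) {
    size_t Start = (Used + Align - 1) / Align * Align;
    if (Slabs.empty() || Start + Bytes > Capacity) {
      Capacity = Bytes > SlabSize ? Bytes : SlabSize;
      Slabs.push_back(std::make_unique<unsigned char[]>(Capacity));
      Start = 0;
    }
    Used = Start + Bytes;
    return Slabs.back().get() + Start;
  }

private:
  std::vector<std::unique_ptr<unsigned char[]>> Slabs;
  size_t Capacity = 0;
  size_t Used = 0;
};

// Per-macro record: just a pointer and a count, no inline vector storage.
struct MacroDef {
  const Token *Tokens = nullptr;
  unsigned NumTokens = 0;
};

// Called once all tokens of the macro body have been lexed.
MacroDef finalizeMacro(const std::vector<Token> &Lexed, BumpArena &Arena) {
  MacroDef MD;
  MD.NumTokens = static_cast<unsigned>(Lexed.size());
  if (!Lexed.empty()) {
    void *Mem = Arena.allocate(Lexed.size() * sizeof(Token), alignof(Token));
    Token *Dst = static_cast<Token *>(Mem);
    std::uninitialized_copy(Lexed.begin(), Lexed.end(), Dst);
    MD.Tokens = Dst;
  }
  return MD;
}
```

The temporary vector can be reused for the next macro, so each definition keeps only the memory its tokens actually need.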
The added unit test with 1000000 #define directives illustrates the problem. Prior to this
change, on arm64 macOS, clang's PP bump pointer allocator allocated 272007616 bytes, and
used roughly 272 bytes per #define. After this change, clang's PP bump pointer allocator
allocates 120002016 bytes, and uses only roughly 120 bytes per #define.
For an example test file that we have internally with 7.8 million #define directives, this
change produces the following improvement on arm64 macOS: Persistent allocation footprint for
this test case file as it's being compiled to LLVM IR went down 22% from 5.28 GB to 4.07 GB
and the total allocations went down 14% from 8.26 GB to 7.05 GB. Furthermore, this change
reduced the total number of allocations made by the system for this clang invocation from
1454853 to 133663, an order of magnitude improvement.
The recommit fixes the LLDB build failure.
Differential Revision: https://reviews.llvm.org/D117348
Importing a member may require that the record is already set to complete
(for example, 'computeDependence' when some Expr nodes are created).
The record may not be completely imported at this time, so the result of layout
calculations can be incorrect, but at least no crash occurs this way.
A better solution would be to import the fields of every encountered record
before the other members of all records, but this is much more difficult to implement.
Differential Revision: https://reviews.llvm.org/D116155
Fixes https://github.com/llvm/llvm-project/issues/53758.
Braces in loops and in `if` statements with leading (block) comments were formatted according to `BraceWrapping.AfterFunction` and not `AllowShortBlocksOnASingleLine`/`AllowShortLoopsOnASingleLine`/`AllowShortIfStatementsOnASingleLine`.
Previously, the code:
```
while (true) {
  f();
}
/*comment*/ while (true) {
  f();
}
```
was incorrectly formatted to:
```
while (true) {
  f();
}
/*comment*/ while (true) { f(); }
```
when using config:
```
BasedOnStyle: LLVM
BreakBeforeBraces: Custom
BraceWrapping:
  AfterFunction: false
AllowShortBlocksOnASingleLine: false
AllowShortLoopsOnASingleLine: false
```
and it was (correctly but by chance) formatted to:
```
while (true) {
  f();
}
/*comment*/ while (true) {
  f();
}
```
when enabling brace wrapping after functions:
```
BasedOnStyle: LLVM
BreakBeforeBraces: Custom
BraceWrapping:
  AfterFunction: true
AllowShortBlocksOnASingleLine: false
AllowShortLoopsOnASingleLine: false
```
Reviewed By: MyDeveloperDay, HazardyKnusperkeks, owenpan
Differential Revision: https://reviews.llvm.org/D119649
`CallDescription`s for builtin functions relax the match rules
somewhat, so that the `CallDescription` will match for calls that have
some prefix or suffix. This was achieved by doing a `StringRef::contains()`.
However, this is somewhat problematic for builtins that are substrings
of each other.
Consider the following:
`CallDescription{ builtin, "memcpy"}` will match for
`__builtin_wmemcpy()` calls, which is unfortunate.
This patch addresses/works around the issue by checking if the
characters around the function's name are not part of the 'name'
semantically. In other words, to accept a match for `"memcpy"` the call
should not have alphabetic (`[a-zA-Z]`) characters around the 'match'.
So, `CallDescription{ builtin, "memcpy"}` will not match on:
- `__builtin_wmemcpy`: there is a `w` character before the match.
- `__builtin_memcpyFOoBar_inline`: there is an `F` character after the match.
- `__builtin_memcpyX_inline`: there is an `X` character after the match.
But it will still match for:
- `memcpy`: exact match
- `__builtin_memcpy`: there is an _ before the match
- `__builtin_memcpy_inline`: there is an _ after the match
- `memcpy_inline_builtinFooBar`: there is an _ after the match
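The rule described above can be sketched as a standalone predicate. `matchesBuiltinName` and `isAsciiLetter` are hypothetical names, and the sketch only inspects the first occurrence of the name, which is an assumption and not necessarily what the checker code does.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// True for [a-zA-Z] only; digits and '_' do not count as part of the name.
static bool isAsciiLetter(char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z');
}

// Accepts Callee if Name occurs in it and is not directly preceded or
// followed by a letter.
bool matchesBuiltinName(std::string_view Callee, std::string_view Name) {
  size_t Start = Callee.find(Name);
  if (Start == std::string_view::npos)
    return false;
  bool LetterBefore = Start > 0 && isAsciiLetter(Callee[Start - 1]);
  size_t After = Start + Name.size();
  bool LetterAfter = After < Callee.size() && isAsciiLetter(Callee[After]);
  return !LetterBefore && !LetterAfter;
}
```

Applied to the names listed above, the predicate rejects `__builtin_wmemcpy` and accepts `__builtin_memcpy_inline`.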
Reviewed By: NoQ
Differential Revision: https://reviews.llvm.org/D118388
There is a clangd crash at `__memcmp_avx2_movbe`. A short description of the problem follows.
The method `HeaderIncludes::addExistingInclude` stores `Include` objects by reference in two places: `ExistingIncludes` (primary storage) and `IncludesByPriority` (pointers to the objects' locations in `ExistingIncludes`). `ExistingIncludes` is a map whose values are `SmallVector`s. A new element is inserted by `push_back`, which might reallocate the vector's storage. As a result, pointers stored in `IncludesByPriority` might become invalid.
Typical stack trace:
```
frame #0: 0x00007f11460dcd94 libc.so.6`__memcmp_avx2_movbe + 308
frame #1: 0x00000000004782b8 clangd`llvm::StringRef::compareMemory(Lhs="\"t2.h\"", Rhs="", Length=6) at StringRef.h:76:22
frame #2: 0x0000000000701253 clangd`llvm::StringRef::compare(this=0x00007f10de7d8610, RHS=(Data = "", Length = 7166742329480737377)) const at StringRef.h:206:34
* frame #3: 0x00000000007603ab clangd`llvm::operator<(llvm::StringRef, llvm::StringRef)(LHS=(Data = "\"t2.h\"", Length = 6), RHS=(Data = "", Length = 7166742329480737377)) at StringRef.h:907:23
frame #4: 0x0000000002d0ad9f clangd`clang::tooling::HeaderIncludes::insert(this=0x00007f10de7fb1a0, IncludeName=(Data = "t2.h\"", Length = 4), IsAngled=false) const at HeaderIncludes.cpp:365:22
frame #5: 0x00000000012ebfdd clangd`clang::clangd::IncludeInserter::insert(this=0x00007f10de7fb148, VerbatimHeader=(Data = "\"t2.h\"", Length = 6)) const at Headers.cpp:262:70
```
A unit test for the crash was created (`HeaderIncludesTest.RepeatedIncludes`). The proposed solution is to use `std::list` instead of `llvm::SmallVector`.
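Why `std::list` helps can be shown with a reduced sketch. `IncludeIndex` and its members are hypothetical and only mirror the two-container shape described above: a list never relocates existing elements on `push_back`, so pointers taken earlier stay valid.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>
#include <vector>

// Hypothetical, reduced model of the two containers described above.
struct Include {
  std::string Name;
};

class IncludeIndex {
public:
  void add(const std::string &Name, int Priority) {
    // Primary storage: std::list nodes never move, unlike vector elements.
    std::list<Include> &Bucket = ByName[Name];
    Bucket.push_back(Include{Name});
    // Secondary index: holds pointers into the primary storage.
    ByPriority[Priority].push_back(&Bucket.back());
  }

  const std::vector<const Include *> &byPriority(int Priority) const {
    return ByPriority.at(Priority);
  }

private:
  std::map<std::string, std::list<Include>> ByName;
  std::map<int, std::vector<const Include *>> ByPriority;
};
```

With a vector-like bucket, adding the same header repeatedly could reallocate the bucket and leave the earlier pointers dangling, which is the situation in the stack trace.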
Test Plan
```
./tools/clang/unittests/Tooling/ToolingTests --gtest_filter=HeaderIncludesTest.RepeatedIncludes
```
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D118755
Fixes https://github.com/llvm/llvm-project/issues/53576.
There was an inconsistency in formatting of delete expressions.
Before:
```
delete (void*)a;
delete[](void*) a;
```
After this patch:
```
delete (void*)a;
delete[] (void*)a;
```
Reviewed By: HazardyKnusperkeks, owenpan
Differential Revision: https://reviews.llvm.org/D119117
LRGraph is the key component of the clang pseudo parser. It is a
deterministic handle-finding finite-state machine, which is used to
generate the LR parsing table.
Separate from https://reviews.llvm.org/D118196.
Differential Revision: https://reviews.llvm.org/D119172
This will allow moving the IncludeCleaner library essentials to Clang
and decoupling them from the majority of clangd.
The patch itself just moves the code; it doesn't change existing
functionality.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D119130
- Add or remove empty lines surrounding union blocks.
- Fixes https://github.com/llvm/llvm-project/issues/53229, in which
keywords like class and struct on a line ending with a left brace, or
followed by a line consisting only of a left brace, were falsely
recognized as a definition line, causing unnecessary empty lines to be
inserted around blocks that did not need formatting.
Reviewed By: MyDeveloperDay, curdeius, HazardyKnusperkeks, owenpan
Differential Revision: https://reviews.llvm.org/D119067