forked from OSchip/llvm-project
d46fa023ca
Fixes [[ https://github.com/llvm/llvm-project/issues/38995 | #38995 ]] This is an attempt to modify the regular expression to identify `@import` and `import` alongside the regular `#include`. The challenging part was not to support `@` in addition to `#` but how to handle everything that comes after the `include|import` keywords. Previously everything that wasn't `"` or `<` was consumed. But as you can see in this example from the issue #38995, there is no `"` or `<` following the keyword: ``` @import Foundation; ``` I experimented with a lot of fancy and useful expressions in [this online regex tool](https://regex101.com) only to find out that some things are simply not supported by the regex implementation in LLVM. * For example the beginning `[\t\ ]*` should be replacable by the horizontal whitespace character `\h*` but this will break the `SortIncludesTest.LeadingWhitespace` test. That's why I've chosen to come back to the basic building blocks. The essential change in this patch is the change from this regular expression: ``` ^[\t\ ]*#[\t\ ]*(import|include)[^"<]*(["<][^">]*[">]) ~ ~~~~~~~~~~~~~~ ^ ^ | | only support # prefix not @ | only support "" and <> as delimiters no support for C++ modules and ; ending. Also this allows for "> or <" or "" or <> which all seems either off or wrong. ``` to this: ``` ^[\t\ ]*[@#][\t\ ]*(import|include)([^"]*("[^"]+")|[^<]*(<[^>]+>)|[\t\ ]*([^;]+;)) ~~~~ ~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~ ^ ^ ^ ^ ^ | | | | | Now support @ and #. Clearly support "" and <> as well as an include name without enclosing characters. Allows for no mixture of "> or <" or empty include names. ``` Here is how I've tested this patch: ``` ninja clang-Format ninja FormatTests ./tools/clang/unittests/Format/FormatTests --gtest_filter=SortIncludesTest* ``` And if that worked I doubled checked that nothing else broke by running all format checks: ``` ./tools/clang/unittests/Format/FormatTests ``` One side effect of this change is it should partially support [C++20 Module](https://en.cppreference.com/w/cpp/language/modules) `import` lines without the optional `export` in front. Adding this can be a change on its own that shouldn't be too hard. I say partially because the `@` or `#` are currently *NOT* optional in the regular expression. I see an opportunity to optimized the matching to exclude `@include` for example. But eventually these should be caught by the compiler, so... With my change, the matching group is not at a fixed position any longer. I decided to choose the last match (group) that is not empty. Reviewed By: HazardyKnusperkeks Differential Revision: https://reviews.llvm.org/D121370 |
||
---|---|---|
.. | ||
AffectedRangeManager.cpp | ||
AffectedRangeManager.h | ||
BreakableToken.cpp | ||
BreakableToken.h | ||
CMakeLists.txt | ||
ContinuationIndenter.cpp | ||
ContinuationIndenter.h | ||
DefinitionBlockSeparator.cpp | ||
DefinitionBlockSeparator.h | ||
Encoding.h | ||
Format.cpp | ||
FormatInternal.h | ||
FormatToken.cpp | ||
FormatToken.h | ||
FormatTokenLexer.cpp | ||
FormatTokenLexer.h | ||
MacroExpander.cpp | ||
Macros.h | ||
NamespaceEndCommentsFixer.cpp | ||
NamespaceEndCommentsFixer.h | ||
QualifierAlignmentFixer.cpp | ||
QualifierAlignmentFixer.h | ||
SortJavaScriptImports.cpp | ||
SortJavaScriptImports.h | ||
TokenAnalyzer.cpp | ||
TokenAnalyzer.h | ||
TokenAnnotator.cpp | ||
TokenAnnotator.h | ||
UnwrappedLineFormatter.cpp | ||
UnwrappedLineFormatter.h | ||
UnwrappedLineParser.cpp | ||
UnwrappedLineParser.h | ||
UsingDeclarationsSorter.cpp | ||
UsingDeclarationsSorter.h | ||
WhitespaceManager.cpp | ||
WhitespaceManager.h |