Summary:
Add fuzzy SymbolIndex, where identifier needn't match exactly.
The purpose for this is global autocomplete in clangd. The query will be a
partial identifier up to the cursor, and the results will be suggestions.
It's in include-fixer because:
- it handles SymbolInfos, actually SymbolIndex is exactly the right interface
- it's a good harness for lit testing the fuzzy YAML index
- (Laziness: we can't unit test clangd until reorganizing with a tool/ dir)
Other questionable choices:
- FuzzySymbolIndex, which just refines the contract of SymbolIndex. This is
an interface to allow extension to large monorepos (*cough*)
- an always-true safety check that Identifier == Name is removed from
SymbolIndexManager, as it's not true for fuzzy matching
- exposing -db=fuzzyYaml from include-fixer is not a very useful feature, and
a non-orthogonal ui (fuzziness vs data source). -db=fixed is similar though.
Reviewers: bkramer
Subscribers: cfe-commits, mgorny
Differential Revision: https://reviews.llvm.org/D30720
llvm-svn: 297630
Summary:
Remove line number from Symbol identity.
For our purposes (include-fixer and clangd autocomplete), function overloads
within the same header should mostly be treated as a single combined symbol.
We may want to track individual occurrences (line number, full type info)
and aggregate this during mapreduce, but that's not done here.
Reviewers: hokein, bkramer
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D30685
llvm-svn: 297371
Summary:
Add usage count to find-all-symbols.
FindAllSymbols now finds (most!) main-file usages of the discovered symbols.
The per-TU map output has NumUses=0 or 1 (only one use per file is counted).
The reducer aggregates these to find the number of files that use a symbol.
The NumOccurrences is now set to 1 in the mapper rather than being inferred by
the reducer, for consistency.
The idea here is to use NumUses for ranking: intuitively number of files that
use a symbol is more meaningful than number of files that include the header.
Reviewers: hokein, bkramer
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D30210
llvm-svn: 296446
Instead of just using popularity, we also take into account how similar the
path of the current file is to the path of the header.
Our first approach is to get popularity into a reasonably small scale by taking
log2 (which is roughly intuitive to how humans would bucket popularity), and
multiply that with the number of matching prefix path fragments of the included
header with the current file.
Note that currently we do not take special care for unclean paths containing
"../" or "./".
Differential Revision: https://reviews.llvm.org/D28548
llvm-svn: 291664
If prefix search finds something where nothing can be nested under (e.g.
a variable or macro) don't add it to the result.
This is for cases like:
header.h:
extern int a;
file.cc:
namespace a {
SOME_MACRO
}
We will look up a::SOME_MACRO, which doesn't have any results. Then we
look up 'a' and find something before we ever look up just 'SOME_MACRO'.
With some basic filtering we can avoid this case.
Differential Revision: http://reviews.llvm.org/D20960
llvm-svn: 271671
This sorts based on the popularity of the header, not the symbol. If
there are mutliple matching symbols in one header we take the maximum
popularity for that header and deduplicate. If we know nothing we sort
lexicographically based on the header path.
Differential Revision: http://reviews.llvm.org/D20814
llvm-svn: 271283
Summary:
[include-fixer] collect the number of times a symbols is found in an
indexing run and use it for symbols popularity ranking.
Reviewers: bkramer
Subscribers: cfe-commits, hokein, djasper
Differential Revision: http://reviews.llvm.org/D20804
llvm-svn: 271268