Summary:
This is now handled by a wrapper class SwapIndex, so MemIndex/DexIndex can be
immutable and focus on their job.
Old and busted:
I have a MemIndex, which holds a shared_ptr<vector<Symbol*>>, which keeps the
symbol slab alive. I update by calling build(shared_ptr<vector<Symbol*>>).
New hotness: I have a SwapIndex, which holds a unique_ptr<SymbolIndex>, which
holds a MemIndex, which holds a shared_ptr<void>, which keeps backing
data alive.
I update by building a new MemIndex and calling SwapIndex::reset().
Reviewers: kbobyrev, ioeric
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51422
llvm-svn: 341318
Summary:
Currently, a symbol can have only one #include header attached, which
might not work well if the symbol can be imported via different #includes depending
on where it's used. This patch stores multiple #include headers (with # references)
for each symbol, so that CodeCompletion can decide which include to insert.
In this patch, code completion simply picks the most popular include as the default inserted header. We also return all possible includes and their edits in the `CodeCompletion` results.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: mgrang, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51291
llvm-svn: 341304
This patch is a proof-of-concept Dex index implementation. It has
several flaws, which don't allow replacing static MemIndex yet, such as:
* Not being able to handle queries of small size (less than 3 symbols);
a way to solve this is generating trigrams of smaller size and having
such incomplete trigrams in the index structure.
* Speed measurements: while manually editing files in Vim and requesting
autocompletion gives an impression that the performance is at least
comparable with the current static index, having actual numbers is
important because we don't want to hurt the users and roll out slow
code. Eric (@ioeric) suggested that we should only replace MemIndex as
soon as we have the evidence that this is not a regression in terms of
performance. An approach which is likely to be successful here is to
wait until we have benchmark library in the LLVM core repository, which
is something I have suggested in the LLVM mailing lists, received
positive feedback on and started working on. I will add a dependency as
soon as the suggested patch is out for a review (currently there's at
least one complication which is being addressed by
https://github.com/google/benchmark/pull/649). Key performance
improvements for iterators are sorting by cost and the limit iterator.
* Quality measurements: currently, boosting iterator and two-phase
lookup stage are not implemented, without these the quality is likely to
be worse than the current implementation can yield. Measuring quality is
tricky, but another suggestion in the offline discussion was that the
drop-in replacement should only happen after Boosting iterators
implementation (and subsequent query enhancement).
The proposed changes do not affect Clangd functionality or performance,
`DexIndex` is only used in unit tests and not in production code.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50337
llvm-svn: 340175
Summary: Surface it in the completion items C++ API, and when a flag is set.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48938
llvm-svn: 336309
Summary:
Previously, the strings matched LSP completion pretty closely.
The completion label was a single string, for instance. This made
implementing completion itself easy but makes it hard to use the names
in other way, e.g. pretty-printed name in synthesized
documentation/hover.
It also limits our introspection into completion items, which can only
be as precise as the indexed symbols. This change is a prerequisite to
improvements to overload bundling which need to inspect e.g. signature
structure.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48475
llvm-svn: 335360
Summary:
Potential use case: argument go-to-definition result with symbol
information (e.g. function definition in cc file) that might not be in the AST.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D44305
llvm-svn: 327487
Summary:
This is an important ranking signal.
It's off for the dynamic index for now. Correspondingly, tell the index
infrastructure only to report declarations for the dynamic index.
Reviewers: ioeric, hokein
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D44315
llvm-svn: 327275
Summary:
The intent was that [ar] doesn't match "FooBar"; the first character must match
a Head character (hard requirement, not just a low score).
This matches VSCode, and was "tested" but the tests were defective.
The tests expected matches("FooBar") to fail for lack of a match. But instead
it fails because the string should be annotated - matches("FooB[ar]").
This patch makes matches("FooBar") ignore annotations, as was intended.
Fixing the code to reject weak matches for the first char causes problems:
- [bre] no longer matches "HTMLBRElement".
We allow matching against an uppercase char even if we don't think it's head.
Only do this if there's at least one lowercase, to avoid triggering on MACROS
- [print] no longer matches "sprintf".
This is hard to fix without false positives (e.g. [int] vs "sprintf"])
This patch leaves this case broken. A future patch will add a dictionary
providing custom segmentation to common names from the standard library.
Fixed a couple of index tests that indirectly relied on broken fuzzy matching.
Added const in a couple of missing places for consistency with new code.
Subscribers: klimek, ilya-biryukov, jkorous-apple, ioeric, cfe-commits
Differential Revision: https://reviews.llvm.org/D44003
llvm-svn: 326721
Within a TU:
- as now, collect a declaration from the first occurrence of a symbol
(taking clang's canonical declaration)
- when we first see a definition occurrence, copy the symbol and add it
Across TUs/sources:
- mergeSymbol in Merge.h is responsible for combining matching Symbols.
This covers dynamic/static merges and cross-TU merges in the static index.
- it prefers declarations from Symbols that have a definition.
- GlobalSymbolBuilderMain is modified to use mergeSymbol as a reduce step.
Random cleanups (can be pulled out):
- SymbolFromYAML -> SymbolsFromYAML, new singular SymbolFromYAML added
- avoid uninit'd SymbolLocations. Add an idiomatic way to check "absent".
- CanonicalDeclaration (as well as Definition) are mapped as optional in YAML.
- added operator<< for Symbol & SymbolLocation, for debugging
Reviewers: ioeric, hokein
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D42942
llvm-svn: 324735
Summary:
Instead of passing Context explicitly around, we now have a thread-local
Context object `Context::current()` which is an implicit argument to
every function.
Most manipulation of this should use the WithContextValue helper, which
augments the current Context to add a single KV pair, and restores the
old context on destruction.
Advantages are:
- less boilerplate in functions that just propagate contexts
- reading most code doesn't require understanding context at all, and
using context as values in fewer places still
- fewer options to pass the "wrong" context when it changes within a
scope (e.g. when using Span)
- contexts pass through interfaces we can't modify, such as VFS
- propagating contexts across threads was slightly tricky (e.g.
copy vs move, no move-init in lambdas), and is now encapsulated in
the threadpool
Disadvantages are all the usual TLS stuff - hidden magic, and
potential for higher memory usage on threads that don't use the
context. (In practice, it's just one pointer)
Reviewers: ilya-biryukov
Subscribers: klimek, jkorous-apple, ioeric, cfe-commits
Differential Revision: https://reviews.llvm.org/D42517
llvm-svn: 323872
Summary:
This makes performance slower but more predictable (it always processes
every symbol). We need to find ways to make this fast, possibly by precomputing
short queries or capping the number of scored results. But our current approach
is too naive.
It also no longer returns results in a "good" order. In fact it's pathological:
the top N results are ranked from worst to best. Indexes aren't responsible for
ranking and MergedIndex can't do a good job, so I'm pleased that this will make
any hidden assumptions we have more noticeable :-)
Reviewers: hokein
Subscribers: klimek, ilya-biryukov, cfe-commits
Differential Revision: https://reviews.llvm.org/D42060
llvm-svn: 322821
Summary:
We now hide the static/dynamic split from the code completion, behind a
new implementation of the SymbolIndex interface. This will reduce the
complexity of the sema/index merging that needs to be done by
CodeComplete, at a fairly small cost in flexibility.
Reviewers: hokein
Subscribers: klimek, mgorny, ilya-biryukov, cfe-commits
Differential Revision: https://reviews.llvm.org/D42049
llvm-svn: 322480
Summary:
This improves a few things:
- the insert -> freeze -> read sequence is now enforced/communicated by the
type system
- SymbolSlab::const_iterator iterates over symbols, not over id-symbol pairs
- we avoid permanently storing a second copy of the IDs, and the
string map's hashtable
The slab size is now down to 21.8MB for the LLVM project.
Of this only 2.7MB is strings, the rest is #symbols * `sizeof(Symbol)`.
`sizeof(Symbol)` is currently 96, which seems too big - I think
SymbolInfo isn't efficiently packed. That's a topic for another patch!
Also added simple API to see the memory usage/#symbols of a slab, since
it seems likely we will continue to care about this.
Reviewers: ilya-biryukov
Subscribers: klimek, mgrang, cfe-commits
Differential Revision: https://reviews.llvm.org/D41506
llvm-svn: 321412
Summary:
Symbols are not self-contained - it's only safe to hand them out if you
guarantee the lifetime of the underlying data.
Before this lands, I'm going to measure the before/after memory usage of the
LLVM index loaded into memory in a single slab.
Reviewers: hokein
Subscribers: klimek, ilya-biryukov, cfe-commits
Differential Revision: https://reviews.llvm.org/D41483
llvm-svn: 321272
Summary: When scopes are specified, only match symbols from scopes.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: klimek, ilya-biryukov, cfe-commits
Differential Revision: https://reviews.llvm.org/D41367
llvm-svn: 321067
Summary:
o Index interfaces to support using different index sources (e.g. AST index, global index) for code completion, cross-reference finding etc. This patch focuses on code completion.
The following changes in the original patch has been split out.
o Implement an AST-based index.
o Add an option to replace sema code completion for qualified-id with index-based completion.
o Implement an initial naive code completion index which matches symbols that have the query string as substring.
Reviewers: malaperle, sammccall
Reviewed By: sammccall
Subscribers: hokein, klimek, malaperle, mgorny, ilya-biryukov, cfe-commits
Differential Revision: https://reviews.llvm.org/D40548
llvm-svn: 320688