Summary:
This is now handled by a wrapper class SwapIndex, so MemIndex/DexIndex can be
immutable and focus on their job.
Old and busted:
I have a MemIndex, which holds a shared_ptr<vector<Symbol*>>, which keeps the
symbol slab alive. I update by calling build(shared_ptr<vector<Symbol*>>).
New hotness: I have a SwapIndex, which holds a unique_ptr<SymbolIndex>, which
holds a MemIndex, which holds a shared_ptr<void>, which keeps backing
data alive.
I update by building a new MemIndex and calling SwapIndex::reset().
Reviewers: kbobyrev, ioeric
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51422
llvm-svn: 341318
Summary:
Currently, a symbol can have only one #include header attached, which
might not work well if the symbol can be imported via different #includes depending
on where it's used. This patch stores multiple #include headers (with # references)
for each symbol, so that CodeCompletion can decide which include to insert.
In this patch, code completion simply picks the most popular include as the default inserted header. We also return all possible includes and their edits in the `CodeCompletion` results.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: mgrang, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51291
llvm-svn: 341304
SymbolCollector will be used for two cases:
- collect Symbol type only, used for indexing preamble AST.
- collect Symbol and SymbolOccurrences, used for indexing main AST.
For finding local references from the AST, we will implement it in other ways.
llvm-svn: 341208
* Use consistent assertion messages in iterators implementations
* Silence a bunch of clang-tidy warnings: use `emplace_back` instead of
`push_back` where possible, make sure arguments have the same name in
header and implementation file, use for loop over ranges where possible
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D51528
llvm-svn: 341190
This patch introduces iterator cost concept to improve the performance
of Dex query iterators (mainly, AND iterator). Benchmarks show that the
queries become ~10% faster.
Before
```
-------------------------------------------------------
Benchmark Time CPU Iteration
-------------------------------------------------------
DexAdHocQueries 5883074 ns 5883018 ns 117
DexRealQ 959904457 ns 959898507 ns 1
```
After
```
-------------------------------------------------------
Benchmark Time CPU Iteration
-------------------------------------------------------
DexAdHocQueries 5238403 ns 5238361 ns 130
DexRealQ 873275207 ns 873269453 ns 1
```
Reviewed by: sammccall
Differential Revision: https://reviews.llvm.org/D51310
llvm-svn: 341057
Stop using `$$$` (empty) trigram and generating a posting list with all
items. Since TRUE iterator is already implemented and correctly inserted
when there are no real trigram posting lists, this is a valid
transformation.
Benchmarks show that this simple change allows ~30% speedup on dataset
of real completion queries.
Before
```
-------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------
DexAdHocQueries 5640321 ns 5640265 ns 120
DexRealQ 939835603 ns 939830296 ns 1
```
After
```
-------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------
DexAdHocQueries 3452014 ns 3451987 ns 203
DexRealQ 667455912 ns 667455750 ns 1
```
Reviewed by: ilya-biryukov
Differential Revision: https://reviews.llvm.org/D51287
llvm-svn: 340729
This patch introduces LIMIT iterator, which is very important for
improving the quality of search query. LIMIT iterators can be applied on
top of BOOST iterators to prevent populating query request with a huge
number of low-quality symbols.
Reviewed by: sammccall
Differential Revision: https://reviews.llvm.org/D51029
llvm-svn: 340605
Summary:
For index-based code completion, send an asynchronous speculative index
request, based on the index request for the last code completion on the same
file and the filter text typed before the cursor, before sema code completion
is invoked. This can reduce the code completion latency (by roughly latency of
sema code completion) if the speculative request is the same as the one
generated for the ongoing code completion from sema. As a sequence of code
completions often have the same scopes and proximity paths etc, this should be
effective for a number of code completions.
Trace with speculative index request:{F6997544}
Reviewers: hokein, ilya-biryukov
Reviewed By: ilya-biryukov
Subscribers: javed.absar, jfb, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D50962
llvm-svn: 340604
This patch prints information about built index size estimation to
verbose logs. This is useful for optimizing memory usage of DexIndex and
comparisons with MemIndex.
Reviewed by: sammccall
Differential Revision: https://reviews.llvm.org/D51154
llvm-svn: 340601
This patch introduces BOOST iterator - a substantial block for efficient
and high-quality symbol retrieval. The concept of boosting allows
performing computationally inexpensive scoring on the query side so that
the final (expensive) scoring can only be applied on the items with the
highest preliminary score while eliminating the need to score too many
items.
Reviewed by: ilya-biryukov
Differential Revision: https://reviews.llvm.org/D50970
llvm-svn: 340409
Summary:
It was previously only indexing the preamble decls. The new
implementation will index both the preamble and the main AST and
report both sets of symbols, preferring the ones from the main AST
whenever the symbol is present in both.
The symbols in the main AST slab always store all information
available in the preamble symbols, possibly adding more,
e.g. definition locations.
Reviewers: hokein, ioeric
Reviewed By: ioeric
Subscribers: kadircet, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D50889
llvm-svn: 340404
This patch adds hidden Clangd flag ("use-dex-index") which replaces
(currently) default `MemIndex` with `DexIndex` for the static index.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50897
llvm-svn: 340262
This patch is a proof-of-concept Dex index implementation. It has
several flaws, which don't allow replacing static MemIndex yet, such as:
* Not being able to handle queries of small size (less than 3 symbols);
a way to solve this is generating trigrams of smaller size and having
such incomplete trigrams in the index structure.
* Speed measurements: while manually editing files in Vim and requesting
autocompletion gives an impression that the performance is at least
comparable with the current static index, having actual numbers is
important because we don't want to hurt the users and roll out slow
code. Eric (@ioeric) suggested that we should only replace MemIndex as
soon as we have the evidence that this is not a regression in terms of
performance. An approach which is likely to be successful here is to
wait until we have benchmark library in the LLVM core repository, which
is something I have suggested in the LLVM mailing lists, received
positive feedback on and started working on. I will add a dependency as
soon as the suggested patch is out for a review (currently there's at
least one complication which is being addressed by
https://github.com/google/benchmark/pull/649). Key performance
improvements for iterators are sorting by cost and the limit iterator.
* Quality measurements: currently, boosting iterator and two-phase
lookup stage are not implemented, without these the quality is likely to
be worse than the current implementation can yield. Measuring quality is
tricky, but another suggestion in the offline discussion was that the
drop-in replacement should only happen after Boosting iterators
implementation (and subsequent query enhancement).
The proposed changes do not affect Clangd functionality or performance,
`DexIndex` is only used in unit tests and not in production code.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50337
llvm-svn: 340175
Proposed changes:
* Cleanup comments in `clangd/index/dex/Iterator.h`: Vim's `gq`
formatting added redundant spaces instead of newlines in few
places
* Few comments in `OrIterator` are wrong
* Use `EXPECT_TRUE(Condition)` instead of
`EXPECT_THAT(Condition, true)` (same with `EXPECT_FALSE`)
* Don't expose `dump()` method to the public by misplacing
`private:`
This patch does not affect functionality.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50956
llvm-svn: 340157
This patch improves `dex::Iterator` string representation by
incorporating the information about the element which is currently being
pointed to by the `DocumentIterator`.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50689
llvm-svn: 339877
This patch handles trigram generation "short" identifiers and queries.
Trigram generator produces incomplete trigrams for short names so that
the same query iterator API can be used to match symbols which don't
have enough symbols to form a trigram and correctly handle queries which
also are not sufficient for generating a full trigram.
Reviewed by: ioeric
Differential revision: https://reviews.llvm.org/D50517
llvm-svn: 339548
This patch modifies `consume` function to allow retrieval of limited
number of symbols. This is the "cheap" implementation of top-level
limiting iterator. In the future we would like to have a complete limit
iterator implementation to insert it into the query subtrees, but in the
meantime this version would be enough for a fully-functional
proof-of-concept Dex implementation.
Reviewers: ioeric, ilya-biryukov
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50500
llvm-svn: 339426
Summary:
This is the first step of implementing Xrefs in clangd:
- add index interfaces, and related data structures.
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D49658
llvm-svn: 339011
Summary:
The implicit bool conversion could happen superisingly, e.g. when
checking `if (Loc1 == Loc2)`, the compiler will convert SymbolLocation to
bool before comparing (because we don't define operator `==` for SymbolLocation).
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D49657
llvm-svn: 338517
The original Dex Iterators patch (https://reviews.llvm.org/rL338017)
caused problems for Clang 3.6 and Clang 3.7 due to the compiler bug
which prevented inferring template parameter (`Size`) in create(And|Or)?
functions. It was reverted in https://reviews.llvm.org/rL338054.
In this revision the mentioned helper functions were replaced with
variadic templated versions.
Proposed changes were tested on multiple compiler versions, including
Clang 3.6 which originally caused the failure.
llvm-svn: 338116
This patch introduces three essential types of query iterators:
`DocumentIterator`, `AndIterator`, `OrIterator`. It provides a
convenient API for query tree generation and serves as a building block
for the next generation symbol index - Dex. Currently, many
optimizations are missed to improve code readability and to serve as the
reference implementation. Potential improvements are briefly mentioned
in `FIXME`s and will be addressed in the following patches.
Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
Iterators, their applications and potential extensions are explained in
detail in the design proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
Reviewers: ioeric, sammccall, ilya-biryukov
Subscribers: cfe-commits, klimek, jfb, mgrang, mgorny, MaskRay, jkorous,
arphaman
Differential Revision: https://reviews.llvm.org/D49546
llvm-svn: 338017
This patch introduces the core building block of the next-generation
Clangd symbol index - Dex. Search tokens are the keys in the inverted
index and represent a characteristic of a specific symbol: examples of
search token types (Token Namespaces) are
* Trigrams - these are essential for unqualified symbol name fuzzy
search * Scopes for filtering the symbols by the namespace * Paths, e.g.
these can be used to uprank symbols defined close to the edited file
This patch outlines the generic for such token namespaces, but only
implements trigram generation.
The intuition behind trigram generation algorithm is that each extracted
trigram is a valid sequence for Fuzzy Matcher jumps, proposed
implementation utilize existing FuzzyMatcher API for segmentation and
trigram extraction.
However, trigrams generation algorithm for the query string is different
from the previous one: it simply yields sequences of 3 consecutive
lowercased valid characters (letters, digits).
Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
The trigram generation techniques are described in detail in the
proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
Reviewers: sammccall, ioeric, ilya-biryukovA
Subscribers: cfe-commits, klimek, mgorny, MaskRay, jkorous, arphaman
Differential Revision: https://reviews.llvm.org/D49591
llvm-svn: 337901
Summary:
log() is split into four functions:
- elog()/log()/vlog() have different severity levels, allowing filtering
- dlog() is a lazy macro which uses LLVM_DEBUG - it logs to the logger, but
conditionally based on -debug-only flag and is omitted in release builds
All logging functions use formatv-style format strings now, e.g:
log("Could not resolve URI {0}: {1}", URI, Result.takeError());
Existing log sites have been split between elog/log/vlog by best guess.
This includes a workaround for passing Error to formatv that can be
simplified when D49170 or similar lands.
Subscribers: ilya-biryukov, javed.absar, ioeric, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D49008
llvm-svn: 336785
Summary: This is not enabled in the global-symbol-builder or dynamic index yet.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D49028
llvm-svn: 336553
Summary: Surface it in the completion items C++ API, and when a flag is set.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48938
llvm-svn: 336309
Summary:
Previously, the strings matched LSP completion pretty closely.
The completion label was a single string, for instance. This made
implementing completion itself easy but makes it hard to use the names
in other way, e.g. pretty-printed name in synthesized
documentation/hover.
It also limits our introspection into completion items, which can only
be as precise as the indexed symbols. This change is a prerequisite to
improvements to overload bundling which need to inspect e.g. signature
structure.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48475
llvm-svn: 335360
Summary:
The qualified name can be used to match a completion item to its corresponding
symbol. This can be useful for tools that measure code completion quality.
Qualified names are not precise for identifying symbols; we need to figure out a
better way to identify completion items.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48425
llvm-svn: 335334
Summary:
It's almost always identical to Name, and in fact we never used it (we used name
instead).
The only case where they differ is objc method selectors (foo: vs foo:bar:).
We can live with the latter for both name and filterText, so I've made that
change too.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48375
llvm-svn: 335321
Summary: This allows tools to examine symbols that would be collected in a symbol index. For example, a tool that measures index-based completion quality would be interested in references to symbols that are collected in the index.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48418
llvm-svn: 335218
Summary:
This allows dynamic index to have consistent URI schemes with the
static index which can have customized URI schemes, which would make file
proximity scoring based on URIs easier.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D47931
llvm-svn: 334809
Summary:
This adds more symbols to the index:
- member variables and functions
- enum constants in scoped enums
The code completion behavior should remain intact but workspace symbols should
now provide much more useful symbols.
Other symbols should be considered such as the ones in "main files" (files not
being included) but this can be done separately as this introduces its fair
share of problems.
Signed-off-by: Marc-Andre Laperle <marc-andre.laperle@ericsson.com>
Reviewers: ioeric, sammccall
Reviewed By: ioeric, sammccall
Subscribers: hokein, sammccall, jkorous, klimek, ilya-biryukov, jkorous-apple, ioeric, MaskRay, cfe-commits
Differential Revision: https://reviews.llvm.org/D44954
llvm-svn: 334017