Commit Graph

23 Commits

Author SHA1 Message Date
Sam McCall 9c7624e14b [clangd] Factor out the data-swapping functionality from MemIndex/DexIndex.
Summary:
This is now handled by a wrapper class SwapIndex, so MemIndex/DexIndex can be
immutable and focus on their job.

Old and busted:
 I have a MemIndex, which holds a shared_ptr<vector<Symbol*>>, which keeps the
 symbol slab alive. I update by calling build(shared_ptr<vector<Symbol*>>).

New hotness: I have a SwapIndex, which holds a unique_ptr<SymbolIndex>, which
 holds a MemIndex, which holds a shared_ptr<void>, which keeps backing
 data alive.
 I update by building a new MemIndex and calling SwapIndex::reset().

Reviewers: kbobyrev, ioeric

Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D51422

llvm-svn: 341318
2018-09-03 14:37:43 +00:00
Fangrui Song 399943bc76 [clangd] Fix many typos. NFC
llvm-svn: 341273
2018-09-01 07:47:03 +00:00
Kirill Bobyrev 493b1627ca [NFC] Cleanup Dex
* Use consistent assertion messages in iterators implementations
* Silence a bunch of clang-tidy warnings: use `emplace_back` instead of
  `push_back` where possible, make sure arguments have the same name in
  header and implementation file, use for loop over ranges where possible

Reviewed by: ioeric

Differential Revision: https://reviews.llvm.org/D51528

llvm-svn: 341190
2018-08-31 09:17:02 +00:00
Kirill Bobyrev a2f146fd9c [clangd] Remove UB introduced in rL341057
llvm-svn: 341066
2018-08-30 13:30:34 +00:00
Kirill Bobyrev 38bdac5db8 [clangd] Implement iterator cost
This patch introduces iterator cost concept to improve the performance
of Dex query iterators (mainly, AND iterator). Benchmarks show that the
queries become ~10% faster.

Before

```
-------------------------------------------------------
Benchmark                Time           CPU Iteration
-------------------------------------------------------
DexAdHocQueries    5883074 ns    5883018 ns        117
DexRealQ         959904457 ns  959898507 ns          1
```

After

```
-------------------------------------------------------
Benchmark                Time           CPU Iteration
-------------------------------------------------------
DexAdHocQueries    5238403 ns    5238361 ns        130
DexRealQ         873275207 ns  873269453 ns          1
```

Reviewed by: sammccall

Differential Revision: https://reviews.llvm.org/D51310

llvm-svn: 341057
2018-08-30 11:23:58 +00:00
Kirill Bobyrev b217ddb1bb [clangd] Use TRUE iterator instead of complete posting list
Stop using `$$$` (empty) trigram and generating a posting list with all
items. Since TRUE iterator is already implemented and correctly inserted
when there are no real trigram posting lists, this is a valid
transformation.

Benchmarks show that this simple change allows ~30% speedup on dataset
of real completion queries.

Before

```
-------------------------------------------------------
Benchmark                Time           CPU Iterations
-------------------------------------------------------
DexAdHocQueries    5640321 ns    5640265 ns        120
DexRealQ         939835603 ns  939830296 ns          1
```

After

```
-------------------------------------------------------
Benchmark                Time           CPU Iterations
-------------------------------------------------------
DexAdHocQueries    3452014 ns    3451987 ns        203
DexRealQ         667455912 ns  667455750 ns          1
```

Reviewed by: ilya-biryukov

Differential Revision: https://reviews.llvm.org/D51287

llvm-svn: 340729
2018-08-27 09:47:50 +00:00
Kirill Bobyrev a98961bc84 [clangd] Implement LIMIT iterator
This patch introduces LIMIT iterator, which is very important for
improving the quality of search query. LIMIT iterators can be applied on
top of BOOST iterators to prevent populating query request with a huge
number of low-quality symbols.

Reviewed by: sammccall

Differential Revision: https://reviews.llvm.org/D51029

llvm-svn: 340605
2018-08-24 11:25:43 +00:00
Kirill Bobyrev fc89001cec [clangd] Log memory usage of DexIndex and MemIndex
This patch prints information about built index size estimation to
verbose logs. This is useful for optimizing memory usage of DexIndex and
comparisons with MemIndex.

Reviewed by: sammccall

Differential Revision: https://reviews.llvm.org/D51154

llvm-svn: 340601
2018-08-24 09:12:54 +00:00
Kirill Bobyrev 7413e985ea [clangd] Implement BOOST iterator
This patch introduces BOOST iterator - a substantial block for efficient
and high-quality symbol retrieval. The concept of boosting allows
performing computationally inexpensive scoring on the query side so that
the final (expensive) scoring can only be applied on the items with the
highest preliminary score while eliminating the need to score too many
items.

Reviewed by: ilya-biryukov

Differential Revision: https://reviews.llvm.org/D50970

llvm-svn: 340409
2018-08-22 13:44:15 +00:00
Kirill Bobyrev 14e4700468 [clangd] Cleanup after D50897
The wrong diff that was uploaded to Phabricator was building the wrong
index.

llvm-svn: 340388
2018-08-22 07:17:59 +00:00
Kirill Bobyrev 7a94c918a0 [clangd] Allow using experimental Dex index
This patch adds hidden Clangd flag ("use-dex-index") which replaces
(currently) default `MemIndex` with `DexIndex` for the static index.

Reviewed by: ioeric

Differential Revision: https://reviews.llvm.org/D50897

llvm-svn: 340262
2018-08-21 10:32:27 +00:00
Kirill Bobyrev 870aaf2963 [clangd] DexIndex implementation prototype
This patch is a proof-of-concept Dex index implementation. It has
several flaws, which don't allow replacing static MemIndex yet, such as:

* Not being able to handle queries of small size (less than 3 symbols);
  a way to solve this is generating trigrams of smaller size and having
  such incomplete trigrams in the index structure.
* Speed measurements: while manually editing files in Vim and requesting
  autocompletion gives an impression that the performance is at least
  comparable with the current static index, having actual numbers is
  important because we don't want to hurt the users and roll out slow
  code. Eric (@ioeric) suggested that we should only replace MemIndex as
  soon as we have the evidence that this is not a regression in terms of
  performance. An approach which is likely to be successful here is to
  wait until we have benchmark library in the LLVM core repository, which
  is something I have suggested in the LLVM mailing lists, received
  positive feedback on and started working on. I will add a dependency as
  soon as the suggested patch is out for a review (currently there's at
  least one complication which is being addressed by
  https://github.com/google/benchmark/pull/649). Key performance
  improvements for iterators are sorting by cost and the limit iterator.
* Quality measurements: currently, boosting iterator and two-phase
  lookup stage are not implemented, without these the quality is likely to
  be worse than the current implementation can yield. Measuring quality is
  tricky, but another suggestion in the offline discussion was that the
  drop-in replacement should only happen after Boosting iterators
  implementation (and subsequent query enhancement).

The proposed changes do not affect Clangd functionality or performance,
`DexIndex` is only used in unit tests and not in production code.

Reviewed by: ioeric

Differential Revision: https://reviews.llvm.org/D50337

llvm-svn: 340175
2018-08-20 14:39:32 +00:00
Kirill Bobyrev 6d8bd7f56a [clangd] NFC: Cleanup Dex Iterator comments and simplify tests
Proposed changes:

* Cleanup comments in `clangd/index/dex/Iterator.h`: Vim's `gq`
  formatting added redundant spaces instead of newlines in few
  places
* Few comments in `OrIterator` are wrong
* Use `EXPECT_TRUE(Condition)` instead of
  `EXPECT_THAT(Condition, true)` (same with `EXPECT_FALSE`)
* Don't expose `dump()` method to the public by misplacing
  `private:`

This patch does not affect functionality.

Reviewed by: ioeric

Differential Revision: https://reviews.llvm.org/D50956

llvm-svn: 340157
2018-08-20 09:16:14 +00:00
Kirill Bobyrev 30ffdf42f7 [clangd] Implement TRUE Iterator
This patch introduces TRUE Iterator which efficiently handles posting
lists containing all items within `[0, Size)` range.

Reviewed by: ioeric

Differential Revision: https://reviews.llvm.org/D50955

llvm-svn: 340155
2018-08-20 08:47:30 +00:00
Kirill Bobyrev 51534ab864 [clangd] NFC: Improve Dex Iterators debugging traits
This patch improves `dex::Iterator` string representation by
incorporating the information about the element which is currently being
pointed to by the `DocumentIterator`.

Reviewed by: ioeric

Differential Revision: https://reviews.llvm.org/D50689

llvm-svn: 339877
2018-08-16 13:19:43 +00:00
Kirill Bobyrev 8e35f1e7cb NFC: Enforce good formatting across multiple clang-tools-extra files
This patch improves readability of multiple files in clang-tools-extra
and enforces LLVM Coding Guidelines.

Reviewed by: ioeric

Differential Revision: https://reviews.llvm.org/D50707

llvm-svn: 339687
2018-08-14 16:03:32 +00:00
Simon Pilgrim 7b31fae983 Fix MSVC 'std::min: no matching overloaded function found' error.
llvm-svn: 339557
2018-08-13 12:24:48 +00:00
Kirill Bobyrev ff2dd9095f [clangd] Generate incomplete trigrams for the Dex index
This patch handles trigram generation "short" identifiers and queries.
Trigram generator produces incomplete trigrams for short names so that
the same query iterator API can be used to match symbols which don't
have enough symbols to form a trigram and correctly handle queries which
also are not sufficient for generating a full trigram.

Reviewed by: ioeric

Differential revision: https://reviews.llvm.org/D50517

llvm-svn: 339548
2018-08-13 08:57:06 +00:00
Kirill Bobyrev 0a75766c3d [clangd] Allow consuming limited number of items
This patch modifies `consume` function to allow retrieval of limited
number of symbols. This is the "cheap" implementation of top-level
limiting iterator. In the future we would like to have a complete limit
iterator implementation to insert it into the query subtrees, but in the
meantime this version would be enough for a fully-functional
proof-of-concept Dex implementation.

Reviewers: ioeric, ilya-biryukov

Reviewed by: ioeric

Differential Revision: https://reviews.llvm.org/D50500

llvm-svn: 339426
2018-08-10 11:50:44 +00:00
Kirill Bobyrev a522c1cf86 [clangd] Return Dex Iterators
The original Dex Iterators patch (https://reviews.llvm.org/rL338017)
caused problems for Clang 3.6 and Clang 3.7 due to the compiler bug
which prevented inferring template parameter (`Size`) in create(And|Or)?
functions. It was reverted in https://reviews.llvm.org/rL338054.

In this revision the mentioned helper functions were replaced with
variadic templated versions.

Proposed changes were tested on multiple compiler versions, including
Clang 3.6 which originally caused the failure.

llvm-svn: 338116
2018-07-27 09:54:27 +00:00
Kirill Bobyrev d75b556c56 Revert Clangd Dex Iterators patch
This reverts two revisions:

* https://reviews.llvm.org/rL338017
* https://reviews.llvm.org/rL338028

They caused crash for Clang 3.6 & Clang 3.7 buildbots, it was
reported by Jeremy Morse.

llvm-svn: 338054
2018-07-26 18:25:48 +00:00
Kirill Bobyrev bea258d3d7 [clangd] Proof-of-concept query iterators for Dex symbol index
This patch introduces three essential types of query iterators:
`DocumentIterator`, `AndIterator`, `OrIterator`. It provides a
convenient API for query tree generation and serves as a building block
for the next generation symbol index - Dex. Currently, many
optimizations are missed to improve code readability and to serve as the
reference implementation. Potential improvements are briefly mentioned
in `FIXME`s and will be addressed in the following patches.

Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html

Iterators, their applications and potential extensions are explained in
detail in the design proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj

Reviewers: ioeric, sammccall, ilya-biryukov

Subscribers: cfe-commits, klimek, jfb, mgrang, mgorny, MaskRay, jkorous,
arphaman

Differential Revision: https://reviews.llvm.org/D49546

llvm-svn: 338017
2018-07-26 10:42:31 +00:00
Kirill Bobyrev 5e82f05e7a [clangd] Introduce Dex symbol index search tokens
This patch introduces the core building block of the next-generation
Clangd symbol index - Dex. Search tokens are the keys in the inverted
index and represent a characteristic of a specific symbol: examples of
search token types (Token Namespaces) are

* Trigrams -  these are essential for unqualified symbol name fuzzy
search * Scopes for filtering the symbols by the namespace * Paths, e.g.
these can be used to uprank symbols defined close to the edited file

This patch outlines the generic for such token namespaces, but only
implements trigram generation.

The intuition behind trigram generation algorithm is that each extracted
trigram is a valid sequence for Fuzzy Matcher jumps, proposed
implementation utilize existing FuzzyMatcher API for segmentation and
trigram extraction.

However, trigrams generation algorithm for the query string is different
from the previous one: it simply yields sequences of 3 consecutive
lowercased valid characters (letters, digits).

Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html

The trigram generation techniques are described in detail in the
proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj

Reviewers: sammccall, ioeric, ilya-biryukovA

Subscribers: cfe-commits, klimek, mgorny, MaskRay, jkorous, arphaman

Differential Revision: https://reviews.llvm.org/D49591

llvm-svn: 337901
2018-07-25 10:34:57 +00:00