Summary:
This allows us to deduplicate header symbols across TUs. File digests
are collected when collecting symbols/refs, and the index store deduplicates
file symbols based on the file digest.
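A minimal sketch of digest-based deduplication. It assumes a hypothetical `IndexStore` keyed by file path, and uses `std::hash` as a stand-in for a real content digest purely for illustration; none of these names come from clangd. A header seen again with the same digest is skipped, so its symbols are stored once no matter how many TUs include it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical digest type: std::hash stands in for a real content hash.
using FileDigest = std::size_t;

FileDigest digestOf(const std::string &Content) {
  return std::hash<std::string>{}(Content);
}

// Hypothetical store of per-file symbols, deduplicated by content digest.
class IndexStore {
public:
  // Stores Symbols for Path unless the same content (digest) is already
  // stored. Returns true if the store was updated.
  bool update(const std::string &Path, FileDigest Digest,
              std::vector<std::string> Symbols) {
    auto It = Files.find(Path);
    if (It != Files.end() && It->second.Digest == Digest)
      return false; // Same header already indexed from another TU.
    Files[Path] = Entry{Digest, std::move(Symbols)};
    return true;
  }

  std::size_t numSymbols() const {
    std::size_t N = 0;
    for (const auto &File : Files)
      N += File.second.Symbols.size();
    return N;
  }

private:
  struct Entry {
    FileDigest Digest;
    std::vector<std::string> Symbols;
  };
  std::map<std::string, Entry> Files;
};
```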
Reviewers: sammccall, hokein
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53433
llvm-svn: 346221
Summary:
The goal is 8 bytes, which has a nonzero risk of collisions with huge indexes.
This patch should shake out any issues with truncation at all; we can lower
it further later.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53587
llvm-svn: 345113
Standardize on the most common namespace setup in our *.cpp files:
using namespace llvm;
namespace clang {
namespace clangd {
void foo(StringRef) { ... }
And remove redundant llvm:: qualifiers. (Except for cases like
make_unique where this causes problems with std:: and ADL).
This choice is pretty arbitrary, but some broad consistency is nice.
This is going to conflict with everything. Sorry :-/
Squash the other configurations:
A)
using namespace llvm;
using namespace clang;
using namespace clangd;
void clangd::foo(StringRef);
This is in some of the older files. (It prevents accidentally defining a
new function instead of one in the header file, for what that's worth).
B)
namespace clang {
namespace clangd {
void foo(llvm::StringRef) { ... }
This is fine, but in practice the using directive often gets added over time.
C)
namespace clang {
namespace clangd {
using namespace llvm; // inside the namespace
This was pretty common, but is a bit misleading: name lookup prefers
clang::clangd::foo > clang::foo > llvm::foo (no matter where the using
directive is).
llvm-svn: 344850
That revision changed integer members to bitfields; the integers were
default initialized before and the bitfields lost that default
initialization. This started causing MSan use-of-uninitialized-memory
reports in clangd tests.
llvm-svn: 344773
Summary:
RefSlab::size() can easily cause confusion: it returns the number of
distinct symbols, rather than the number of all references.
- add a numRefs() method and cache its value, since calculating it every time is nontrivial.
- fix places where size() was misused.
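A sketch of caching the total reference count at construction time, under the assumption that the slab is immutable once built. `Ref` and `SymbolID` below are simplified, hypothetical stand-ins for the real types.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Ref { unsigned Line; }; // Hypothetical: a real Ref also has file, kind.
using SymbolID = unsigned;     // Hypothetical stand-in.

class RefSlab {
public:
  explicit RefSlab(std::vector<std::pair<SymbolID, std::vector<Ref>>> Refs)
      : Refs(std::move(Refs)) {
    // Count once: the slab never changes after construction.
    for (const auto &Entry : this->Refs)
      NumRefs += Entry.second.size();
  }
  // Number of distinct symbols that have references.
  std::size_t size() const { return Refs.size(); }
  // Total number of references across all symbols (cached).
  std::size_t numRefs() const { return NumRefs; }

private:
  std::vector<std::pair<SymbolID, std::vector<Ref>>> Refs;
  std::size_t NumRefs = 0;
};
```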
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53389
llvm-svn: 344745
Summary:
These are often not expected to be used directly e.g.
```
TEST_F(Fixture, X) {
^ // "Fixture_X_Test" expanded in the macro should be down-ranked.
}
```
Only doing this for sema for now, as such symbols are mostly coming from sema
e.g. gtest macros expanded in the main file. We could also add a similar field
for the index symbol.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53374
llvm-svn: 344736
Summary:
This saves memory. A 32-bit integer is enough to encode a position in
most human-readable source code (up to ~1M lines and 4K columns).
Previously, we used 8 bytes for a position; now we use 4 bytes, which saves
us 8 bytes for each Ref and each Symbol instance.
For the LLVM-project binary index file, we save ~13% memory.
| Before | After |
| ----- | ----- |
| 412MB | 355MB |
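A sketch of packing a position into 32 bits with bitfields. The 20/12 line/column split is an assumption for illustration (it gives ~1M lines and 4096 columns), and the members and clamping behavior are hypothetical. The explicit constructor matters: before C++20, bitfields cannot have default member initializers, so leaving it out would leave both fields uninitialized.

```cpp
#include <cassert>
#include <cstdint>

// Assumed 20/12 split: up to 2^20 lines and 2^12 columns.
class Position {
public:
  static constexpr unsigned LineBits = 20;
  static constexpr unsigned ColumnBits = 12;
  static constexpr uint32_t MaxLine = (1u << LineBits) - 1;
  static constexpr uint32_t MaxColumn = (1u << ColumnBits) - 1;

  // Bitfields must be initialized here: before C++20 they cannot have
  // default member initializers.
  Position() : Line(0), Column(0) {}

  // Out-of-range values are clamped instead of silently truncated.
  void setLine(uint32_t L) { Line = L > MaxLine ? MaxLine : L; }
  void setColumn(uint32_t C) { Column = C > MaxColumn ? MaxColumn : C; }
  uint32_t line() const { return Line; }
  uint32_t column() const { return Column; }

private:
  uint32_t Line : LineBits;
  uint32_t Column : ColumnBits;
};

// Bitfield layout is implementation-defined; mainstream compilers pack
// these two fields into a single 32-bit word.
static_assert(sizeof(Position) == 4, "expected a 4-byte Position");
```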
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53363
llvm-svn: 344735
Summary:
Add a flag to SymbolCollector to collect refs from headers.
Note that we collect refs from headers in the static index, but not in the
dynamic index because of the preamble: function bodies are skipped when
building the preamble, so collecting refs from it would yield incomplete results.
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53322
llvm-svn: 344678
Summary:
One relatively boring bug: forgot to notify the CV after enqueue.
One much more fun bug: the thread member could access instance variables before
they were initialized. Although the thread was last in the init list, QueueCV
etc were listed after Thread in the class, so their default constructors raced
with the thread itself.
We have to get very unlucky to lose this race; I saw it 0.02% of the time.
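A sketch of both fixes, using hypothetical names (`AsyncTaskRunner`, `blockUntilIdle`): notify the condition variable after enqueuing, and declare the `std::thread` member last, because members are constructed in declaration order regardless of the order of the mem-initializer list.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class AsyncTaskRunner {
public:
  // Worker starts running immediately, so every member it touches must be
  // declared (and therefore constructed) before it.
  AsyncTaskRunner() : Worker([this] { run(); }) {}
  ~AsyncTaskRunner() {
    {
      std::lock_guard<std::mutex> Lock(Mu);
      ShouldStop = true;
    }
    QueueCV.notify_all();
    Worker.join();
  }

  void enqueue(std::function<void()> Task) {
    {
      std::lock_guard<std::mutex> Lock(Mu);
      Queue.push_back(std::move(Task));
    }
    QueueCV.notify_all(); // Forgetting this leaves the worker asleep.
  }

  // Blocks until every enqueued task has finished.
  void blockUntilIdle() {
    std::unique_lock<std::mutex> Lock(Mu);
    QueueCV.wait(Lock, [this] { return Queue.empty() && !Busy; });
  }

private:
  void run() {
    std::unique_lock<std::mutex> Lock(Mu);
    while (true) {
      QueueCV.wait(Lock, [this] { return ShouldStop || !Queue.empty(); });
      if (Queue.empty()) // Stop requested and nothing left to do.
        return;
      std::function<void()> Task = std::move(Queue.front());
      Queue.pop_front();
      Busy = true;
      Lock.unlock();
      Task();
      Lock.lock();
      Busy = false;
      QueueCV.notify_all(); // Wake blockUntilIdle().
    }
  }

  std::mutex Mu;
  std::condition_variable QueueCV;
  std::deque<std::function<void()>> Queue;
  bool Busy = false;
  bool ShouldStop = false;
  std::thread Worker; // Must stay last.
};
```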
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D53313
llvm-svn: 344595
Summary:
Reuse the old -use-dex-index experiment flag for this.
To avoid breaking the tests, make Dex deduplicate symbols, addressing an old FIXME.
Reviewers: hokein
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53288
llvm-svn: 344594
Summary:
See tinyurl.com/clangd-automatic-index for design and goals.
Lots of limitations to keep this patch smallish, TODOs everywhere:
- no serialization to disk
- no changes to dynamic index, which now has a much simpler job
- no partitioning of symbols by file to avoid duplication of header symbols
- no reindexing of edited files
- only a single worker thread
- compilation database is slurped synchronously (doesn't scale)
- uses memindex, rebuilds after every file (should be dex, periodically)
It's not hooked up to ClangdServer/ClangdLSPServer yet: the layering
isn't clear (it should really be in ClangdServer, but ClangdLSPServer
has all the CDB interactions).
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D53032
llvm-svn: 344513
Summary:
Previously, SymbolCollector post-filtered all references at the end to
find the references of interesting symbols.
This was incorrect when indexing the main AST, where we don't see the
locations of symbol declarations and definitions (those are in the
preamble AST).
The fix is to do the check early, while collecting references.
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53273
llvm-svn: 344507
Summary:
The bug being fixed: when a posting list doesn't exist in the index, it
was previously just dropped from the query rather than being treated as
empty. Now that we have the FALSE iterator, we can use it instead.
The query tree logic previously had a bunch of special cases to detect whether
subtrees are empty. Now we just naively build the whole tree, and rely
on the query optimizations to drop the trivial parts.
Finally, there was a bug in trigram generation: the empty query would
generate a single trigram "$$$" instead of no trigrams.
This had no effect (there was no posting list, so the other bug
cancelled it out). But we now have to fix this bug too.
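A minimal sketch of query trigram generation that returns nothing for an empty query instead of a `$$$` sentinel. The normalization (lowercase, keep only alphanumerics) and the omission of short-query tokens are simplifying assumptions, not the Dex rules.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Assumed normalization: lowercase, alphanumerics only. Queries shorter
// than 3 characters yield no trigrams; short-query rules are not modeled.
std::vector<std::string> queryTrigrams(const std::string &Query) {
  std::string Norm;
  for (char C : Query)
    if (std::isalnum(static_cast<unsigned char>(C)))
      Norm.push_back(
          static_cast<char>(std::tolower(static_cast<unsigned char>(C))));
  std::set<std::string> Unique; // Sorted and deduplicated.
  for (std::size_t I = 0; I + 3 <= Norm.size(); ++I)
    Unique.insert(Norm.substr(I, 3));
  // An empty query produces an empty result: no "$$$" sentinel.
  return std::vector<std::string>(Unique.begin(), Unique.end());
}
```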
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52796
llvm-svn: 343802
Summary:
The FALSE iterator will be used in a followup patch to fix a logic bug in Dex
(currently, tokens that don't have posting lists in the index are simply dropped
from the query, changing semantics).
It can usually be optimized away, so this patch adds the following optimizations:
- simplify booleans inside AND/OR
- replace effectively-empty AND/OR with booleans
- flatten nested AND/ORs
While working on this, found a bug in the AND iterator: its constructor sync()
assumes that ReachedEnd is set if applicable, but the constructor never sets it.
This crashes if a non-first iterator is nonempty.
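A sketch of the three simplifications on a hypothetical query tree (Dex itself works on iterator objects, not this `Query` struct). Treating TRUE as the identity of AND and FALSE as its absorbing element (and the reverse for OR), the rules reduce to one recursive pass.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical query-tree node.
struct Query {
  enum Kind { True, False, Leaf, And, Or };
  Kind K;
  std::string Token;           // Only used by Leaf.
  std::vector<Query> Children; // Only used by And/Or.
};

Query leaf(std::string T) { return Query{Query::Leaf, std::move(T), {}}; }
Query node(Query::Kind K, std::vector<Query> Children = {}) {
  return Query{K, "", std::move(Children)};
}

Query simplify(Query Q) {
  if (Q.K != Query::And && Q.K != Query::Or)
    return Q;
  const bool IsAnd = Q.K == Query::And;
  const Query::Kind Identity = IsAnd ? Query::True : Query::False;
  const Query::Kind Absorbing = IsAnd ? Query::False : Query::True;
  std::vector<Query> Kept;
  for (Query &Child : Q.Children) {
    Query C = simplify(std::move(Child));
    if (C.K == Absorbing)
      return C; // AND(x, FALSE) == FALSE, OR(x, TRUE) == TRUE.
    if (C.K == Identity)
      continue; // AND(x, TRUE) == x, OR(x, FALSE) == x.
    if (C.K == Q.K) { // Flatten AND(AND(a, b), c) into AND(a, b, c).
      for (Query &Grandchild : C.Children)
        Kept.push_back(std::move(Grandchild));
      continue;
    }
    Kept.push_back(std::move(C));
  }
  if (Kept.empty())
    return node(Identity); // Effectively-empty AND is TRUE, OR is FALSE.
  if (Kept.size() == 1)
    return std::move(Kept.front());
  return node(Q.K, std::move(Kept));
}

// Renders a tree as e.g. "(& a b)" for debugging.
std::string str(const Query &Q) {
  switch (Q.K) {
  case Query::True:
    return "true";
  case Query::False:
    return "false";
  case Query::Leaf:
    return Q.Token;
  case Query::And:
  case Query::Or:
    break;
  }
  std::string S = Q.K == Query::And ? "(&" : "(|";
  for (const Query &Child : Q.Children)
    S += " " + str(Child);
  return S + ")";
}
```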
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52789
llvm-svn: 343801
Summary:
This allows inheriting from it, so index() can go away, allowing
TestTU::index() to be fixed.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52250
llvm-svn: 343780
Summary:
Currently queries like "ab" can match identifiers like a_yellow_bee.
The value of allowing this for exactly one segment but no more seems dubious.
It costs ~3% of overall RAM (~9% of posting list RAM) and some quality.
Reviewers: ilya-biryukov, ioeric
Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52885
llvm-svn: 343777
Summary:
1) Instead of x$$ for a short-query trigram, just use x
2) Make rules more coherent: prefixes of length 1-2, and first char + next head
3) Fix Dex::fuzzyFind to mark results as incomplete, because
short-trigram rules only yield a subset of results.
Reviewers: ioeric
Subscribers: ilya-biryukov, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52808
llvm-svn: 343775
Summary:
The FALSE iterator will be used in a followup patch to fix a logic bug in Dex
(currently, tokens that don't have posting lists in the index are simply dropped
from the query, changing semantics).
It can usually be optimized away, so this patch adds the following optimizations:
- simplify booleans inside AND/OR
- replace effectively-empty AND/OR with booleans
- flatten nested AND/ORs
While working on this, found a bug in the AND iterator: its constructor sync()
assumes that ReachedEnd is set if applicable, but the constructor never sets it.
This crashes if a non-first iterator is nonempty.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52789
llvm-svn: 343774
Summary:
It's slow, and the open-source reduce implementation doesn't scale properly.
While here, tidy up some dead headers and comments.
Reviewers: kadircet
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D52517
llvm-svn: 343759
Declaring a field with the same name as a type causes GCC to error out:
Dex.h:104:10: error: declaration of 'clang::clangd::dex::Corpus clang::clangd::dex::Dex::Corpus' [-fpermissive]
Corpus Corpus;
^
Iterator.h:127:7: error: changes meaning of 'Corpus' from 'class clang::clangd::dex::Corpus' [-fpermissive]
class Corpus {
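One common way to avoid this diagnostic, sketched here with simplified stand-in classes, is to refer to the type by a qualified name so the member declaration no longer uses the unqualified name `Corpus`; renaming the field works too. Whether the commit used exactly this fix is not stated above.

```cpp
#include <cassert>

namespace dex {
// Minimal stand-in for the Corpus type from Iterator.h.
class Corpus {
public:
  explicit Corpus(unsigned Size) : Size(Size) {}
  unsigned size() const { return Size; }

private:
  unsigned Size;
};

class Dex {
public:
  explicit Dex(unsigned NumDocs) : Corpus(NumDocs) {}
  unsigned corpusSize() const { return Corpus.size(); }

private:
  // `Corpus Corpus;` makes GCC report "changes meaning of 'Corpus'".
  // The qualified type name avoids the unqualified lookup GCC checks.
  dex::Corpus Corpus;
};
} // namespace dex
```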
llvm-svn: 343610
Summary:
- Corpus avoids having to pass size to the true iterator, and (soon) any
iterator that might optimize down to true.
- Shorten names of factory functions now they're scoped to the Corpus.
intersect() and unionOf() rather than createAnd() or createOr() as this
seems to read better to me, and fits with other short names. Opinion wanted!
- DEFAULT_BOOST_SCORE --> 1. This is a multiplier, don't obfuscate identity.
- Simplify variadic templates in Iterator.h
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52711
llvm-svn: 343589
Summary:
This makes it suitable for logging (which immediately found a bug, to
be fixed in the next patch...)
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52715
llvm-svn: 343580
Summary:
When no scope qualifier is specified, allow completing index symbols
from any scope and insert the proper qualifier automatically. This is still
experimental and hidden behind a flag.
Things missing:
- Scope proximity based scoring.
- FuzzyFind support for weighted scopes.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: kbobyrev, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52364
llvm-svn: 343248
* With the current implementation, `sizeof(std::vector<Chunk>)` is added
twice to the `Dex` memory estimate, which is incorrect
* `Dex` logs memory usage estimation before `BackingDataSize` is set and
hence the log report excludes size of the external `SymbolSlab` which is
coupled with `Dex` instance
Reviewed By: ioeric
Differential Revision: https://reviews.llvm.org/D52503
llvm-svn: 343117
Because `PostingList` objects are compressed, it is now impossible to
see elements other than the current one and the documentation doesn't
match implementation anymore.
Reviewed By: ioeric
Differential Revision: https://reviews.llvm.org/D52545
llvm-svn: 343116
Summary: Soon we can drop support for MR-via-YAML.
I need to modify some out-of-tree versions to use the library, first.
Reviewers: kadircet
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D52465
llvm-svn: 343019
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
- symbol -> YAML I expect to keep for tools like dexp
- YAML -> symbol is used for the MR-style indexer, I think we can eliminate
this (merge-on-the-fly, else use a different serialization)
Reviewers: kbobyrev
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52453
llvm-svn: 342999
For consistency, functional-style code pieces are replaced with their
simpler counterparts to improve readability.
Also, file headers are fixed to comply with the LLVM Coding Standards.
Declarations inside an anonymous namespace are no longer marked `static`,
because that is redundant.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D52466
llvm-svn: 342974
This patch implements Variable-length Byte compression of `PostingList`s,
trading some performance for lower memory consumption.
`PostingList` compression and decompression were extensively tested with a
fuzzer for multiple hours and by running a significant number of realistic
`FuzzyFindRequests`. AddressSanitizer and UndefinedBehaviorSanitizer
were used to ensure correct behaviour.
Performance evaluation was conducted with recent LLVM symbol index (292k
symbols) and the collection of user-recorded queries (7751
`FuzzyFindRequest` JSON dumps):
| Metrics | Before| After | Change (%)
| ----- | ----- | ----- | -----
| Memory consumption (posting lists only), MB | 54.4 | 23.5 | -57%
| Time to process queries, sec | 7.70 | 9.4 | +22%
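A minimal sketch of delta + VByte encoding, assuming sorted 32-bit document IDs: each delta is split into 7-bit groups, and the high bit of each byte signals that more bytes follow. The chunked layout of the real `PostingList` is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumes DocIDs is sorted ascending, so every delta is non-negative.
std::vector<uint8_t> encode(const std::vector<uint32_t> &DocIDs) {
  std::vector<uint8_t> Out;
  uint32_t Prev = 0;
  for (uint32_t ID : DocIDs) {
    uint32_t Delta = ID - Prev; // Small for dense lists.
    Prev = ID;
    while (Delta >= 0x80) {
      // Low 7 bits plus a continuation bit.
      Out.push_back(static_cast<uint8_t>((Delta & 0x7F) | 0x80));
      Delta >>= 7;
    }
    Out.push_back(static_cast<uint8_t>(Delta)); // Last byte: high bit clear.
  }
  return Out;
}

std::vector<uint32_t> decode(const std::vector<uint8_t> &Bytes) {
  std::vector<uint32_t> DocIDs;
  uint32_t Prev = 0, Delta = 0;
  unsigned Shift = 0;
  for (uint8_t B : Bytes) {
    Delta |= static_cast<uint32_t>(B & 0x7F) << Shift;
    if (B & 0x80) { // More bytes of this delta follow.
      Shift += 7;
      continue;
    }
    Prev += Delta;
    DocIDs.push_back(Prev);
    Delta = 0;
    Shift = 0;
  }
  return DocIDs;
}
```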
Reviewers: sammccall, ioeric
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D52300
llvm-svn: 342965
`Dex` should utilize `FuzzyFindRequest.RestrictForCodeCompletion` flags
and omit symbols not meant for code completion when asked for it.
The measurements below were conducted with setting
`FuzzyFindRequest.RestrictForCodeCompletion` to `true` (so that it's
more realistic). Sadly, the average latency goes up; I suspect that is
mostly because of empty queries, where the number of posting lists is
critical.
| Metrics | Before | After | Relative difference
| ----- | ----- | ----- | -----
| Cumulative query latency (7000 `FuzzyFindRequest`s over LLVM static index) | 6182735043 ns | 7202442053 ns | +16%
| Whole Index size | 81.24 MB | 81.79 MB | +0.7%
Out of 292252 symbols collected from the LLVM codebase, 136926 appear to be
restricted for code completion.
Reviewers: ioeric
Differential Revision: https://reviews.llvm.org/D52357
llvm-svn: 342866
Summary:
Pros:
o Loading macros from preamble for every completion is slow (see profile).
o Calculating macro USR is also slow (see profile).
o Sema can provide a lot of macro completion results (e.g. when filter is empty,
60k for some large TUs!).
Cons:
o Slight memory increase in dynamic index (~1%).
o Some extra work during preamble build (should be fine, as preamble build and
indexAST are way slower).
Before:
{F7195645}
After:
{F7195646}
Reviewers: ilya-biryukov, sammccall
Reviewed By: sammccall
Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52078
llvm-svn: 342529
Summary:
FileIndex now provides explicit interfaces for preamble and main file updates.
This avoids growing parameter list when preamble and main symbols diverge
further (e.g. D52078). This also gets rid of the hack in `indexAST` that
inferred main file index based on `TopLevelDecls`.
Also separate `indexMainDecls` from `indexAST`.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52222
llvm-svn: 342460
Summary:
We can use cl::ResetCommandLineParser() to support different types of
command-lines, as long as we're careful about option lifetimes.
(I tried using subcommands, but the error messages were bad)
I found a mostly-reasonable pattern to isolate the fiddly parts.
Added -scope and -limit flags to the `find` command to demonstrate.
(Note that scope support seems to be broken in dex?)
Fixed symbol lookup to parse symbol IDs.
Caveats:
- with command help (e.g. `find -help`), you also get some spam
about required arguments. This is a bug in llvm::cl, which prints
these to errs() rather than the designated stream.
Reviewers: kbobyrev
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51989
llvm-svn: 342456
This patch abstracts `PostingList` interface and reuses existing
implementation. It will be used later to test different `PostingList`
representations.
No functionality change is introduced, this patch is mostly refactoring
so that the following patches could focus on functionality while not
being too hard to review.
Reviewed By: sammccall, ioeric
Differential Revision: https://reviews.llvm.org/D51982
llvm-svn: 342155
As discussed during D51860 review, it is better to use `llvm::Optional`
here as it has clear semantics which reflect intended behavior.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D52028
llvm-svn: 342138
JSON (de)serialization of `FuzzyFindRequest` might be useful for both
D51090 and D51628. Also, this allows precise logging of the fuzzy find
requests.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D51852
llvm-svn: 341802
Currently, `SymbolIndex::estimateMemoryUsage()` returns the "overhead"
estimate, i.e. the estimate of the Index data structure excluding
backing data (such as Symbol Slab and Reference Slab). This patch
propagates information about paired data size where necessary.
Reviewed By: ioeric, sammccall
Differential Revision: https://reviews.llvm.org/D51539
llvm-svn: 341800
If the current element is already beyond advanceTo()'s DocID, just
return instead of doing a binary search. This simple optimization improves
performance by up to 6-7%.
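A sketch of the fast path on a hypothetical iterator over an uncompressed, sorted posting list: the binary search runs only when the iterator actually has to move, and only over the remaining suffix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using DocID = uint32_t;

class PostingListIterator {
public:
  explicit PostingListIterator(const std::vector<DocID> &Docs)
      : Docs(Docs), Index(Docs.begin()) {}
  bool reachedEnd() const { return Index == Docs.end(); }
  DocID peek() const { return *Index; }

  // Moves to the first element >= ID.
  void advanceTo(DocID ID) {
    // Fast path: already at or beyond ID, nothing to search.
    if (reachedEnd() || peek() >= ID)
      return;
    // Binary search only the part of the list not yet consumed.
    Index = std::lower_bound(Index, Docs.end(), ID);
  }

private:
  const std::vector<DocID> &Docs;
  std::vector<DocID>::const_iterator Index;
};
```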
Reviewed By: ilya-biryukov
Differential Revision: https://reviews.llvm.org/D51802
llvm-svn: 341781
This patch sets the URI schemes of Dex to SymbolCollector's default schemes
in case callers pass an empty list of schemes. This was the case
for initialization in Clangd main and caused incorrect
behavior.
Also, it fixes a bug with a missing `continue;` after spotting an invalid URI
scheme conversion.
llvm-svn: 341552
Quality.cpp defines a structure for convenient storage of Top N items;
it should be used instead of `std::priority_queue`, whose semantics are
slightly obscure.
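A sketch of a bounded top-N container built on the `<algorithm>` heap functions. It is a hypothetical illustration of the idea; the actual interface in Quality.cpp may differ. Using `Greater` as the heap comparator keeps the worst retained item at the front, so each new item costs one comparison, plus O(log N) when it replaces that item.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

template <typename T, typename Greater = std::greater<T>>
class TopN {
public:
  explicit TopN(std::size_t N, Greater G = Greater()) : N(N), G(std::move(G)) {}

  // Returns true if V is among the N best seen so far.
  bool push(T V) {
    if (Heap.size() < N) {
      Heap.push_back(std::move(V));
      std::push_heap(Heap.begin(), Heap.end(), G);
      return true;
    }
    // Heap.front() is the worst kept element.
    if (Heap.empty() || !G(V, Heap.front()))
      return false;
    std::pop_heap(Heap.begin(), Heap.end(), G);
    Heap.back() = std::move(V);
    std::push_heap(Heap.begin(), Heap.end(), G);
    return true;
  }

  // Returns the kept elements, best first.
  std::vector<T> items() const {
    std::vector<T> Sorted = Heap;
    std::sort_heap(Sorted.begin(), Sorted.end(), G);
    return Sorted;
  }

private:
  std::size_t N;
  Greater G;
  std::vector<T> Heap;
};
```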
This patch does not affect functionality.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D51676
llvm-svn: 341544
This patch introduces the `PathURI` Search Token kind and uses it to
boost symbols defined in files close to the directory the fuzzy find
request is coming from (e.g. files the user is editing).
Reviewed By: ioeric
Reviewers: ioeric, sammccall
Differential Revision: https://reviews.llvm.org/D51481
llvm-svn: 341542
Summary:
Like D51475 but simplified based on recent patches.
While here, clarify that loadIndex() takes a filename, not file content.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51638
llvm-svn: 341376
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
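A sketch of a RIFF-style container writer and reader. The layout (a `RIFF` header with the total size and a 4-byte file type, then chunks of 4-byte ID, little-endian 32-bit size, and data padded to an even length) follows the general RIFF convention; the function names, the `CdIx` type, and the chunk IDs in the usage are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct Chunk {
  std::string ID; // Exactly 4 characters.
  std::string Data;
};

static void writeLE32(std::string &Out, uint32_t V) {
  for (int I = 0; I < 4; ++I)
    Out.push_back(static_cast<char>((V >> (8 * I)) & 0xFF));
}

static uint32_t readLE32(const std::string &In, std::size_t Pos) {
  uint32_t V = 0;
  for (int I = 0; I < 4; ++I)
    V |= static_cast<uint32_t>(static_cast<unsigned char>(In[Pos + I]))
         << (8 * I);
  return V;
}

std::string writeRIFF(const std::string &Type, const std::vector<Chunk> &Chunks) {
  std::string Body = Type;
  for (const Chunk &C : Chunks) {
    Body += C.ID;
    writeLE32(Body, static_cast<uint32_t>(C.Data.size()));
    Body += C.Data;
    if (C.Data.size() % 2)
      Body.push_back('\0'); // Chunks are padded to an even length.
  }
  std::string Out = "RIFF";
  writeLE32(Out, static_cast<uint32_t>(Body.size()));
  return Out + Body;
}

// Returns false on malformed input. All chunks are returned, so callers can
// skip the IDs they don't understand.
bool readRIFF(const std::string &In, std::string &Type, std::vector<Chunk> &Chunks) {
  if (In.size() < 12 || In.compare(0, 4, "RIFF") != 0)
    return false;
  if (readLE32(In, 4) != In.size() - 8)
    return false;
  Type = In.substr(8, 4);
  for (std::size_t Pos = 12; Pos < In.size();) {
    if (In.size() - Pos < 8)
      return false;
    Chunk C;
    C.ID = In.substr(Pos, 4);
    uint32_t Size = readLE32(In, Pos + 4);
    Pos += 8;
    if (In.size() - Pos < Size)
      return false;
    C.Data = In.substr(Pos, Size);
    Pos += Size + (Size % 2);
    Chunks.push_back(std::move(C));
  }
  return true;
}
```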
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
`buildStaticIndex()` is used by two other tools that I'm building, now
it's useful outside of `tool/ClangdMain.cpp`.
Also, slightly refactor the code while moving it to the different source
file.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D51626
llvm-svn: 341369
Summary:
A few things that I noticed while merging the SwapIndex patch:
- SymbolOccurrences and particularly SymbolOccurrenceSlab are unwieldy names,
and these names appear *a lot*. Ref, RefSlab, etc seem clear enough
and read/format much better.
- The asymmetry between SymbolSlab and RefSlab (build() vs freeze()) is
confusing and irritating, and doesn't even save much code.
Avoiding RefSlab::Builder was my idea, but it was a bad one; add it.
- DenseMap<SymbolID, ArrayRef<Ref>> seems like a reasonable compromise for
constructing MemIndex - and means far fewer wasted allocations than the
current DenseMap<SymbolID, vector<Ref*>> for FileIndex, and none for
slabs.
- RefSlab::find() is not actually used for anything, so we can throw
away the DenseMap and keep the representation much more compact.
- A few naming/consistency fixes: e.g. Slabs,Refs -> Symbols,Refs.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51605
llvm-svn: 341368
Summary:
This is now handled by a wrapper class SwapIndex, so MemIndex/DexIndex can be
immutable and focus on their job.
Old and busted:
I have a MemIndex, which holds a shared_ptr<vector<Symbol*>>, which keeps the
symbol slab alive. I update by calling build(shared_ptr<vector<Symbol*>>).
New hotness: I have a SwapIndex, which holds a unique_ptr<SymbolIndex>, which
holds a MemIndex, which holds a shared_ptr<void>, which keeps backing
data alive.
I update by building a new MemIndex and calling SwapIndex::reset().
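A sketch of the swap pattern with a hypothetical, minimal `SymbolIndex` interface. As a design choice of this sketch, the current index is stored in a `shared_ptr` (the description above mentions a `unique_ptr`), so a query that grabbed a snapshot keeps the old index alive while `reset()` installs a new one.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal index interface.
class SymbolIndex {
public:
  virtual ~SymbolIndex() = default;
  virtual std::vector<std::string> allSymbols() const = 0;
};

// Immutable index: built once, never updated in place.
class MemIndex : public SymbolIndex {
public:
  explicit MemIndex(std::vector<std::string> Symbols)
      : Symbols(std::move(Symbols)) {}
  std::vector<std::string> allSymbols() const override { return Symbols; }

private:
  std::vector<std::string> Symbols;
};

// Forwards queries to the current index, which reset() replaces.
class SwapIndex : public SymbolIndex {
public:
  explicit SwapIndex(std::unique_ptr<SymbolIndex> Initial)
      : Index(std::move(Initial)) {}

  void reset(std::unique_ptr<SymbolIndex> Replacement) {
    std::shared_ptr<SymbolIndex> Old = std::move(Replacement);
    {
      std::lock_guard<std::mutex> Lock(Mu);
      Index.swap(Old); // Old now holds the previous index.
    }
    // The previous index is released here, outside the lock, unless an
    // in-flight query still holds a snapshot of it.
  }

  std::vector<std::string> allSymbols() const override {
    return snapshot()->allSymbols();
  }

private:
  std::shared_ptr<SymbolIndex> snapshot() const {
    std::lock_guard<std::mutex> Lock(Mu);
    return Index;
  }

  mutable std::mutex Mu;
  std::shared_ptr<SymbolIndex> Index;
};
```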
Reviewers: kbobyrev, ioeric
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51422
llvm-svn: 341318
Summary:
Currently, a symbol can have only one #include header attached, which
might not work well if the symbol can be imported via different #includes depending
on where it's used. This patch stores multiple #include headers (along with their reference counts)
for each symbol, so that CodeCompletion can decide which include to insert.
In this patch, code completion simply picks the most popular include as the default inserted header. We also return all possible includes and their edits in the `CodeCompletion` results.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: mgrang, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51291
llvm-svn: 341304
SymbolCollector will be used for two cases:
- collect Symbol type only, used for indexing preamble AST.
- collect Symbol and SymbolOccurrences, used for indexing main AST.
Finding local references from the AST will be implemented in other ways.
llvm-svn: 341208
* Use consistent assertion messages in iterators implementations
* Silence a bunch of clang-tidy warnings: use `emplace_back` instead of
`push_back` where possible, make sure arguments have the same name in
header and implementation file, use for loop over ranges where possible
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D51528
llvm-svn: 341190
This patch introduces iterator cost concept to improve the performance
of Dex query iterators (mainly, AND iterator). Benchmarks show that the
queries become ~10% faster.
Before
```
-------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------
DexAdHocQueries 5883074 ns 5883018 ns 117
DexRealQ 959904457 ns 959898507 ns 1
```
After
```
-------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------
DexAdHocQueries 5238403 ns 5238361 ns 130
DexRealQ 873275207 ns 873269453 ns 1
```
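A simplified sketch of cost ordering for intersection, assuming posting-list length is the cost estimate: the cheapest list drives the loop and the longer lists are only probed by binary search. The real AND iterator advances child iterators rather than materializing vectors, so this only illustrates why ordering by cost pays off.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each list must be sorted ascending.
std::vector<uint32_t> intersect(std::vector<std::vector<uint32_t>> Lists) {
  if (Lists.empty())
    return {};
  // Cheapest (shortest) list first.
  std::sort(Lists.begin(), Lists.end(), [](const auto &A, const auto &B) {
    return A.size() < B.size();
  });
  std::vector<uint32_t> Result;
  for (uint32_t Candidate : Lists.front()) {
    bool InAll = true;
    for (std::size_t I = 1; I < Lists.size() && InAll; ++I)
      InAll = std::binary_search(Lists[I].begin(), Lists[I].end(), Candidate);
    if (InAll)
      Result.push_back(Candidate);
  }
  return Result;
}
```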
Reviewed by: sammccall
Differential Revision: https://reviews.llvm.org/D51310
llvm-svn: 341057
Stop using `$$$` (empty) trigram and generating a posting list with all
items. Since TRUE iterator is already implemented and correctly inserted
when there are no real trigram posting lists, this is a valid
transformation.
Benchmarks show that this simple change allows ~30% speedup on dataset
of real completion queries.
Before
```
-------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------
DexAdHocQueries 5640321 ns 5640265 ns 120
DexRealQ 939835603 ns 939830296 ns 1
```
After
```
-------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------
DexAdHocQueries 3452014 ns 3451987 ns 203
DexRealQ 667455912 ns 667455750 ns 1
```
Reviewed by: ilya-biryukov
Differential Revision: https://reviews.llvm.org/D51287
llvm-svn: 340729
This patch introduces the LIMIT iterator, which is very important for
improving the quality of search queries. LIMIT iterators can be applied on
top of BOOST iterators to prevent populating query request with a huge
number of low-quality symbols.
Reviewed by: sammccall
Differential Revision: https://reviews.llvm.org/D51029
llvm-svn: 340605
Summary:
For index-based code completion, send an asynchronous speculative index
request, based on the index request for the last code completion on the same
file and the filter text typed before the cursor, before sema code completion
is invoked. This can reduce the code completion latency (by roughly the latency of
sema code completion) if the speculative request is the same as the one
generated for the ongoing code completion from sema. As a sequence of code
completions often has the same scopes, proximity paths, etc., this should be
effective for a number of code completions.
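A sketch of the speculation pattern with a simplified, hypothetical `FuzzyFindRequest` and caller-supplied callables: the index query for the guessed request runs on another thread while sema runs, and its result is reused only if sema produces the same request.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical, simplified request; real requests carry more fields.
struct FuzzyFindRequest {
  std::string Query;
  std::vector<std::string> Scopes;
  bool operator==(const FuzzyFindRequest &Other) const {
    return Query == Other.Query && Scopes == Other.Scopes;
  }
};
using Results = std::vector<std::string>;

// QueryIndex: callable taking a request and returning Results.
// RunSema: callable returning the request sema actually needs.
template <typename QueryIndexFn, typename RunSemaFn>
Results completeWithSpeculation(const FuzzyFindRequest &Speculative,
                                QueryIndexFn QueryIndex, RunSemaFn RunSema) {
  // Start the index query for the guessed request before sema runs.
  std::future<Results> Pending =
      std::async(std::launch::async, QueryIndex, Speculative);
  FuzzyFindRequest Actual = RunSema();
  // Wait either way (a std::async future blocks in its destructor anyway).
  Results Speculated = Pending.get();
  if (Actual == Speculative)
    return Speculated; // Guess was right: index latency overlapped sema.
  return QueryIndex(Actual); // Guess was wrong: issue the real query.
}
```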
Trace with speculative index request:{F6997544}
Reviewers: hokein, ilya-biryukov
Reviewed By: ilya-biryukov
Subscribers: javed.absar, jfb, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D50962
llvm-svn: 340604
This patch prints information about built index size estimation to
verbose logs. This is useful for optimizing memory usage of DexIndex and
comparisons with MemIndex.
Reviewed by: sammccall
Differential Revision: https://reviews.llvm.org/D51154
llvm-svn: 340601
This patch introduces the BOOST iterator, a key building block for efficient
and high-quality symbol retrieval. The concept of boosting allows
performing computationally inexpensive scoring on the query side so that
the final (expensive) scoring can only be applied on the items with the
highest preliminary score while eliminating the need to score too many
items.
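A sketch of cheap query-side boosting followed by a limit, with hypothetical types: each matched posting list multiplies the document's preliminary score by its boost (1 means no boost), and only the best `Limit` documents would go on to the expensive scoring stage. The multiplicative rule is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct BoostedList {
  std::vector<uint32_t> Docs; // Sorted document IDs.
  float Boost;                // Multiplier; 1 means "no boost".
};

std::vector<uint32_t> preliminaryTop(const std::vector<uint32_t> &Candidates,
                                     const std::vector<BoostedList> &Boosts,
                                     std::size_t Limit) {
  std::vector<std::pair<float, uint32_t>> Scored;
  for (uint32_t Doc : Candidates) {
    float Score = 1;
    for (const BoostedList &L : Boosts)
      if (std::binary_search(L.Docs.begin(), L.Docs.end(), Doc))
        Score *= L.Boost;
    Scored.push_back({Score, Doc});
  }
  // Highest score first; ties broken by smaller ID.
  std::sort(Scored.begin(), Scored.end(), [](const auto &A, const auto &B) {
    return A.first != B.first ? A.first > B.first : A.second < B.second;
  });
  std::vector<uint32_t> Result;
  for (std::size_t I = 0; I < Scored.size() && I < Limit; ++I)
    Result.push_back(Scored[I].second);
  return Result;
}
```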
Reviewed by: ilya-biryukov
Differential Revision: https://reviews.llvm.org/D50970
llvm-svn: 340409
Summary:
It was previously only indexing the preamble decls. The new
implementation will index both the preamble and the main AST and
report both sets of symbols, preferring the ones from the main AST
whenever the symbol is present in both.
The symbols in the main AST slab always store all information
available in the preamble symbols, possibly adding more,
e.g. definition locations.
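A sketch of the merge rule using `std::map` as a hypothetical stand-in for symbol slabs: start from the main-AST symbols and add preamble symbols only for IDs not already present, relying on the fact that `std::map::insert` never overwrites an existing key.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical simplified symbol and slab types.
struct Symbol {
  std::string Name;
  std::string Definition; // Empty if unknown.
};
using SymbolSlab = std::map<std::string, Symbol>; // Keyed by symbol ID.

// Main-AST symbols win on conflicts: they carry at least as much
// information as the preamble versions (e.g. definition locations).
SymbolSlab mergePreferringMain(const SymbolSlab &Preamble,
                               const SymbolSlab &Main) {
  SymbolSlab Result = Main;
  Result.insert(Preamble.begin(), Preamble.end()); // Keeps existing keys.
  return Result;
}
```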
Reviewers: hokein, ioeric
Reviewed By: ioeric
Subscribers: kadircet, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D50889
llvm-svn: 340404
This patch adds hidden Clangd flag ("use-dex-index") which replaces
(currently) default `MemIndex` with `DexIndex` for the static index.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50897
llvm-svn: 340262
This patch is a proof-of-concept Dex index implementation. It has
several flaws, which don't allow replacing static MemIndex yet, such as:
* Not being able to handle queries of small size (less than 3 characters);
a way to solve this is generating trigrams of smaller size and having
such incomplete trigrams in the index structure.
* Speed measurements: while manually editing files in Vim and requesting
autocompletion gives an impression that the performance is at least
comparable with the current static index, having actual numbers is
important because we don't want to hurt the users and roll out slow
code. Eric (@ioeric) suggested that we should only replace MemIndex once
we have evidence that this is not a regression in terms of
performance. An approach which is likely to be successful here is to
wait until we have benchmark library in the LLVM core repository, which
is something I have suggested in the LLVM mailing lists, received
positive feedback on and started working on. I will add a dependency as
soon as the suggested patch is out for a review (currently there's at
least one complication which is being addressed by
https://github.com/google/benchmark/pull/649). Key performance
improvements for iterators are sorting by cost and the limit iterator.
* Quality measurements: currently, boosting iterator and two-phase
lookup stage are not implemented, without these the quality is likely to
be worse than the current implementation can yield. Measuring quality is
tricky, but another suggestion in the offline discussion was that the
drop-in replacement should only happen after Boosting iterators
implementation (and subsequent query enhancement).
The proposed changes do not affect Clangd functionality or performance,
`DexIndex` is only used in unit tests and not in production code.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50337
llvm-svn: 340175
Proposed changes:
* Cleanup comments in `clangd/index/dex/Iterator.h`: Vim's `gq`
formatting added redundant spaces instead of newlines in a few
places
* A few comments in `OrIterator` are wrong
* Use `EXPECT_TRUE(Condition)` instead of
`EXPECT_THAT(Condition, true)` (same with `EXPECT_FALSE`)
* Don't expose the `dump()` method publicly (it was exposed by a misplaced
`private:`)
This patch does not affect functionality.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50956
llvm-svn: 340157
This patch improves `dex::Iterator` string representation by
incorporating the information about the element which is currently being
pointed to by the `DocumentIterator`.
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50689
llvm-svn: 339877
This patch handles trigram generation for "short" identifiers and queries.
The trigram generator produces incomplete trigrams for short names so that
the same query iterator API can be used to match symbols which don't
have enough characters to form a trigram, and to correctly handle queries
which are also too short to generate a full trigram.
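This is a minimal sketch of the short-query case. It assumes that "valid" characters are ASCII letters and digits and that an empty query produces no token. The helper name `incompleteTrigram` is hypothetical and is not part of the Dex API. The sketch only shows the idea of returning the lowercased characters as a single incomplete trigram when fewer than three are available.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper, not part of the Dex API.
// Returns the lowercased valid characters (ASCII letters/digits) as a single
// "incomplete trigram" when there are only 1 or 2 of them; returns
// std::nullopt when the query is empty or long enough for full trigrams.
std::optional<std::string> incompleteTrigram(const std::string &Query) {
  std::string Valid;
  for (char C : Query) {
    unsigned char U = static_cast<unsigned char>(C);
    if (std::isalnum(U))
      Valid += static_cast<char>(std::tolower(U));
  }
  if (Valid.empty() || Valid.size() >= 3)
    return std::nullopt;
  return Valid;
}
```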
Reviewed by: ioeric
Differential revision: https://reviews.llvm.org/D50517
llvm-svn: 339548
This patch modifies the `consume` function to allow retrieval of a limited
number of symbols. This is the "cheap" implementation of a top-level
limiting iterator. In the future we would like to have a complete limit
iterator implementation to insert it into the query subtrees, but in the
meantime this version would be enough for a fully-functional
proof-of-concept Dex implementation.
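This is a minimal sketch of a top-level limit applied while consuming an iterator. `PostingIter` and this `consume` signature are hypothetical simplifications, not the actual Dex interfaces. The point is that collection stops as soon as `Limit` results have been gathered, rather than draining the whole iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using DocID = uint32_t;
// Sentinel returned by peek() once the iterator is exhausted.
constexpr DocID End = std::numeric_limits<DocID>::max();

// Hypothetical minimal iterator over a sorted posting list.
class PostingIter {
public:
  explicit PostingIter(const std::vector<DocID> &Docs) : Docs(Docs) {}
  DocID peek() const { return Pos < Docs.size() ? Docs[Pos] : End; }
  void advance() { ++Pos; }

private:
  const std::vector<DocID> &Docs;
  size_t Pos = 0;
};

// Collects at most Limit documents, stopping early once the limit is hit.
template <typename IterT>
std::vector<DocID> consume(IterT &It,
                           size_t Limit = std::numeric_limits<size_t>::max()) {
  std::vector<DocID> Result;
  for (; It.peek() != End && Result.size() < Limit; It.advance())
    Result.push_back(It.peek());
  return Result;
}
```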
Reviewers: ioeric, ilya-biryukov
Reviewed by: ioeric
Differential Revision: https://reviews.llvm.org/D50500
llvm-svn: 339426
Summary:
This is the first step of implementing Xrefs in clangd:
- add index interfaces, and related data structures.
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D49658
llvm-svn: 339011
Summary:
The implicit bool conversion could happen surprisingly, e.g. when
checking `if (Loc1 == Loc2)`, the compiler will convert SymbolLocation to
bool before comparing (because we don't define operator `==` for SymbolLocation).
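This is a minimal sketch of how marking the conversion `explicit` prevents the problem. It assumes a location type that exposes an `operator bool` meaning "this location is set". `Location` is a hypothetical stand-in, not the real `SymbolLocation`. Without the user-defined `operator==`, `A == B` would fail to compile instead of silently comparing two booleans.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for SymbolLocation; the real clangd
// struct has different fields.
struct Location {
  uint32_t Start = 0;
  uint32_t End = 0;
  std::string FileURI;

  // `explicit` still allows `if (Loc)` and `!Loc`, but forbids silent
  // conversions such as `bool B = Loc;` or treating `Loc1 == Loc2` as
  // `bool(Loc1) == bool(Loc2)`.
  explicit operator bool() const { return !FileURI.empty(); }
};

// With no implicit conversion available, equality must be defined explicitly.
inline bool operator==(const Location &A, const Location &B) {
  return A.Start == B.Start && A.End == B.End && A.FileURI == B.FileURI;
}
```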
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D49657
llvm-svn: 338517
The original Dex Iterators patch (https://reviews.llvm.org/rL338017)
caused problems for Clang 3.6 and Clang 3.7 due to a compiler bug
which prevented inferring the template parameter (`Size`) in create(And|Or)?
functions. It was reverted in https://reviews.llvm.org/rL338054.
In this revision the mentioned helper functions were replaced with
variadic template versions.
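This sketch shows the variadic approach. It uses hypothetical minimal `Iterator`/`AndIterator` types rather than the real Dex classes. Because the children arrive as a parameter pack, there is no array-size template parameter for the compiler to infer. A C++17 fold expression would be a shorter alternative to the recursive helper.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal types; the real Dex iterator classes differ.
struct Iterator {
  virtual ~Iterator() = default;
};

struct AndIterator : Iterator {
  explicit AndIterator(std::vector<std::unique_ptr<Iterator>> Children)
      : Children(std::move(Children)) {}
  std::vector<std::unique_ptr<Iterator>> Children;
};

// Base case: no more children to append.
inline void populateChildren(std::vector<std::unique_ptr<Iterator>> &) {}

// Recursive case: move the first child in, then recurse on the rest.
template <typename Head, typename... Tail>
void populateChildren(std::vector<std::unique_ptr<Iterator>> &Children,
                      Head Child, Tail... Rest) {
  Children.push_back(std::move(Child));
  populateChildren(Children, std::move(Rest)...);
}

// The number of children comes from the parameter pack, so no `Size`
// template argument has to be deduced from an array type.
template <typename... Args>
std::unique_ptr<AndIterator> createAnd(Args... Children) {
  std::vector<std::unique_ptr<Iterator>> Vec;
  populateChildren(Vec, std::move(Children)...);
  return std::make_unique<AndIterator>(std::move(Vec));
}
```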
Proposed changes were tested on multiple compiler versions, including
Clang 3.6 which originally caused the failure.
llvm-svn: 338116
This patch introduces three essential types of query iterators:
`DocumentIterator`, `AndIterator`, `OrIterator`. It provides a
convenient API for query tree generation and serves as a building block
for the next-generation symbol index - Dex. Currently, many
optimizations are deliberately omitted to keep the code readable and to
serve as the reference implementation. Potential improvements are briefly mentioned
in `FIXME`s and will be addressed in the following patches.
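This small sketch over sorted posting lists of document IDs makes the three iterator kinds concrete. The `Iter` interface (`peek()`/`advanceTo()`), the `End` sentinel and `consumeAll` are hypothetical simplifications, not the actual `dex::Iterator` API.
* `DocumentIter` walks one list.
* `AndIter` keeps advancing its children to the largest current ID until they all agree, which gives the intersection.
* `OrIter` reports the smallest current ID among its children, which gives the union.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>
#include <utility>
#include <vector>

using DocID = uint32_t;
// Sentinel returned by peek() once an iterator is exhausted.
constexpr DocID End = std::numeric_limits<DocID>::max();

// Hypothetical minimal interface; not the actual dex::Iterator API.
class Iter {
public:
  virtual ~Iter() = default;
  virtual DocID peek() const = 0;       // current document, or End
  virtual void advanceTo(DocID ID) = 0; // move to the first document >= ID
};

// Walks a sorted posting list.
class DocumentIter : public Iter {
public:
  explicit DocumentIter(std::vector<DocID> Docs) : Docs(std::move(Docs)) {}
  DocID peek() const override { return Pos < Docs.size() ? Docs[Pos] : End; }
  void advanceTo(DocID ID) override {
    auto From = Docs.begin() + static_cast<std::ptrdiff_t>(Pos);
    Pos = static_cast<size_t>(std::lower_bound(From, Docs.end(), ID) -
                              Docs.begin());
  }

private:
  std::vector<DocID> Docs;
  size_t Pos = 0;
};

// Yields documents present in every child (intersection); needs >= 1 child.
class AndIter : public Iter {
public:
  explicit AndIter(std::vector<std::unique_ptr<Iter>> Children)
      : Children(std::move(Children)) { sync(); }
  DocID peek() const override { return Children.front()->peek(); }
  void advanceTo(DocID ID) override {
    Children.front()->advanceTo(ID);
    sync();
  }

private:
  // Advance all children until they point at the same document (or End).
  void sync() {
    while (true) {
      DocID Target = 0;
      for (auto &C : Children)
        Target = std::max(Target, C->peek());
      bool Agree = true;
      for (auto &C : Children) {
        C->advanceTo(Target);
        if (C->peek() != Target)
          Agree = false;
      }
      if (Agree)
        return;
    }
  }
  std::vector<std::unique_ptr<Iter>> Children;
};

// Yields documents present in any child (union).
class OrIter : public Iter {
public:
  explicit OrIter(std::vector<std::unique_ptr<Iter>> Children)
      : Children(std::move(Children)) {}
  DocID peek() const override {
    DocID Min = End;
    for (const auto &C : Children)
      Min = std::min(Min, C->peek());
    return Min;
  }
  void advanceTo(DocID ID) override {
    for (auto &C : Children)
      C->advanceTo(ID);
  }

private:
  std::vector<std::unique_ptr<Iter>> Children;
};

// Drains an iterator into a vector of document IDs.
inline std::vector<DocID> consumeAll(Iter &It) {
  std::vector<DocID> Out;
  for (DocID D = It.peek(); D != End; D = It.peek()) {
    Out.push_back(D);
    It.advanceTo(D + 1);
  }
  return Out;
}
```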
Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
Iterators, their applications and potential extensions are explained in
detail in the design proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
Reviewers: ioeric, sammccall, ilya-biryukov
Subscribers: cfe-commits, klimek, jfb, mgrang, mgorny, MaskRay, jkorous,
arphaman
Differential Revision: https://reviews.llvm.org/D49546
llvm-svn: 338017
This patch introduces the core building block of the next-generation
Clangd symbol index - Dex. Search tokens are the keys in the inverted
index and represent a characteristic of a specific symbol. Examples of
search token types (Token Namespaces) are:
* Trigrams - these are essential for unqualified symbol name fuzzy
search
* Scopes - for filtering the symbols by namespace
* Paths - e.g. these can be used to uprank symbols defined close to the
edited file
This patch outlines the generic interface for such token namespaces, but
only implements trigram generation.
The intuition behind the trigram generation algorithm is that each extracted
trigram is a valid sequence for Fuzzy Matcher jumps; the proposed
implementation uses the existing FuzzyMatcher API for segmentation and
trigram extraction.
However, the trigram generation algorithm for the query string is different
from the previous one: it simply yields sequences of 3 consecutive
lowercased valid characters (letters, digits).
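This is a minimal sketch of the query-side trigram generation. It rests on three assumptions:
* "valid" means ASCII letters and digits;
* separators are skipped, so `unique_ptr` yields the same trigrams as `uniqueptr`;
* duplicate trigrams are dropped.

The function name `queryTrigrams` is hypothetical. The segmentation-based trigram generation for symbol names is not shown.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: lowercase the valid characters, skip everything else,
// and emit each distinct window of 3 consecutive valid characters in order.
std::vector<std::string> queryTrigrams(const std::string &Query) {
  std::string Valid;
  for (char C : Query) {
    unsigned char U = static_cast<unsigned char>(C);
    if (std::isalnum(U))
      Valid += static_cast<char>(std::tolower(U));
  }
  std::vector<std::string> Result;
  std::set<std::string> Seen;
  for (size_t I = 0; I + 3 <= Valid.size(); ++I) {
    std::string Trigram = Valid.substr(I, 3);
    if (Seen.insert(Trigram).second) // keep only the first occurrence
      Result.push_back(Trigram);
  }
  return Result;
}
```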
Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
The trigram generation techniques are described in detail in the
proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
Reviewers: sammccall, ioeric, ilya-biryukov
Subscribers: cfe-commits, klimek, mgorny, MaskRay, jkorous, arphaman
Differential Revision: https://reviews.llvm.org/D49591
llvm-svn: 337901
Summary:
log() is split into four functions:
- elog()/log()/vlog() have different severity levels, allowing filtering
- dlog() is a lazy macro which uses LLVM_DEBUG - it logs to the logger, but
conditionally based on -debug-only flag and is omitted in release builds
All logging functions use formatv-style format strings now, e.g.:
log("Could not resolve URI {0}: {1}", URI, Result.takeError());
Existing log sites have been split between elog/log/vlog by best guess.
This includes a workaround for passing Error to formatv that can be
simplified when D49170 or similar lands.
Subscribers: ilya-biryukov, javed.absar, ioeric, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D49008
llvm-svn: 336785
Summary: This is not enabled in the global-symbol-builder or dynamic index yet.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D49028
llvm-svn: 336553
Summary: Surface it in the completion items C++ API, and when a flag is set.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48938
llvm-svn: 336309
Summary:
Previously, the strings matched LSP completion pretty closely.
The completion label was a single string, for instance. This made
implementing completion itself easy, but made it hard to use the names
in other ways, e.g. a pretty-printed name in synthesized
documentation/hover.
It also limits our introspection into completion items, which can only
be as precise as the indexed symbols. This change is a prerequisite to
improvements to overload bundling which need to inspect e.g. signature
structure.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48475
llvm-svn: 335360
Summary:
The qualified name can be used to match a completion item to its corresponding
symbol. This can be useful for tools that measure code completion quality.
Qualified names are not precise for identifying symbols; we need to figure out a
better way to identify completion items.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48425
llvm-svn: 335334
Summary:
It's almost always identical to Name, and in fact we never used it (we used name
instead).
The only case where they differ is objc method selectors (foo: vs foo:bar:).
We can live with the latter for both name and filterText, so I've made that
change too.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48375
llvm-svn: 335321
Summary: This allows tools to examine symbols that would be collected in a symbol index. For example, a tool that measures index-based completion quality would be interested in references to symbols that are collected in the index.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48418
llvm-svn: 335218
Summary:
This allows dynamic index to have consistent URI schemes with the
static index which can have customized URI schemes, which would make file
proximity scoring based on URIs easier.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D47931
llvm-svn: 334809
Summary:
This adds more symbols to the index:
- member variables and functions
- enum constants in scoped enums
The code completion behavior should remain intact but workspace symbols should
now provide much more useful symbols.
Other symbols should be considered such as the ones in "main files" (files not
being included) but this can be done separately as this introduces its fair
share of problems.
Signed-off-by: Marc-Andre Laperle <marc-andre.laperle@ericsson.com>
Reviewers: ioeric, sammccall
Reviewed By: ioeric, sammccall
Subscribers: hokein, sammccall, jkorous, klimek, ilya-biryukov, jkorous-apple, ioeric, MaskRay, cfe-commits
Differential Revision: https://reviews.llvm.org/D44954
llvm-svn: 334017
Summary:
These decls are sometimes used as the canonical declarations (e.g. for go-to-def),
which seems to be bad.
- friend decls that are not definitions should be ignored for indexing purposes
- this means they should never be selected as canonical decl
- if the friend decl is the only decl, then the symbol should not be indexed
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: mgorny, klimek, ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D47623
llvm-svn: 333885
Summary:
This is more efficient and avoids data races when reading files that
come from the preamble. The staleness can occur when reading a file
from disk that changed after the preamble was built. This can lead to
crashes, e.g. when parsing comments.
We do not need to rely on symbols from the main file anyway, since any info
that those provide can always be taken from the AST.
Reviewers: ioeric, sammccall
Reviewed By: ioeric
Subscribers: malaperle, klimek, javed.absar, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D47272
llvm-svn: 333196
Summary:
This fixes a crash in code completion that occurs when reading doc
comments from files that were updated after the preamble was
computed. In that case, the files on disk could've been changed and we
can't rely on finding the comment text with the same range anymore.
The current workaround is to not provide comments from the headers at
all and rely on the dynamic index instead.
A more principled solution would be to store the contents of the files
read inside the preamble, but that is much harder to implement properly,
and it would definitely increase the size of the preamble.
Together with D47272, this should fix all preamble-related crashes
we're aware of.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: klimek, ioeric, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D47274
llvm-svn: 333189
Summary:
This assumes that .inc files are supposed to be included via headers
that include them.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: klimek, ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D47187
llvm-svn: 333188
Summary:
Previous implementation used to extract brief text from doxygen comments.
Brief text parsing slows down completion and is not suited for
non-doxygen comments.
This commit switches to providing comments that mimic the ones
originally written in the source code, doing minimal reindenting and
removing the comment markers to make the output more user-friendly.
It means we lose support for doxygen-specific features, e.g. extracting
brief text, but provide useful results for non-doxygen comments.
Switching the doxygen support back on is an option, but I suggest we first see
whether the current approach gives more useful results.
Reviewers: sammccall, hokein, ioeric
Reviewed By: sammccall
Subscribers: klimek, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D45999
llvm-svn: 332459
Summary:
This uses heuristics to identify private proto symbols. For example,
top-level symbols whose name contains "_" are considered private. These symbols
are not expected to be used by users.
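This sketch covers only the single rule quoted above, using a hypothetical helper name. It leaves out deciding whether a file is generated proto code, and any further exceptions the real heuristics may apply. Generated C++ for a nested message `Outer.Inner` is named `Outer_Inner` at namespace scope, which is why an underscore in a top-level name is a useful private-symbol signal.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper implementing only the stated rule: a top-level symbol
// from generated proto code whose name contains '_' is treated as private.
bool looksLikePrivateProtoSymbol(std::string_view Name, bool IsTopLevel) {
  return IsTopLevel && Name.find('_') != std::string_view::npos;
}
```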
Reviewers: ilya-biryukov, malaperle
Reviewed By: ilya-biryukov
Subscribers: sammccall, klimek, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D46751
llvm-svn: 332456
Summary:
This patch adds index support for GoToDefinition -- when we don't get the
definition from the local AST, we query our (static and dynamic) index to
get it.
Since we currently collect only top-level symbols in the index, this doesn't
support all cases (e.g. class members); we will extend the index to include
more symbols in the future.
Reviewers: sammccall
Subscribers: klimek, ilya-biryukov, jkorous-apple, ioeric, MaskRay, cfe-commits
Differential Revision: https://reviews.llvm.org/D45717
llvm-svn: 331189
Summary:
This is a convenience function for getting the std::string representation
of a SymbolID.
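This is a minimal sketch of such a conversion. It assumes the ID is a fixed-size array of raw hash bytes rendered as uppercase hexadecimal. The 8-byte size, the `SymbolIDBytes` alias and the function name are illustrative assumptions, not the actual `SymbolID` interface.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a SymbolID: a fixed-size binary hash.
// The 8-byte size is an assumption for illustration.
using SymbolIDBytes = std::array<uint8_t, 8>;

// Renders the raw bytes as an uppercase hex string, two digits per byte.
std::string toHexString(const SymbolIDBytes &ID) {
  static const char Digits[] = "0123456789ABCDEF";
  std::string Out;
  Out.reserve(ID.size() * 2);
  for (uint8_t Byte : ID) {
    Out += Digits[Byte >> 4];
    Out += Digits[Byte & 0x0F];
  }
  return Out;
}
```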
Reviewers: ioeric
Subscribers: klimek, ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D46065
llvm-svn: 330835
Summary:
This is a basic implementation of the "workspace/symbol" request which is
used to find symbols by a string query. Since this is similar to code completion
in terms of result, this implementation reuses the "fuzzyFind" in order to get
matches. For now, the scoring algorithm is the same as code completion and
improvements could be done in the future.
The index model doesn't contain quite enough symbols for this to cover
common symbols like methods, enum class enumerators, functions in unnamed
namespaces, etc. The index model will be augmented separately to achieve this.
Reviewers: sammccall, ilya-biryukov
Reviewed By: sammccall
Subscribers: jkorous, hokein, simark, sammccall, klimek, mgorny, ilya-biryukov, mgrang, jkorous-apple, ioeric, MaskRay, cfe-commits
Differential Revision: https://reviews.llvm.org/D44882
llvm-svn: 330637
Summary:
Previously, class completion items from the index were missing
template parameters in both the snippet and the label.
Reviewers: sammccall, hokein
Reviewed By: sammccall
Subscribers: klimek, jkorous-apple, ioeric, MaskRay, cfe-commits
Differential Revision: https://reviews.llvm.org/D45482
llvm-svn: 330004
Summary:
LSP uses line & column as the symbol position, so clangd needs to convert
file offsets to line & column when sending results back to the LSP client.
This is costly, especially for finding workspace symbols: we have to read the
file content from disk (if it isn't loaded in memory).
Saving this information in the index will make clangd's life easier.
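The cost comes from the offset conversion that would otherwise be needed, sketched here. Every newline before the offset has to be counted, which means having the file content at hand. For simplicity the sketch counts columns in bytes. LSP positions actually count UTF-16 code units, so the sketch is only exact for ASCII text. `Position` and `offsetToPosition` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical position type; both fields are 0-based, as in LSP.
struct Position {
  int Line = 0;
  int Column = 0; // counted in bytes here (LSP uses UTF-16 code units)
};

// Scans the file contents up to Offset, counting newlines. Storing
// line/column in the index avoids needing the contents for this scan.
Position offsetToPosition(std::string_view Code, size_t Offset) {
  Offset = std::min(Offset, Code.size());
  Position P;
  size_t LineStart = 0;
  for (size_t I = 0; I < Offset; ++I) {
    if (Code[I] == '\n') {
      ++P.Line;
      LineStart = I + 1;
    }
  }
  P.Column = static_cast<int>(Offset - LineStart);
  return P;
}
```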
Reviewers: sammccall
Subscribers: klimek, ilya-biryukov, jkorous-apple, ioeric, MaskRay, cfe-commits
Differential Revision: https://reviews.llvm.org/D45513
llvm-svn: 329997
Summary:
Fix bugs:
- don't count occurrences of decls where we don't spell the name
- findDefinitions at MACRO(^X) goes to the definition of MACRO
Subscribers: klimek, ilya-biryukov, jkorous-apple, ioeric, MaskRay, cfe-commits
Differential Revision: https://reviews.llvm.org/D45356
llvm-svn: 329571
The current code was casting a pointer to a misaligned type, which is undefined behavior.
Found by compiling with Undefined Behavior Sanitizer and running tests (check-clang-tools).
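The message does not show the fix itself. A common way to avoid this kind of undefined behavior is to copy the bytes with `std::memcpy` instead of dereferencing a cast pointer. `readUnaligned32` is a hypothetical helper illustrating that technique.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a uint32_t from a byte buffer at any alignment.
// `*reinterpret_cast<const uint32_t *>(P)` would be undefined behavior when
// P is not suitably aligned; memcpy has no alignment requirement, and
// compilers typically lower it to a single load where that is allowed.
inline uint32_t readUnaligned32(const unsigned char *P) {
  uint32_t Value;
  std::memcpy(&Value, P, sizeof(Value));
  return Value;
}
```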
llvm-svn: 327902
Summary:
Potential use case: augment go-to-definition results with symbol
information (e.g. a function definition in a .cc file) that might not be in the AST.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D44305
llvm-svn: 327487
Summary:
This is an important ranking signal.
It's off for the dynamic index for now. Correspondingly, tell the index
infrastructure only to report declarations for the dynamic index.
Reviewers: ioeric, hokein
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D44315
llvm-svn: 327275
Summary: This also matches the range in symbol index.
Reviewers: sammccall
Subscribers: klimek, ilya-biryukov, jkorous-apple, ioeric, cfe-commits
Differential Revision: https://reviews.llvm.org/D44247
llvm-svn: 327129
Summary:
These have different USRs than the underlying entity, but are not typically
interesting in their own right and can be numerous (e.g. generated traits).
Reviewers: ioeric
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D44298
llvm-svn: 327127
Summary:
Symbols with different canonical includes might be defined in the same header
(e.g. symbols defined in STL <iosfwd>). This patch adds support for mapping from
qualified symbol names to canonical headers and special mapping for symbols in <iosfwd>
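This sketch shows the lookup order such a mapping implies. It uses a hypothetical `HeaderMapper` class rather than clangd's actual API. A mapping keyed on the qualified symbol name is consulted before the file-level mapping. That way, two symbols declared in the same physical header (such as `std::ostream` and `std::istream` in `<iosfwd>`) can still map to different canonical headers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified mapping table; not clangd's CanonicalIncludes API.
class HeaderMapper {
public:
  // Maps a fully qualified symbol name to the header users should include.
  void addSymbolMapping(std::string QualifiedName, std::string Header) {
    SymbolToHeader[std::move(QualifiedName)] = std::move(Header);
  }
  // Maps a physical header to the header users should include instead.
  void addFileMapping(std::string From, std::string To) {
    FileToHeader[std::move(From)] = std::move(To);
  }
  // Symbol-level mappings take precedence over file-level ones.
  std::string mapHeader(const std::string &Header,
                        const std::string &QualifiedName) const {
    auto S = SymbolToHeader.find(QualifiedName);
    if (S != SymbolToHeader.end())
      return S->second;
    auto F = FileToHeader.find(Header);
    if (F != FileToHeader.end())
      return F->second;
    return Header; // no mapping: keep the original header
  }

private:
  std::map<std::string, std::string> SymbolToHeader;
  std::map<std::string, std::string> FileToHeader;
};
```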
Reviewers: sammccall, hokein
Reviewed By: sammccall
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D43869
llvm-svn: 326456
Summary:
Currently, we pick the first declaration of a symbol in a TU, which is considered
canonical in the clangIndex, as the canonical declaration in clangd. This causes
forward declarations that might appear in a random header to be used as a
canonical declaration, which is not desirable for features like go-to-declaration
or include insertion.
For example, for class X, we would consider the forward declaration in fwd.h to
be the canonical declaration, while the preferred canonical declaration should
be the actual definition in x.h.
```
// fwd.h
class X; // forward decl
// x.h
class X {};
```
This patch fixes the issue by making symbol collector favor the actual definition of
a TagDecl (i.e. class/struct/enum/union) found in a header file over the first seen
declarations in a TU. Other symbol types like functions are not handled because
using the first seen declarations as canonical declarations is usually a good
heuristic for them.
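This is a minimal sketch of the selection rule. It uses a hypothetical `DeclInfo` record in place of real Clang declarations. Among the declarations seen for one symbol in a TU, a tag definition in a header wins; otherwise the first-seen declaration is kept.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of one declaration of a symbol.
struct DeclInfo {
  std::string File;
  bool IsDefinition = false;
  bool IsTagDecl = false; // class/struct/enum/union
  bool InHeader = false;
};

// Picks the canonical declaration from one symbol's declarations,
// given in the order they were seen in a TU.
const DeclInfo *pickCanonical(const std::vector<DeclInfo> &Decls) {
  if (Decls.empty())
    return nullptr;
  for (const DeclInfo &D : Decls)
    if (D.IsTagDecl && D.IsDefinition && D.InHeader)
      return &D; // a tag definition in a header wins
  return &Decls.front(); // otherwise keep the first-seen declaration
}
```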
Reviewers: sammccall
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D43823
llvm-svn: 326313
Summary:
The new behaviors introduced by this patch:
o When include collection is enabled, we always set IncludeHeader field in Symbol
even if it's the same as FileURI in decl.
o Disable include collection in FileIndex which is currently only used to build
dynamic index. We should revisit this when we actually want to use FileIndex to
build the global index.
o Code-completion only uses IncludeHeader to insert headers but not FileURI in
CanonicalDeclaration. This ensures that inserted headers are always canonicalized.
Note that include insertion can still be triggered for symbols that are already
included if they are merged from dynamic index and static index, but we would
only use includes that are already canonicalized (e.g. from static index).
Reason for change:
Collecting header includes in dynamic index enables inserting includes for headers
that are not indexed but opened in the editor. Compared to inserting includes for
symbols in the global/static index, this is a nice-to-have but would probably require
a non-trivial amount of work to get right. For example:
o Currently it's not easy to fully support CanonicalIncludes in dynamic index, given the way
we run dynamic index.
o It's also harder to reason about the correctness of include canonicalization for dynamic index
(i.e. symbols in the current file/TU) than for the static index, where symbols are collected
offline and sanity checks are possible before shipping to production.
o We have less control/flexibility over symbol info in the dynamic index
(e.g. URIs, path normalization), which could be used to help make decisions when inserting includes.
As header collection (especially canonicalization) is relatively new, and enabling
it for dynamic index would immediately affect current users with only dynamic
index support, I propose we disable it for dynamic index for now to avoid
compromising other hot features like code completion and only support it for
static index where include insertion would likely bring more value.
Reviewers: ilya-biryukov, sammccall, hokein
Subscribers: klimek, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D43550
llvm-svn: 325764
Summary:
o Avoid inserting a header include into the header itself.
o Avoid inserting non-header files (by not indexing symbols in main
files at all).
o Canonicalize include paths for symbols in dynamic index.
Reviewers: sammccall, ilya-biryukov
Reviewed By: ilya-biryukov
Subscribers: klimek, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D43462
llvm-svn: 325523
Summary:
There are a few implementation options here - alternatives are either both
awkward and inefficient, or really inefficient.
This is at least potentially a hot path when used as a reducer for common
symbols.
(Also fix an unused-var that sneaked in)
Reviewers: ioeric
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D43381
llvm-svn: 325476
Summary:
o Collect suitable #include paths for index symbols. This also does smart mapping
for STL symbols and IWYU pragma (code borrowed from include-fixer).
o For global code completion, add a command for inserting new #include in each code
completion item.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: klimek, mgorny, ilya-biryukov, jkorous-apple, hintonda, cfe-commits
Differential Revision: https://reviews.llvm.org/D42640
llvm-svn: 325343
Within a TU:
- as now, collect a declaration from the first occurrence of a symbol
(taking clang's canonical declaration)
- when we first see a definition occurrence, copy the symbol and add it
Across TUs/sources:
- mergeSymbol in Merge.h is responsible for combining matching Symbols.
This covers dynamic/static merges and cross-TU merges in the static index.
- it prefers declarations from Symbols that have a definition.
- GlobalSymbolBuilderMain is modified to use mergeSymbol as a reduce step.
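This sketch shows the merge preference. It uses hypothetical simplified `Sym`/`Loc` types and a hypothetical `mergeSketch` function rather than the real `mergeSymbol` in Merge.h. The record that has a definition supplies the declaration, and any field still missing is filled from the other record.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified types; the real clangd Symbol has many more fields.
struct Loc {
  std::string FileURI;
  bool present() const { return !FileURI.empty(); }
};

struct Sym {
  std::string ID;
  Loc CanonicalDeclaration;
  Loc Definition;
};

// Combines two records for the same symbol, preferring the one that has a
// definition as the source of the declaration.
Sym mergeSketch(const Sym &L, const Sym &R) {
  assert(L.ID == R.ID && "merging different symbols");
  bool PreferR = !L.Definition.present() && R.Definition.present();
  Sym S = PreferR ? R : L;
  const Sym &Other = PreferR ? L : R;
  if (!S.CanonicalDeclaration.present())
    S.CanonicalDeclaration = Other.CanonicalDeclaration;
  if (!S.Definition.present())
    S.Definition = Other.Definition;
  return S;
}
```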
Random cleanups (can be pulled out):
- SymbolFromYAML -> SymbolsFromYAML, new singular SymbolFromYAML added
- avoid uninit'd SymbolLocations. Add an idiomatic way to check "absent".
- CanonicalDeclaration (as well as Definition) are mapped as optional in YAML.
- added operator<< for Symbol & SymbolLocation, for debugging
Reviewers: ioeric, hokein
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D42942
llvm-svn: 324735
Summary:
Some STL symbols are defined in inline namespaces. For example,
```
namespace std {
inline namespace __cxx11 {
typedef ... string;
}
}
```
Currently, this will be `std::__cxx11::string`; however, `std::string` is desired.
Inline namespaces are treated as transparent scopes. This
reflects the way they're most commonly used for lookup. Ideally we'd
include them, but at query time it's hard to find all the inline
namespaces to query: the preamble doesn't have a dedicated list.
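This is a minimal sketch of treating inline namespaces as transparent when building a qualified name. `NamespaceInfo` and `qualifiedName` are hypothetical. In clangd the information would come from the Clang AST rather than from a hand-built list.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one enclosing namespace.
struct NamespaceInfo {
  std::string Name;
  bool IsInline = false;
};

// Builds a qualified name, skipping inline namespaces as transparent scopes.
std::string qualifiedName(const std::vector<NamespaceInfo> &Enclosing,
                          const std::string &Name) {
  std::string Result;
  for (const NamespaceInfo &NS : Enclosing) {
    if (NS.IsInline)
      continue; // e.g. std::__cxx11 is skipped
    Result += NS.Name + "::";
  }
  return Result + Name;
}
```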
Reviewers: sammccall, hokein
Reviewed By: sammccall
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D42796
llvm-svn: 324065
Summary:
Instead of passing Context explicitly around, we now have a thread-local
Context object `Context::current()` which is an implicit argument to
every function.
Most manipulation of this should use the WithContextValue helper, which
augments the current Context to add a single KV pair, and restores the
old context on destruction.
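This is a heavily simplified sketch of the thread-local pattern. It assumes a context that is just an immutable string-to-string map; the real clangd `Context` uses typed keys. `WithValue`, `currentContext` and `currentStorage` are hypothetical names modeled on the description of `WithContextValue`. The constructor installs a derived context, and the destructor restores the previous one. Copying the whole map on each derivation is a simplification for brevity.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified context: an immutable string-to-string map.
using ContextData = std::map<std::string, std::string>;

// Each thread has its own "current" context.
inline std::shared_ptr<const ContextData> &currentStorage() {
  thread_local std::shared_ptr<const ContextData> Current =
      std::make_shared<ContextData>();
  return Current;
}

inline const ContextData &currentContext() { return *currentStorage(); }

// RAII helper: derives a context with one extra key/value pair, makes it
// current, and restores the previous context on destruction.
class WithValue {
public:
  WithValue(std::string Key, std::string Value) : Saved(currentStorage()) {
    auto Derived = std::make_shared<ContextData>(*Saved); // copy for simplicity
    (*Derived)[std::move(Key)] = std::move(Value);
    currentStorage() = std::move(Derived);
  }
  ~WithValue() { currentStorage() = std::move(Saved); }
  WithValue(const WithValue &) = delete;
  WithValue &operator=(const WithValue &) = delete;

private:
  std::shared_ptr<const ContextData> Saved;
};
```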
Advantages are:
- less boilerplate in functions that just propagate contexts
- reading most code doesn't require understanding context at all, and
using context as values in fewer places still
- fewer options to pass the "wrong" context when it changes within a
scope (e.g. when using Span)
- contexts pass through interfaces we can't modify, such as VFS
- propagating contexts across threads was slightly tricky (e.g.
copy vs move, no move-init in lambdas), and is now encapsulated in
the threadpool
Disadvantages are all the usual TLS stuff - hidden magic, and
potential for higher memory usage on threads that don't use the
context. (In practice, it's just one pointer)
Reviewers: ilya-biryukov
Subscribers: klimek, jkorous-apple, ioeric, cfe-commits
Differential Revision: https://reviews.llvm.org/D42517
llvm-svn: 323872
Summary:
For symbols defined inside macros:
* use expansion location, if the symbol is formed via macro concatenation.
* use spelling location, otherwise.
This will fix some symbols that have ill-formed locations (especially invalid file paths).
Reviewers: ioeric
Reviewed By: ioeric
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D42575
llvm-svn: 323867
Summary:
* truncate symbols from the static/dynamic index to the limit
(which saves much of the cost of constructing the merged symbols).
* add a CLI option to limit the number of returned completion results
(defaults to 100).
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: klimek, ilya-biryukov, jkorous-apple, ioeric, cfe-commits
Differential Revision: https://reviews.llvm.org/D42484
llvm-svn: 323408