Summary:
It is producing too much input in non-verbose mode,
i.e. a message per indexed file
Reviewers: sammccall, kadircet
Reviewed By: sammccall
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D56915
llvm-svn: 351563
Summary:
Files without any symbols were never marked as updated during indexing, which resulted in failure while writing shards for these files.
This patch fixes the logic to mark files that are seen for the first time but don't contain any symbols as updated.
Reviewers: ilya-biryukov
Reviewed By: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D56592
llvm-svn: 351170
Summary:
This would save us some memory and disk space:
- Dex usage (261 MB vs 266 MB)
- Disk (75 MB vs 76 MB)
It would save more when we index the main file symbol D55185.
Reviewers: ilya-biryukov
Reviewed By: ilya-biryukov
Subscribers: nridge, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D56314
llvm-svn: 350803
Summary:
Unfortunately, yaml::Input::setCurrentDocument() and yaml::Input::nextDocument() are
internal APIs, the way we use them may cause a nullptr accessing when
processing an empty YAML file.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D56442
llvm-svn: 350633
The new guideline is to qualify with 'llvm::' explicitly both in
'.h' and '.cpp' files. This simplifies moving the code between
header and source files and is easier to keep consistent.
llvm-svn: 350531
Summary:
There were a few different places where we canonicalized paths, each
one had its own flavor. This patch tries to unify them all under one place.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D55818
llvm-svn: 349618
Summary:
Currently, background index rebuilds symbol index on every indexed file,
which can be inefficient. This patch makes it only rebuild symbol index periodically.
As the rebuild no longer happens too often, we could also build more efficient
dex index.
Reviewers: ilya-biryukov, kadircet
Reviewed By: kadircet
Subscribers: dblaikie, MaskRay, jkorous, arphaman, jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D55770
llvm-svn: 349496
Summary:
There was a chance that multiple clangd instances could try to write
same shard, in which case we would get a malformed file most likely. This patch
changes the writing mechanism to first write to a temporary file and then rename
it to fit real destination. Which is guaranteed to be atomic by POSIX.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D55417
llvm-svn: 349348
Summary:
We'll soon have tasks pending for reading shards from disk, we want
them to have normal priority. Because:
- They are not CPU intensive, mostly IO bound.
- Give a good coverage for the project at startup, therefore it is worth
spending some cycles.
- We have only one task per whole CDB rather than one task per file.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D55315
llvm-svn: 349345
Summary:
When there are multiple symbols in the result of a fuzzy find with the
same name, one has to perform an additional query to figure out which of those
symbols are coming from the "interesting" scope. This patch prints the scope in
fuzzy find results to get rid of the second symbol.
Reviewers: hokein
Subscribers: ilya-biryukov, ioeric, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D55705
llvm-svn: 349152
Summary:
When indexing a file which contains an uncompilable error, we will
trigger an assertion failure -- the IndexFileIn data is not set, but we
access them in the backgound index.
Reviewers: kadircet
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D55650
llvm-svn: 349144
Summary:
Partitions include graphs in auto-index so that each shards contains
only part of the include graph related to itself.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D55062
llvm-svn: 348252
Summary:
This is the second part for introducing include hierarchy into index
files produced by clangd. You can see the base patch that introduces structures
and discusses the future of the patches in D54817
Reviewers: ilya-biryukov
Subscribers: mgorny, ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D54999
llvm-svn: 348005
Summary:
Currently, there's no way of knowing about header files
using compilation database, since it doesn't contain header files as entries.
Using this information, restoring from cache using compile commands becomes
possible instead of doing directory traversal. Also, we can issue indexing
actions for out-of-date headers even if source files depending on them haven't
changed.
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D54817
llvm-svn: 347669
Summary:
Ownership and configuration:
The auto-index (background index) is maintained by ClangdServer, like Dynamic.
(This means ClangdServer will be able to enqueue preamble indexing in future).
For now it's enabled by a simple boolean flag in ClangdServer::Options, but
we probably want to eventually allow injecting the storage strategy.
New 'sync' command:
In order to meaningfully test the integration (not just unit-test components)
we need a way for tests to ensure the asynchronous index reads/writes occur
before a certain point.
Because these tests and assertions are few, I think exposing an explicit "sync"
command for use in tests is simpler than allowing threading to be completely
disabled in the background index (as we do for TUScheduler).
Bugs:
I fixed a couple of trivial bugs I found while testing, but there's one I can't.
JSONCompilationDatabase::getAllFiles() may return relative paths, and currently
we trigger an assertion that assumes they are absolute.
There's no efficient way to resolve them (you have to retrieve the corresponding
command and then resolve against its directory property). In general I think
this behavior is broken and we should fix it in JSONCompilationDatabase and
require CompilationDatabase::getAllFiles() to be absolute.
Reviewers: kadircet
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D54894
llvm-svn: 347567
Summary:
And add a hidden option to control whether the types are collected.
For experiments, will be removed when expected types implementation
is stabilized.
The index size is almost unchanged, e.g. the YAML index for all clangd
sources increased from 53MB to 54MB.
Reviewers: ioeric, sammccall
Reviewed By: sammccall
Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52274
llvm-svn: 347560
Summary:
Instead of receiving compilation commands, auto-index is triggered by just
filenames to reindex, and gets commands from the global comp DB internally.
This has advantages:
- more of the work can be done asynchronously (fetching compilation commands
upfront can be slow for large CDBs)
- we get access to the CDB which can be used to retrieve interpolated commands
for headers (useful in some cases where the original TU goes away)
- fits nicely with the filename-only change observation from r347297
The interface to GlobalCompilationDatabase gets extended: when retrieving a
compile command, the GCDB can optionally report the project the file belongs to.
This naturally fits together with getCompileCommand: it's hard to implement one
without the other. But because most callers don't care, I've ended up with an
awkward optional-out-param-in-virtual method pattern - maybe there's a better
one.
This is the main missing integration point between ClangdServer and
BackgroundIndex, after this we should be able to add an auto-index flag.
Reviewers: ioeric, kadircet
Subscribers: MaskRay, jkorous, arphaman, cfe-commits, ilya-biryukov
Differential Revision: https://reviews.llvm.org/D54865
llvm-svn: 347538
Summary:
Instead of passing around a list of supported URI schemes in clangd, we
expose an interface to convert a path to URI using any compatible scheme
that has been registered. It favors customized schemes and falls
back to "file" when no other scheme works.
Changes in this patch are:
- URI::create(AbsPath, URISchemes) -> URI::create(AbsPath). The new API finds a
compatible scheme from the registry.
- Remove URISchemes option everywhere (ClangdServer, SymbolCollecter, FileIndex etc).
- Unit tests will use "unittest" by default.
- Move "test" scheme from ClangdLSPServer to ClangdMain.cpp, and only
register the test scheme when lit-test or enable-lit-scheme is set.
(The new flag is added to make lit protocol.test work; I wonder if there
is alternative here.)
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D54800
llvm-svn: 347467
Summary:
Puts the digest of the source file that generated the index into
serialized index and stores them back on load, if exists.
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D54693
llvm-svn: 347235
Summary:
This is our goal. It has a non-zero rick, but so far we haven't see any
collision (externally and internally).
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D54622
llvm-svn: 347044
Summary:
For symbols in global namespace (without any scope), we need to
add global scope "" to the fuzzy request.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D54519
llvm-svn: 346947
Summary:
This would save us 8 bytes per ref, and buy us ~40MB in total
for llvm index (from ~300MB to ~260 MB).
The char pointer must be null-terminated, and llvm::StringSaver
guarantees it.
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53427
llvm-svn: 346852
Summary:
Namespace references is less useful compared with other symbols, and
they contribute large part of the index. This patch drops them.
The number of refs is reduced from 5.4 million to 4.7 million.
| | Before | After |
|file size | 78 MB | 71MB |
|memory | 330MB | 300MB|
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D54202
llvm-svn: 346319
Summary:
This allows us to deduplicate header symbols across TUs. File digests
are collects when collecting symbols/refs. And the index store deduplicates
file symbols based on the file digest.
Reviewers: sammccall, hokein
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53433
llvm-svn: 346221
Summary:
The goal is 8 bytes, which has a nonzero risk of collisions with huge indexes.
This patch should shake out any issues with truncation at all, we can lower
further later.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53587
llvm-svn: 345113
Standardize on the most common namespace setup in our *.cpp files:
using namespace llvm;
namespace clang {
namespace clangd {
void foo(StringRef) { ... }
And remove redundant llvm:: qualifiers. (Except for cases like
make_unique where this causes problems with std:: and ADL).
This choice is pretty arbitrary, but some broad consistency is nice.
This is going to conflict with everything. Sorry :-/
Squash the other configurations:
A)
using namespace llvm;
using namespace clang;
using namespace clangd;
void clangd::foo(StringRef);
This is in some of the older files. (It prevents accidentally defining a
new function instead of one in the header file, for what that's worth).
B)
namespace clang {
namespace clangd {
void foo(llvm::StringRef) { ... }
This is fine, but in practice the using directive often gets added over time.
C)
namespace clang {
namespace clangd {
using namespace llvm; // inside the namespace
This was pretty common, but is a bit misleading: name lookup preferrs
clang::clangd::foo > clang::foo > llvm:: foo (no matter where the using
directive is).
llvm-svn: 344850
That revision changed integer members to bitfields; the integers were
default initialized before and the bitfields lost that default
initialization. This started causing msan use-of-uninitialized memory in
clangd tests.
llvm-svn: 344773
Summary:
The RefSlab::size can easily cause confusions, it returns the number of
different symbols, rahter than the number of all references.
- add numRefs() method and cache it, since calculating it everytime is nontrivial.
- clear misused places.
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53389
llvm-svn: 344745
Summary:
These are often not expected to be used directly e.g.
```
TEST_F(Fixture, X) {
^ // "Fixture_X_Test" expanded in the macro should be down ranked.
}
```
Only doing this for sema for now, as such symbols are mostly coming from sema
e.g. gtest macros expanded in the main file. We could also add a similar field
for the index symbol.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53374
llvm-svn: 344736
Summary:
This would buy us more memory. Using a 32-bits integer is enough for
most human-readable source code (up to 4M lines and 4K columns).
Previsouly, we used 8 bytes for a position, now 4 bytes, it would save
us 8 bytes for each Ref and each Symbol instance.
For LLVM-project binary index file, we save ~13% memory.
| Before | After |
| 412MB | 355MB |
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53363
llvm-svn: 344735
Summary:
Add a flag to SymbolCollector to collect refs fdrom headers.
Note that we collect refs from headers in static index, and we don't do it for
dynamic index because of the preamble (we skip function body in preamble,
collecting it will result incomplete results).
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53322
llvm-svn: 344678
Summary:
One relatively boring bug: forgot to notify the CV after enqueue.
One much more fun bug: the thread member could access instance variables before
they were initialized. Although the thread was last in the init list, QueueCV
etc were listed after Thread in the class, so their default constructors raced
with the thread itself.
We have to get very unlucky to lose this race, I saw it 0.02% of the time.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D53313
llvm-svn: 344595
Summary:
Reuse the old -use-dex-index experiment flag for this.
To avoid breaking the tests, make Dex deduplicate symbols, addressing an old FIXME.
Reviewers: hokein
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53288
llvm-svn: 344594
Summary:
See tinyurl.com/clangd-automatic-index for design and goals.
Lots of limitations to keep this patch smallish, TODOs everywhere:
- no serialization to disk
- no changes to dynamic index, which now has a much simpler job
- no partitioning of symbols by file to avoid duplication of header symbols
- no reindexing of edited files
- only a single worker thread
- compilation database is slurped synchronously (doesn't scale)
- uses memindex, rebuilds after every file (should be dex, periodically)
It's not hooked up to ClangdServer/ClangdLSPServer yet: the layering
isn't clear (it should really be in ClangdServer, but ClangdLSPServer
has all the CDB interactions).
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D53032
llvm-svn: 344513
Summary:
Previously, SymbolCollector postfilters all references at the end to
find all references of interesting symbols.
It was incorrect when indxing main AST where we don't see locations
of symbol declarations and definitions in the main AST (as those are in
preamble AST).
The fix is to do earily check during collecting references.
Reviewers: sammccall
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53273
llvm-svn: 344507
Summary:
The bug being fixed: when a posting list doesn't exist in the index, it
was previously just dropped from the query rather than being treated as
empty. Now that we have the FALSE iterator, we can use it instead.
The query tree logic previously had a bunch of special cases to detect whether
subtrees are empty. Now we just naively build the whole tree, and rely
on the query optimizations to drop the trivial parts.
Finally, there was a bug in trigram generation: the empty query would
generate a single trigram "$$$" instead of no trigrams.
This had no effect (there was no posting list, so the other bug
cancelled it out). But we now have to fix this bug too.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52796
llvm-svn: 343802
Summary:
The FALSE iterator will be used in a followup patch to fix a logic bug in Dex
(currently, tokens that don't have posting lists in the index are simply dropped
from the query, changing semantics).
It can usually be optimized away, so added the following opmitizations:
- simplify booleans inside AND/OR
- replace effectively-empty AND/OR with booleans
- flatten nested AND/ORs
While working on this, found a bug in the AND iterator: its constructor sync()
assumes that ReachedEnd is set if applicable, but the constructor never sets it.
This crashes if a non-first iterator is nonempty.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52789
llvm-svn: 343801
Summary:
This allows inheriting from it, so index() can ga away and allowing
TestTU::index) to be fixed.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52250
llvm-svn: 343780
Summary:
Currently queries like "ab" can match identifiers like a_yellow_bee.
The value of allowing this for exactly one segment but no more seems dubious.
It costs ~3% of overall ram (~9% of posting list ram) and some quality.
Reviewers: ilya-biryukov, ioeric
Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52885
llvm-svn: 343777
Summary:
1) Instead of x$$ for a short-query trigram, just use x
2) Make rules more coherent: prefixes of length 1-2, and first char + next head
3) Fix Dex::fuzzyFind to mark results as incomplete, because
short-trigram rules only yield a subset of results.
Reviewers: ioeric
Subscribers: ilya-biryukov, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52808
llvm-svn: 343775
Summary:
The FALSE iterator will be used in a followup patch to fix a logic bug in Dex
(currently, tokens that don't have posting lists in the index are simply dropped
from the query, changing semantics).
It can usually be optimized away, so added the following opmitizations:
- simplify booleans inside AND/OR
- replace effectively-empty AND/OR with booleans
- flatten nested AND/ORs
While working on this, found a bug in the AND iterator: its constructor sync()
assumes that ReachedEnd is set if applicable, but the constructor never sets it.
This crashes if a non-first iterator is nonempty.
Reviewers: ilya-biryukov
Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52789
llvm-svn: 343774
Summary:
It's slow, and the open-source reduce implementation doesn't scale properly.
While here, tidy up some dead headers and comments.
Reviewers: kadircet
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D52517
llvm-svn: 343759
Declaring a field with the same name as a type causes GCC to error out:
Dex.h:104:10: error: declaration of 'clang::clangd::dex::Corpus clang::clangd::dex::Dex::Corpus' [-fpermissive]
Corpus Corpus;
^
Iterator.h:127:7: error: changes meaning of 'Corpus' from 'class clang::clangd::dex::Corpus' [-fpermissive]
class Corpus {
llvm-svn: 343610
Summary:
- Corpus avoids having to pass size to the true iterator, and (soon) any
iterator that might optimize down to true.
- Shorten names of factory functions now they're scoped to the Corpus.
intersect() and unionOf() rather than createAnd() or createOr() as this
seems to read better to me, and fits with other short names. Opinion wanted!
- DEFAULT_BOOST_SCORE --> 1. This is a multiplier, don't obfuscate identity.
- Simplify variadic templates in Iterator.h
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52711
llvm-svn: 343589
Summary:
This makes it suitable for logging (which immediately found a bug, to
be fixed in the next patch...)
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52715
llvm-svn: 343580
Summary:
When no scope qualifier is specified, allow completing index symbols
from any scope and insert proper automatically. This is still experimental and
hidden behind a flag.
Things missing:
- Scope proximity based scoring.
- FuzzyFind supports weighted scopes.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: kbobyrev, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52364
llvm-svn: 343248
* With the current implementation, `sizeof(std::vector<Chunk>)` is added
twice to the `Dex` memory estimate which is incorrect
* `Dex` logs memory usage estimation before `BackingDataSize` is set and
hence the log report excludes size of the external `SymbolSlab` which is
coupled with `Dex` instance
Reviewed By: ioeric
Differential Revision: https://reviews.llvm.org/D52503
llvm-svn: 343117
Because `PostingList` objects are compressed, it is now impossible to
see elements other than the current one and the documentation doesn't
match implementation anymore.
Reviewed By: ioeric
Differential Revision: https://reviews.llvm.org/D52545
llvm-svn: 343116
Summary: Soon we can drop support for MR-via-YAML.
I need to modify some out-of-tree versions to use the library, first.
Reviewers: kadircet
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D52465
llvm-svn: 343019
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
- symbol -> YAML I expect to keep for tools like dexp
- YAML -> symbol is used for the MR-style indexer, I think we can eliminate
this (merge-on-the-fly, else use a different serialization)
Reviewers: kbobyrev
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52453
llvm-svn: 342999
For consistency, functional-style code pieces are replaced with their
simple counterparts to improve readability.
Also, file headers are fixed to comply with LLVM Coding Standards.
`static` member of anonymous namespace is not marked `static` anymore,
because it is redundant.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D52466
llvm-svn: 342974
This patch implements Variable-length Byte compression of `PostingList`s
to sacrifice some performance for lower memory consumption.
`PostingList` compression and decompression was extensively tested using
fuzzer for multiple hours and runnning significant number of realistic
`FuzzyFindRequests`. AddressSanitizer and UndefinedBehaviorSanitizer
were used to ensure the correct behaviour.
Performance evaluation was conducted with recent LLVM symbol index (292k
symbols) and the collection of user-recorded queries (7751
`FuzzyFindRequest` JSON dumps):
| Metrics | Before| After | Change (%)
| ----- | ----- | ----- | -----
| Memory consumption (posting lists only), MB | 54.4 | 23.5 | -60%
| Time to process queries, sec | 7.70 | 9.4 | +25%
Reviewers: sammccall, ioeric
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D52300
llvm-svn: 342965
`Dex` should utilize `FuzzyFindRequest.RestrictForCodeCompletion` flags
and omit symbols not meant for code completion when asked for it.
The measurements below were conducted with setting
`FuzzyFindRequest.RestrictForCodeCompletion` to `true` (so that it's
more realistic). Sadly, the average latency goes down, I suspect that is
mostly because of the empty queries where the number of posting lists is
critical.
| Metrics | Before | After | Relative difference
| ----- | ----- | ----- | -----
| Cumulative query latency (7000 `FuzzyFindRequest`s over LLVM static index) | 6182735043 ns | 7202442053 ns | +16%
| Whole Index size | 81.24 MB | 81.79 MB | +0.6%
Out of 292252 symbols collected from LLVM codebase 136926 appear to be
restricted for code completion.
Reviewers: ioeric
Differential Revision: https://reviews.llvm.org/D52357
llvm-svn: 342866
Summary:
Pros:
o Loading macros from preamble for every completion is slow (see profile).
o Calculating macro USR is also slow (see profile).
o Sema can provide a lot of macro completion results (e.g. when filter is empty,
60k for some large TUs!).
Cons:
o Slight memory increase in dynamic index (~1%).
o Some extra work during preamble build (should be fine as preamble build and
indexAST is way slower).
Before:
{F7195645}
After:
{F7195646}
Reviewers: ilya-biryukov, sammccall
Reviewed By: sammccall
Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52078
llvm-svn: 342529
Summary:
FileIndex now provides explicit interfaces for preamble and main file updates.
This avoids growing parameter list when preamble and main symbols diverge
further (e.g. D52078). This also gets rid of the hack in `indexAST` that
inferred main file index based on `TopLevelDecls`.
Also separate `indexMainDecls` from `indexAST`.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52222
llvm-svn: 342460
Summary:
We can use cl::ResetCommandLineParser() to support different types of
command-lines, as long as we're careful about option lifetimes.
(I tried using subcommands, but the error messages were bad)
I found a mostly-reasonable pattern to isolate the fiddly parts.
Added -scope and -limit flags to the `find` command to demonstrate.
(Note that scope support seems to be broken in dex?)
Fixed symbol lookup to parse symbol IDs.
Caveats:
- with command help (e.g. `find -help`), you also get some spam
about required arguments. This is a bug in llvm::cl, which prints
these to errs() rather than the designated stream.
Reviewers: kbobyrev
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51989
llvm-svn: 342456
This patch abstracts `PostingList` interface and reuses existing
implementation. It will be used later to test different `PostingList`
representations.
No functionality change is introduced, this patch is mostly refactoring
so that the following patches could focus on functionality while not
being too hard to review.
Reviewed By: sammccall, ioeric
Differential Revision: https://reviews.llvm.org/D51982
llvm-svn: 342155
As discussed during D51860 review, it is better to use `llvm::Optional`
here as it has clear semantics which reflect intended behavior.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D52028
llvm-svn: 342138
JSON (de)serialization of `FuzzyFindRequest` might be useful for both
D51090 and D51628. Also, this allows precise logging of the fuzzy find
requests.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D51852
llvm-svn: 341802
Currently, `SymbolIndex::estimateMemoryUsage()` returns the "overhead"
estimate, i.e. the estimate of the Index data structure excluding
backing data (such as Symbol Slab and Reference Slab). This patch
propagates information about paired data size where necessary.
Reviewed By: ioeric, sammccall
Differential Revision: https://reviews.llvm.org/D51539
llvm-svn: 341800
If the current element is already beyond advanceTo()'s DocID, just
return instead of doing binary search. This simple optimization saves up
to 6-7% performance,
Reviewed By: ilya-biryukov
Differential Revision: https://reviews.llvm.org/D51802
llvm-svn: 341781
This patch sets URI schemes of Dex to SymbolCollector's default schemes
in case callers tried to pass empty list of schemes. This was the case
for initialization in Clangd main and was a reason of incorrect
behavior.
Also, it fixes a bug with missed `continue;` after spotting invalid URI
scheme conversion.
llvm-svn: 341552
Quality.cpp defines a structure for convenient storage of Top N items,
it should be used instead of the `std::priority_queue` with slightly
obscure semantics.
This patch does not affect functionality.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D51676
llvm-svn: 341544
This patch introduces `PathURI` Search Token kind and utilizes it to
uprank symbols which are defined in files with small distance to the
directory where the fuzzy find request is coming from (e.g. files user
is editing).
Reviewed By: ioeric
Reviewers: ioeric, sammccall
Differential Revision: https://reviews.llvm.org/D51481
llvm-svn: 341542
Summary:
Like D51475 but simplified based on recent patches.
While here, clarify that loadIndex() takes a filename, not file content.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51638
llvm-svn: 341376
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
`buildStaticIndex()` is used by two other tools that I'm building, now
it's useful outside of `tool/ClangdMain.cpp`.
Also, slightly refactor the code while moving it to the different source
file.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D51626
llvm-svn: 341369
Summary:
A few things that I noticed while merging the SwapIndex patch:
- SymbolOccurrences and particularly SymbolOccurrenceSlab are unwieldy names,
and these names appear *a lot*. Ref, RefSlab, etc seem clear enough
and read/format much better.
- The asymmetry between SymbolSlab and RefSlab (build() vs freeze()) is
confusing and irritating, and doesn't even save much code.
Avoiding RefSlab::Builder was my idea, but it was a bad one; add it.
- DenseMap<SymbolID, ArrayRef<Ref>> seems like a reasonable compromise for
constructing MemIndex - and means many less wasted allocations than the
current DenseMap<SymbolID, vector<Ref*>> for FileIndex, and none for
slabs.
- RefSlab::find() is not actually used for anything, so we can throw
away the DenseMap and keep the representation much more compact.
- A few naming/consistency fixes: e.g. Slabs,Refs -> Symbols,Refs.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51605
llvm-svn: 341368
Summary:
This is now handled by a wrapper class SwapIndex, so MemIndex/DexIndex can be
immutable and focus on their job.
Old and busted:
I have a MemIndex, which holds a shared_ptr<vector<Symbol*>>, which keeps the
symbol slab alive. I update by calling build(shared_ptr<vector<Symbol*>>).
New hotness: I have a SwapIndex, which holds a unique_ptr<SymbolIndex>, which
holds a MemIndex, which holds a shared_ptr<void>, which keeps backing
data alive.
I update by building a new MemIndex and calling SwapIndex::reset().
Reviewers: kbobyrev, ioeric
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51422
llvm-svn: 341318
Summary:
Currently, a symbol can have only one #include header attached, which
might not work well if the symbol can be imported via different #includes depending
on where it's used. This patch stores multiple #include headers (with # references)
for each symbol, so that CodeCompletion can decide which include to insert.
In this patch, code completion simply picks the most popular include as the default inserted header. We also return all possible includes and their edits in the `CodeCompletion` results.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: mgrang, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51291
llvm-svn: 341304