Commit Graph

43 Commits

Author SHA1 Message Date
Sam McCall 668ac94ba4 [clangd] Truncate SymbolID to 16 bytes.
Summary:
The goal is 8 bytes, which has a nonzero risk of collisions with huge indexes.
This patch should shake out any issues with truncation at all, we can lower
further later.

Reviewers: ioeric

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D53587

llvm-svn: 345113
2018-10-24 06:58:42 +00:00
Sam McCall c008af6466 [clangd] Namespace style cleanup in cpp files. NFC.
Standardize on the most common namespace setup in our *.cpp files:
  using namespace llvm;
  namespace clang {
  namespace clangd {
  void foo(StringRef) { ... }
And remove redundant llvm:: qualifiers. (Except for cases like
make_unique where this causes problems with std:: and ADL).

This choice is pretty arbitrary, but some broad consistency is nice.
This is going to conflict with everything. Sorry :-/

Squash the other configurations:

A)
  using namespace llvm;
  using namespace clang;
  using namespace clangd;
  void clangd::foo(StringRef);
This is in some of the older files. (It prevents accidentally defining a
new function instead of one in the header file, for what that's worth).

B)
  namespace clang {
  namespace clangd {
  void foo(llvm::StringRef) { ... }
This is fine, but in practice the using directive often gets added over time.

C)
  namespace clang {
  namespace clangd {
  using namespace llvm; // inside the namespace
This was pretty common, but is a bit misleading: name lookup preferrs
clang::clangd::foo > clang::foo > llvm:: foo (no matter where the using
directive is).

llvm-svn: 344850
2018-10-20 15:30:37 +00:00
Haojian Wu 812b6c51c3 [clangd] Remove the overflow log.
Summary:
LLVM codebase has generated files (all are build/Target/XXX/*.inc) that
exceed the MaxLine & MaxColumn. Printing these log would be noisy.

Reviewers: sammccall

Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D53400

llvm-svn: 344777
2018-10-19 08:35:24 +00:00
Haojian Wu 6ece6e7dad [clangd] Clear the semantic of RefSlab::size.
Summary:
The RefSlab::size can easily cause confusions, it returns the number of
different symbols, rahter than the number of all references.

- add numRefs() method and cache it, since calculating it everytime is nontrivial.
- clear misused places.

Reviewers: sammccall

Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D53389

llvm-svn: 344745
2018-10-18 15:33:20 +00:00
Haojian Wu b515fabb3b [clangd] Encode Line/Column as a 32-bits integer.
Summary:
This would buy us more memory. Using a 32-bits integer is enough for
most human-readable source code (up to 4M lines and 4K columns).

Previsouly, we used 8 bytes for a position, now 4 bytes, it would save
us 8 bytes for each Ref and each Symbol instance.

For LLVM-project binary index file, we save ~13% memory.

| Before | After |
| 412MB  | 355MB |

Reviewers: sammccall

Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D53363

llvm-svn: 344735
2018-10-18 10:43:50 +00:00
Jonas Toth 3acdd020b4 [clangd] fix miscompiling lower_bound call
llvm-svn: 344044
2018-10-09 13:24:50 +00:00
Kirill Bobyrev 4a5ff88fdb [clangd] NFC: Migrate to LLVM STLExtras API where possible
This patch improves readability by migrating `std::function(ForwardIt
start, ForwardIt end, ...)` to LLVM's STLExtras range-based equivalent
`llvm::function(RangeT &&Range, ...)`.

Similar change in Clang: D52576.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D52650

llvm-svn: 343937
2018-10-07 14:49:41 +00:00
Sam McCall 46b5555844 [clangd] Fix error handling for SymbolID parsing (notably YAML and dexp)
llvm-svn: 342505
2018-09-18 19:00:59 +00:00
Kirill Bobyrev e6dd0806c7 [clangd] Cleanup FuzzyFindRequest filtering limit semantics
As discussed during D51860 review, it is better to use `llvm::Optional`
here as it has clear semantics which reflect intended behavior.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D52028

llvm-svn: 342138
2018-09-13 14:27:03 +00:00
Kirill Bobyrev 0dee397e06 [clangd] NFC: Use uint32_t for FuzzyFindRequest limits
Reviewed By: ioeric

Differential Revision: https://reviews.llvm.org/D51860

llvm-svn: 341921
2018-09-11 10:31:38 +00:00
Kirill Bobyrev 5faf8a3d84 [clangd] Unbreak buildbots after r341802
Solution: use std::move when returning result from toJSON(...).
llvm-svn: 341832
2018-09-10 14:31:38 +00:00
Kirill Bobyrev 09f00dcf69 [clangd] Implement FuzzyFindRequest JSON (de)serialization
JSON (de)serialization of `FuzzyFindRequest` might be useful for both
D51090 and D51628. Also, this allows precise logging of the fuzzy find
requests.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D51852

llvm-svn: 341802
2018-09-10 11:51:05 +00:00
Eric Liu 6df66001ee [clangd] Add "Deprecated" field to Symbol and CodeCompletion.
Summary: Also set "deprecated" field in LSP CompletionItem.

Reviewers: sammccall, kadircet

Reviewed By: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, cfe-commits

Differential Revision: https://reviews.llvm.org/D51724

llvm-svn: 341576
2018-09-06 18:52:26 +00:00
Sam McCall 50f3631057 [clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
  llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.

The format is a RIFF container (chunks of (type, size, data)) with:
 - a compressed string table
 - simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.

There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.

Alternatives considered:
 - compressed YAML or JSON: bulky and slow to load
 - llvm bitstream: confusing model and libraries are hard to use. My attempt
   produced slightly larger files, and the code was longer and slower.
 - protobuf or similar: would be really nice (esp for back-compat) but the
   dependency is a big hassle
 - ad-hoc binary format without a container: it seems clear we're going
   to add posting lists and occurrences here, and that they will benefit
   from sharing a string table. The container makes it easy to debug
   these pieces in isolation, and make them optional.

Reviewers: ioeric

Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D51585

llvm-svn: 341375
2018-09-04 16:16:50 +00:00
Kirill Bobyrev cc8b507a60 [clangd] NFC: Change quality type to float
Reviewed by: sammccall

Differential Revision: https://reviews.llvm.org/D51636

llvm-svn: 341374
2018-09-04 15:45:56 +00:00
Sam McCall b0138317d6 [clangd] SymbolOccurrences -> Refs and cleanup
Summary:
A few things that I noticed while merging the SwapIndex patch:
 - SymbolOccurrences and particularly SymbolOccurrenceSlab are unwieldy names,
   and these names appear *a lot*. Ref, RefSlab, etc seem clear enough
   and read/format much better.
 - The asymmetry between SymbolSlab and RefSlab (build() vs freeze()) is
   confusing and irritating, and doesn't even save much code.
   Avoiding RefSlab::Builder was my idea, but it was a bad one; add it.
 - DenseMap<SymbolID, ArrayRef<Ref>> seems like a reasonable compromise for
   constructing MemIndex - and means many less wasted allocations than the
   current DenseMap<SymbolID, vector<Ref*>> for FileIndex, and none for
   slabs.
 - RefSlab::find() is not actually used for anything, so we can throw
   away the DenseMap and keep the representation much more compact.
 - A few naming/consistency fixes: e.g. Slabs,Refs -> Symbols,Refs.

Reviewers: ioeric

Subscribers: ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D51605

llvm-svn: 341368
2018-09-04 14:39:56 +00:00
Sam McCall 9c7624e14b [clangd] Factor out the data-swapping functionality from MemIndex/DexIndex.
Summary:
This is now handled by a wrapper class SwapIndex, so MemIndex/DexIndex can be
immutable and focus on their job.

Old and busted:
 I have a MemIndex, which holds a shared_ptr<vector<Symbol*>>, which keeps the
 symbol slab alive. I update by calling build(shared_ptr<vector<Symbol*>>).

New hotness: I have a SwapIndex, which holds a unique_ptr<SymbolIndex>, which
 holds a MemIndex, which holds a shared_ptr<void>, which keeps backing
 data alive.
 I update by building a new MemIndex and calling SwapIndex::reset().

Reviewers: kbobyrev, ioeric

Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D51422

llvm-svn: 341318
2018-09-03 14:37:43 +00:00
Eric Liu 83f63e42b2 [clangd] Support multiple #include headers in one symbol.
Summary:
Currently, a symbol can have only one #include header attached, which
might not work well if the symbol can be imported via different #includes depending
on where it's used. This patch stores multiple #include headers (with # references)
for each symbol, so that CodeCompletion can decide which include to insert.

In this patch, code completion simply picks the most popular include as the default inserted header. We also return all possible includes and their edits in the `CodeCompletion` results.

Reviewers: sammccall

Reviewed By: sammccall

Subscribers: mgrang, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D51291

llvm-svn: 341304
2018-09-03 10:18:21 +00:00
Fangrui Song 399943bc76 [clangd] Fix many typos. NFC
llvm-svn: 341273
2018-09-01 07:47:03 +00:00
Sam McCall 2e5700f038 [clangd] Flatten out Symbol::Details. It was ill-conceived, sorry.
Reviewers: ioeric

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D51504

llvm-svn: 341211
2018-08-31 13:55:01 +00:00
Haojian Wu d81e3146e3 [clangd] Collect symbol occurrences in SymbolCollector.
SymbolCollector will be used for two cases:
 - collect Symbol type only, used for indexing preamble AST.
 - collect Symbol and SymbolOccurrences, used for indexing main AST.

For finding local references from the AST, we will implement it in other ways.

llvm-svn: 341208
2018-08-31 12:54:13 +00:00
Haojian Wu 931b2262d4 [clangd] Simplify the code using UniqueStringSaver, NFC.
llvm-svn: 340161
2018-08-20 09:47:12 +00:00
Sam McCall 4e5742a479 [clangd] Make SymbolOrigin an enum class, rather than a plain enum.
I never intended to define namespace pollution like clangd::AST, clangd::Unknown
etc. Oops!

llvm-svn: 336431
2018-07-06 11:50:49 +00:00
Sam McCall 2161ec7ee2 [clangd] Track origins of symbols (various indexes, Sema).
Summary: Surface it in the completion items C++ API, and when a flag is set.

Reviewers: ioeric

Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits

Differential Revision: https://reviews.llvm.org/D48938

llvm-svn: 336309
2018-07-05 06:20:41 +00:00
Sam McCall a68951e37e [clangd] More precise representation of symbol names/labels in the index.
Summary:
Previously, the strings matched LSP completion pretty closely.
The completion label was a single string, for instance. This made
implementing completion itself easy but makes it hard to use the names
in other way, e.g. pretty-printed name in synthesized
documentation/hover.

It also limits our introspection into completion items, which can only
be as precise as the indexed symbols. This change is a prerequisite to
improvements to overload bundling which need to inspect e.g. signature
structure.

Reviewers: ioeric

Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits

Differential Revision: https://reviews.llvm.org/D48475

llvm-svn: 335360
2018-06-22 16:11:35 +00:00
Sam McCall 032db94ac9 [clangd] Remove FilterText from the index.
Summary:
It's almost always identical to Name, and in fact we never used it (we used name
instead).
The only case where they differ is objc method selectors (foo: vs foo:bar:).
We can live with the latter for both name and filterText, so I've made that
change too.

Reviewers: ioeric

Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits

Differential Revision: https://reviews.llvm.org/D48375

llvm-svn: 335321
2018-06-22 06:41:43 +00:00
Sam McCall dc8abc45d2 [clangd] Incorporate #occurrences in scoring code complete results.
Summary: needs tests

Reviewers: ilya-biryukov

Subscribers: klimek, ioeric, MaskRay, jkorous, cfe-commits

Differential Revision: https://reviews.llvm.org/D46183

llvm-svn: 331457
2018-05-03 14:53:02 +00:00
Haojian Wu cbf20ef6ab [clangd] Add "str()" method to SymbolID.
Summary:
This is a convenient function when we try to get std::string of
SymbolID.

Reviewers: ioeric

Subscribers: klimek, ilya-biryukov, MaskRay, jkorous, cfe-commits

Differential Revision: https://reviews.llvm.org/D46065

llvm-svn: 330835
2018-04-25 15:27:09 +00:00
Haojian Wu 545c02a710 [clangd] Add line and column number to the index symbol.
Summary:
LSP is using Line & column as symbol position, clangd needs to transfer file
offset to Line & column when sending results back to LSP client, which is a high
cost, especially for finding workspace symbol -- we have to read the file
content from disk (if it isn't loaded in memory).

Saving these information in the index will make the clangd life eaiser.

Reviewers: sammccall

Subscribers: klimek, ilya-biryukov, jkorous-apple, ioeric, MaskRay, cfe-commits

Differential Revision: https://reviews.llvm.org/D45513

llvm-svn: 329997
2018-04-13 08:30:39 +00:00
Eric Liu c5105f9e3c [clangd] collect symbol #include & insert #include in global code completion.
Summary:
o Collect suitable #include paths for index symbols. This also does smart mapping
for STL symbols and IWYU pragma (code borrowed from include-fixer).
o For global code completion, add a command for inserting new #include in each code
completion item.

Reviewers: sammccall

Reviewed By: sammccall

Subscribers: klimek, mgorny, ilya-biryukov, jkorous-apple, hintonda, cfe-commits

Differential Revision: https://reviews.llvm.org/D42640

llvm-svn: 325343
2018-02-16 14:15:55 +00:00
Haojian Wu dc02a3d943 [clangd] SymbolLocation only covers symbol name.
Summary:
* Change the offset range to half-open, [start, end).
* Fix a few fixmes.

Reviewers: sammccall

Subscribers: klimek, ilya-biryukov, jkorous-apple, ioeric, cfe-commits

Differential Revision: https://reviews.llvm.org/D43182

llvm-svn: 324992
2018-02-13 09:53:50 +00:00
Sam McCall 6003951c66 [clangd] Collect definitions when indexing.
Within a TU:
 - as now, collect a declaration from the first occurrence of a symbol
   (taking clang's canonical declaration)
 - when we first see a definition occurrence, copy the symbol and add it
Across TUs/sources:
 - mergeSymbol in Merge.h is responsible for combining matching Symbols.
   This covers dynamic/static merges and cross-TU merges in the static index.
 - it prefers declarations from Symbols that have a definition.
 - GlobalSymbolBuilderMain is modified to use mergeSymbol as a reduce step.
Random cleanups (can be pulled out):
 - SymbolFromYAML -> SymbolsFromYAML, new singular SymbolFromYAML added
 - avoid uninit'd SymbolLocations. Add an idiomatic way to check "absent".
 - CanonicalDeclaration (as well as Definition) are mapped as optional in YAML.
 - added operator<< for Symbol & SymbolLocation, for debugging

Reviewers: ioeric, hokein

Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits

Differential Revision: https://reviews.llvm.org/D42942

llvm-svn: 324735
2018-02-09 14:42:01 +00:00
Eric Liu 7f24765912 [clangd] Use URIs in index symbols.
Reviewers: hokein, sammccall

Reviewed By: sammccall

Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits

Differential Revision: https://reviews.llvm.org/D42915

llvm-svn: 324358
2018-02-06 16:10:35 +00:00
Sam McCall e2f43f500a [clangd] Improve const-correctness of Symbol->Detail. NFC
Summary:
This would have caught a bug I wrote in an early version of D42049, where
an index user could overwrite data internal to the index because the Symbol is
not deep-const.

The YAML traits are now a bit more verbose, but separate concerns a bit more
nicely: ArenaPtr can be reused for other similarly-allocated objects, including
scalars etc.

Reviewers: hokein

Subscribers: klimek, ilya-biryukov, cfe-commits, ioeric

Differential Revision: https://reviews.llvm.org/D42059

llvm-svn: 322509
2018-01-15 20:09:09 +00:00
Eric Liu 76f6b44443 [clangd] Add more symbol information for code completion.
Reviewers: hokein, sammccall

Reviewed By: sammccall

Subscribers: klimek, ilya-biryukov, cfe-commits

Differential Revision: https://reviews.llvm.org/D41345

llvm-svn: 322097
2018-01-09 17:32:00 +00:00
Benjamin Kramer 50a967d601 [clangd] Simplify code. No functionality change intended.
llvm-svn: 321523
2017-12-28 14:47:01 +00:00
Sam McCall 4b9bbb378b [clangd] Use Builder for symbol slabs, and use sorted-vector for storage
Summary:
This improves a few things:
 - the insert -> freeze -> read sequence is now enforced/communicated by the
   type system
 - SymbolSlab::const_iterator iterates over symbols, not over id-symbol pairs
 - we avoid permanently storing a second copy of the IDs, and the
   string map's hashtable

The slab size is now down to 21.8MB for the LLVM project.
Of this only 2.7MB is strings, the rest is #symbols * `sizeof(Symbol)`.
`sizeof(Symbol)` is currently 96, which seems too big - I think
SymbolInfo isn't efficiently packed. That's a topic for another patch!

Also added simple API to see the memory usage/#symbols of a slab, since
it seems likely we will continue to care about this.

Reviewers: ilya-biryukov

Subscribers: klimek, mgrang, cfe-commits

Differential Revision: https://reviews.llvm.org/D41506

llvm-svn: 321412
2017-12-23 19:38:03 +00:00
Sam McCall df898cc5ed [clangd] Don't re-hash SymbolID in maps, just use the SHA1 data
llvm-svn: 321302
2017-12-21 20:11:46 +00:00
Sam McCall 6c0d0f5775 [clangd] Index symbols share storage within a slab.
Summary:
Symbols are not self-contained - it's only safe to hand them out if you
guarantee the lifetime of the underlying data.

Before this lands, I'm going to measure the before/after memory usage of the
LLVM index loaded into memory in a single slab.

Reviewers: hokein

Subscribers: klimek, ilya-biryukov, cfe-commits

Differential Revision: https://reviews.llvm.org/D41483

llvm-svn: 321272
2017-12-21 14:58:44 +00:00
Eric Liu b99d5e8b62 [clangd] Put all #includes in one block in clangd source files. NFC
Clang-format categorizes and sorts #includes with style. It doesn't make sense
to manually managing #include blocks.

llvm-svn: 320743
2017-12-14 21:22:03 +00:00
Haojian Wu 56a5fca473 [clangd] Construct SymbolSlab from YAML format.
Summary: This will be used together with D40548 for the global index source (experimental).

Reviewers: sammccall

Reviewed By: sammccall

Subscribers: klimek, mgorny, ilya-biryukov, cfe-commits, ioeric

Differential Revision: https://reviews.llvm.org/D41178

llvm-svn: 320694
2017-12-14 12:17:14 +00:00
Ilya Biryukov 5a85b8e6dd [clangd] clang-format the source code. NFC
llvm-svn: 320577
2017-12-13 12:53:16 +00:00
Haojian Wu 4c1394d67d [clangd] Introduce a "Symbol" class.
Summary:
* The "Symbol" class represents a C++ symbol in the codebase, containing all the
  information of a C++ symbol needed by clangd. clangd will use it in clangd's
  AST/dynamic index and global/static index (code completion and code
  navigation).
* The SymbolCollector (another IndexAction) will be used to recollect the
  symbols when the source file is changed (for ASTIndex), or to generate
  all C++ symbols for the whole project.

In the long term (when index-while-building is ready), clangd should share a
same "Symbol" structure and IndexAction with index-while-building, but
for now we want to have some stuff working in clangd.

Reviewers: ioeric, sammccall, ilya-biryukov, malaperle

Reviewed By: sammccall

Subscribers: malaperle, klimek, mgorny, cfe-commits

Differential Revision: https://reviews.llvm.org/D40897

llvm-svn: 320486
2017-12-12 15:42:10 +00:00