Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
Summary:
The new mode avoids serializing and deserializing YAML.
This results in better performance and less memory usage. Reduce phase
is now almost instant.
The default is to use the old mode going through YAML serialization to
allow migrating MapReduce clients that require the old mode to operate
properly. After we migrate the clients, we can switch the default to
the new mode.
Reviewers: hokein, ioeric, kbobyrev, sammccall
Reviewed By: ioeric
Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51155
llvm-svn: 340600
`global-symbol-builder` help message mentions `-executor=<string>`
option, but doesn't give any example of what the value could be
Assuming the most popular use case to be building the whole project
index, help message should probably give an example of such usage.
Reviewers: ioeric
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D49785
llvm-svn: 338015
Summary: Surface it in the completion items C++ API, and when a flag is set.
Reviewers: ioeric
Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48938
llvm-svn: 336309
Summary:
For example, template parameter might not be resolved in a broken TU,
which can result in wrong USR/SymbolID.
Reviewers: ilya-biryukov
Subscribers: MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48881
llvm-svn: 336252
Summary:
This is a convenient function when we try to get std::string of
SymbolID.
Reviewers: ioeric
Subscribers: klimek, ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D46065
llvm-svn: 330835
Summary:
This is an important ranking signal.
It's off for the dynamic index for now. Correspondingly, tell the index
infrastructure only to report declarations for the dynamic index.
Reviewers: ioeric, hokein
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D44315
llvm-svn: 327275
Summary:
I'm not sure whether there are any principal reasons why it returns raw owning pointer,
or it is just a old code that was not updated post-C++11.
I'm not too sure what testing i should do, because `check-all` is not error clean here for some reason,
but it does not //appear// asif those failures are related to these changes.
This is Clang-tools-extra part.
Clang part is D43779.
Reviewers: klimek, bkramer, alexfh, pcc
Reviewed By: alexfh
Subscribers: ioeric, jkorous-apple, cfe-commits
Tags: #clang, #clang-tools-extra
Differential Revision: https://reviews.llvm.org/D43780
llvm-svn: 326202
Summary:
There are a few implementation options here - alternatives are either both
awkward and inefficient, or really inefficient.
This is at least potentially a hot path when used as a reducer for common
symbols.
(Also fix an unused-var that sneaked in)
Reviewers: ioeric
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D43381
llvm-svn: 325476
Summary:
D42640 adds calls to `Preprocessor::addCommentHandler` in
`unittests/clangd/SymbolCollectorTests.cpp` and
`clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp` but does not
link `clangLex` library. This causes undefined reference errors when
built with `-DBUILD_SHARED_LIBS=ON`.
Reviewers: ioeric
Subscribers: klimek, mgorny, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D43437
llvm-svn: 325458
Summary:
o Collect suitable #include paths for index symbols. This also does smart mapping
for STL symbols and IWYU pragma (code borrowed from include-fixer).
o For global code completion, add a command for inserting new #include in each code
completion item.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: klimek, mgorny, ilya-biryukov, jkorous-apple, hintonda, cfe-commits
Differential Revision: https://reviews.llvm.org/D42640
llvm-svn: 325343
Within a TU:
- as now, collect a declaration from the first occurrence of a symbol
(taking clang's canonical declaration)
- when we first see a definition occurrence, copy the symbol and add it
Across TUs/sources:
- mergeSymbol in Merge.h is responsible for combining matching Symbols.
This covers dynamic/static merges and cross-TU merges in the static index.
- it prefers declarations from Symbols that have a definition.
- GlobalSymbolBuilderMain is modified to use mergeSymbol as a reduce step.
Random cleanups (can be pulled out):
- SymbolFromYAML -> SymbolsFromYAML, new singular SymbolFromYAML added
- avoid uninit'd SymbolLocations. Add an idiomatic way to check "absent".
- CanonicalDeclaration (as well as Definition) are mapped as optional in YAML.
- added operator<< for Symbol & SymbolLocation, for debugging
Reviewers: ioeric, hokein
Subscribers: klimek, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D42942
llvm-svn: 324735
Summary:
o We only collect symbols in namespace or translation unit scopes.
o Add an option to only collect symbols in included headers.
Reviewers: hokein, ilya-biryukov
Reviewed By: hokein, ilya-biryukov
Subscribers: klimek, cfe-commits
Differential Revision: https://reviews.llvm.org/D41823
llvm-svn: 322193
Summary:
This enables more execution modes like standalone and Mapreduce-style execution.
See also D41729
Reviewers: hokein, sammccall
Subscribers: klimek, ilya-biryukov, cfe-commits
Differential Revision: https://reviews.llvm.org/D41730
llvm-svn: 322084
Summary:
This improves a few things:
- the insert -> freeze -> read sequence is now enforced/communicated by the
type system
- SymbolSlab::const_iterator iterates over symbols, not over id-symbol pairs
- we avoid permanently storing a second copy of the IDs, and the
string map's hashtable
The slab size is now down to 21.8MB for the LLVM project.
Of this only 2.7MB is strings, the rest is #symbols * `sizeof(Symbol)`.
`sizeof(Symbol)` is currently 96, which seems too big - I think
SymbolInfo isn't efficiently packed. That's a topic for another patch!
Also added simple API to see the memory usage/#symbols of a slab, since
it seems likely we will continue to care about this.
Reviewers: ilya-biryukov
Subscribers: klimek, mgrang, cfe-commits
Differential Revision: https://reviews.llvm.org/D41506
llvm-svn: 321412
Summary:
The tools is used to generate global symbols for clangd (global code completion),
The format is YAML, which is only for **experiment**.
Usage:
./bin/global-symbol-builder </path/to/llvm-dir> > global-symbols.yaml
TEST:
used the tool to generate global symbols for LLVM (~72MB).
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: klimek, mgorny, ilya-biryukov, cfe-commits
Differential Revision: https://reviews.llvm.org/D41491
llvm-svn: 321358