This patch introduces the core building block of the next-generation
Clangd symbol index - Dex. Search tokens are the keys in the inverted
index and represent a characteristic of a specific symbol: examples of
search token types (Token Namespaces) are
* Trigrams - these are essential for unqualified symbol name fuzzy
search * Scopes for filtering the symbols by the namespace * Paths, e.g.
these can be used to uprank symbols defined close to the edited file
This patch outlines the generic for such token namespaces, but only
implements trigram generation.
The intuition behind trigram generation algorithm is that each extracted
trigram is a valid sequence for Fuzzy Matcher jumps, proposed
implementation utilize existing FuzzyMatcher API for segmentation and
trigram extraction.
However, trigrams generation algorithm for the query string is different
from the previous one: it simply yields sequences of 3 consecutive
lowercased valid characters (letters, digits).
Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
The trigram generation techniques are described in detail in the
proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
Reviewers: sammccall, ioeric, ilya-biryukovA
Subscribers: cfe-commits, klimek, mgorny, MaskRay, jkorous, arphaman
Differential Revision: https://reviews.llvm.org/D49591
llvm-svn: 337901
Summary:
The library has graduated from clangd to llvm/Support.
This is a mechanical change to move to the new API and remove the old one.
Main API changes:
- namespace clang::clangd::json --> llvm::json
- json::Expr --> json::Value
- Expr::asString() etc --> Value::getAsString() etc
- unsigned longs need a cast (due to r336541 adding lossless integer support)
Reviewers: ilya-biryukov
Subscribers: mgorny, ioeric, MaskRay, jkorous, omtcyfz, cfe-commits
Differential Revision: https://reviews.llvm.org/D49077
llvm-svn: 336549
Summary:
We now compute a distance from the main file to the symbol header, which
is a weighted count of:
- some number of #include traversals from source file --> included file
- some number of FS traversals from file --> parent directory
- some number of FS traversals from parent directory --> child file/dir
This calculation is performed in the appropriate URI scheme.
This means we'll get some proximity boost from header files in main-file
contexts, even when these are in different directory trees.
This extended file proximity model is not yet incorporated in the index
interface/implementation.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D48441
llvm-svn: 336177
Summary: D46524 (rL332378) introduced a link failure when built with
`-DSHARED_LIB=ON`, which this patch fixes.
Reviewers: ioeric
Subscribers: klimek, mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D46906
llvm-svn: 332438
Summary:
Code completion scoring was embedded in CodeComplete.cpp, which is bad:
- awkward to test. The mechanisms (extracting info from index/sema) can be
unit-tested well, the policy (scoring) should be quantitatively measured.
Neither was easily possible, and debugging was hard.
The intermediate signal struct makes this easier.
- hard to reuse. This is a bug in workspaceSymbols: it just presents the
results in the index order, which is not sorted in practice, it needs to rank
them!
Also, index implementations care about scoring (both query-dependent and
independent) in order to truncate result lists appropriately.
The main yak shaved here is the build() function that had 3 variants across
unit tests is unified in TestTU.h (rather than adding a 4th variant).
Reviewers: ilya-biryukov
Subscribers: klimek, mgorny, ioeric, MaskRay, jkorous, mgrang, cfe-commits
Differential Revision: https://reviews.llvm.org/D46524
llvm-svn: 332378
Summary:
This is a basic implementation of the "workspace/symbol" request which is
used to find symbols by a string query. Since this is similar to code completion
in terms of result, this implementation reuses the "fuzzyFind" in order to get
matches. For now, the scoring algorithm is the same as code completion and
improvements could be done in the future.
The index model doesn't contain quite enough symbols for this to cover
common symbols like methods, enum class enumerators, functions in unamed
namespaces, etc. The index model will be augmented separately to achieve this.
Reviewers: sammccall, ilya-biryukov
Reviewed By: sammccall
Subscribers: jkorous, hokein, simark, sammccall, klimek, mgorny, ilya-biryukov, mgrang, jkorous-apple, ioeric, MaskRay, cfe-commits
Differential Revision: https://reviews.llvm.org/D44882
llvm-svn: 330637
Summary: This makes C++/objC not totally broken, without hurting C files too much.
Reviewers: ilya-biryukov
Subscribers: klimek, jkorous-apple, ioeric, cfe-commits
Differential Revision: https://reviews.llvm.org/D45442
llvm-svn: 330418
Summary:
This patch adds support for incremental document syncing, as described
in the LSP spec. The protocol specifies ranges in terms of Position (a
line and a character), and our drafts are stored as plain strings. So I
see two things that may not be super efficient for very large files:
- Converting a Position to an offset (the positionToOffset function)
requires searching for end of lines until we reach the desired line.
- When we update a range, we construct a new string, which implies
copying the whole document.
However, for the typical size of a C++ document and the frequency of
update (at which a user types), it may not be an issue. This patch aims
at getting the basic feature in, and we can always improve it later if
we find it's too slow.
Signed-off-by: Simon Marchi <simon.marchi@ericsson.com>
Reviewers: malaperle, ilya-biryukov
Reviewed By: ilya-biryukov
Subscribers: MaskRay, klimek, mgorny, ilya-biryukov, jkorous-apple, ioeric, cfe-commits
Differential Revision: https://reviews.llvm.org/D44272
llvm-svn: 328500
Summary:
To implement incremental document syncing, we want to verify that the
ranges provided by the front-end are valid. Currently, positionToOffset
deals with invalid Positions by returning 0 or Code.size(), which are
two valid offsets. Instead, return an llvm:Expected<size_t> with an
error if the position is invalid.
According to the LSP, if the character value exceeds the number of
characters of the given line, it should default back to the end of the
line. It makes sense in some contexts to have this behavior, and does
not in other contexts. The AllowColumnsBeyondLineLength parameter
allows to decide what to do in that case, default back to the end of the
line, or return an error.
Reviewers: ilya-biryukov
Subscribers: klimek, ilya-biryukov, jkorous-apple, ioeric, cfe-commits
Differential Revision: https://reviews.llvm.org/D44673
llvm-svn: 328100
Summary:
D42640 adds calls to `Preprocessor::addCommentHandler` in
`unittests/clangd/SymbolCollectorTests.cpp` and
`clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp` but does not
link `clangLex` library. This causes undefined reference errors when
built with `-DBUILD_SHARED_LIBS=ON`.
Reviewers: ioeric
Subscribers: klimek, mgorny, ilya-biryukov, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D43437
llvm-svn: 325458
Summary:
o Collect suitable #include paths for index symbols. This also does smart mapping
for STL symbols and IWYU pragma (code borrowed from include-fixer).
o For global code completion, add a command for inserting new #include in each code
completion item.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: klimek, mgorny, ilya-biryukov, jkorous-apple, hintonda, cfe-commits
Differential Revision: https://reviews.llvm.org/D42640
llvm-svn: 325343
Summary:
It was deprecated and callback version and is used everywhere.
Only changes to the testing code were needed.
Reviewers: hokein, ioeric, sammccall
Reviewed By: sammccall
Subscribers: mgorny, klimek, jkorous-apple, cfe-commits
Differential Revision: https://reviews.llvm.org/D43068
llvm-svn: 324883
Initially submitted as r324356 and reverted in r324386.
This change additionally contains a fix to crashes of the buildbots.
The source of the crash was undefined behaviour caused by
std::future<> whose std::promise<> was destroyed without calling
set_value().
llvm-svn: 324575
Summary:
In the new threading model clangd creates one thread per file to manage
the AST and one thread to process each of the incoming requests.
The number of actively running threads is bounded by the semaphore to
avoid overloading the system.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: klimek, mgorny, jkorous-apple, ioeric, hintonda, cfe-commits
Differential Revision: https://reviews.llvm.org/D42573
llvm-svn: 324356
Summary:
We now provide an abstraction of Scheduler that abstracts threading
and resource management in ClangdServer.
No changes to behavior are intended with an exception of changed error
messages.
This patch is preliminary work to allow a revamped threading
implementation that will move the threading code out of CppFile.
Reviewers: sammccall, bkramer, jkorous-apple
Reviewed By: sammccall
Subscribers: hokein, mgorny, hintonda, ioeric, jkorous-apple, cfe-commits, klimek
Differential Revision: https://reviews.llvm.org/D42174
llvm-svn: 323851
Summary:
This adds checks that our diagnostics emit correct ranges in a bunch of cases,
as promised in D41118.
The diagnostics-preamble test is also converted and extended to be a little more
precise.
diagnostics.test stays around as the smoke test for this feature.
Reviewers: ilya-biryukov
Subscribers: klimek, mgorny, cfe-commits
Differential Revision: https://reviews.llvm.org/D41454
llvm-svn: 323448
Summary: I will replace the existing URI struct in Protocol.h with the new URI and rename FileURI to URI in a followup patch.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: jkorous-apple, klimek, mgorny, ilya-biryukov, cfe-commits
Differential Revision: https://reviews.llvm.org/D41946
llvm-svn: 323101
Summary:
The goal here is again to make it easier to read and write the tests.
I've extracted `parseTextMarker` from CodeCompleteTests into an `Annotations`
class, adding features to it:
- as well as points `^s` it allows ranges `[[...]]`
- multiple points and ranges are supported
- points and ranges may be named: `$name^` and `$name[[...]]`
These features are used for the xrefs tests. This also paves the way for
replacing the lit diagnostics.test with more readable unit tests, using named
ranges.
Alternative considered: `TestSelectionRange` in clang-refactor/TestSupport
Main problems were:
- delimiting the end of ranges is awkward, requiring counting
- comment syntax is long and at least as cryptic for most cases
- no separate syntax for point vs range, which keeps xrefs tests concise
- Still need to convert to Position everywhere
- Still need helpers for common case of expecting exactly one point/range
(I'll probably promote the extra `PrintTo`s from some of the core Protocol types
into `operator<<` in `Protocol.h` itself in a separate, prior patch...)
Reviewers: ioeric
Subscribers: klimek, mgorny, ilya-biryukov, cfe-commits
Differential Revision: https://reviews.llvm.org/D41432
llvm-svn: 321184
Summary:
- Moved these functions to SourceCode.h
- added unit tests
- fix off by one in positionToOffset: Offset - 1 in final calculation was wrong
- fixed formatOnType which had an equal and opposite off-by-one
- positionToOffset and offsetToPosition both consistently clamp to beginning/end
of file when input is out of range
- gave variables more descriptive names
- removed windows line ending fixmes where there is nothing to fix
- elaborated on UTF-8 fixmes
This will conflict with Eric's D41281, but in a pretty easy-to-resolve way.
Reviewers: ioeric
Subscribers: klimek, mgorny, ilya-biryukov, cfe-commits
Differential Revision: https://reviews.llvm.org/D41351
llvm-svn: 321073
Summary:
o Index interfaces to support using different index sources (e.g. AST index, global index) for code completion, cross-reference finding etc. This patch focuses on code completion.
The following changes in the original patch has been split out.
o Implement an AST-based index.
o Add an option to replace sema code completion for qualified-id with index-based completion.
o Implement an initial naive code completion index which matches symbols that have the query string as substring.
Reviewers: malaperle, sammccall
Reviewed By: sammccall
Subscribers: hokein, klimek, malaperle, mgorny, ilya-biryukov, cfe-commits
Differential Revision: https://reviews.llvm.org/D40548
llvm-svn: 320688
Summary:
* The "Symbol" class represents a C++ symbol in the codebase, containing all the
information of a C++ symbol needed by clangd. clangd will use it in clangd's
AST/dynamic index and global/static index (code completion and code
navigation).
* The SymbolCollector (another IndexAction) will be used to recollect the
symbols when the source file is changed (for ASTIndex), or to generate
all C++ symbols for the whole project.
In the long term (when index-while-building is ready), clangd should share a
same "Symbol" structure and IndexAction with index-while-building, but
for now we want to have some stuff working in clangd.
Reviewers: ioeric, sammccall, ilya-biryukov, malaperle
Reviewed By: sammccall
Subscribers: malaperle, klimek, mgorny, cfe-commits
Differential Revision: https://reviews.llvm.org/D40897
llvm-svn: 320486
Summary:
It will be used to pass around things like Logger and Tracer throughout
clangd classes.
Reviewers: sammccall, ioeric, hokein, bkramer
Reviewed By: sammccall
Subscribers: klimek, bkramer, mgorny, cfe-commits
Differential Revision: https://reviews.llvm.org/D40485
llvm-svn: 320468
We currently use target_link_libraries without an explicit scope
specifier (INTERFACE, PRIVATE or PUBLIC) when linking executables.
Dependencies added in this way apply to both the target and its
dependencies, i.e. they become part of the executable's link interface
and are transitive.
Transitive dependencies generally don't make sense for executables,
since you wouldn't normally be linking against an executable. This also
causes issues for generating install export files when using
LLVM_DISTRIBUTION_COMPONENTS. For example, clang has a lot of LLVM
library dependencies, which are currently added as interface
dependencies. If clang is in the distribution components but the LLVM
libraries it depends on aren't (which is a perfectly legitimate use case
if the LLVM libraries are being built static and there are therefore no
run-time dependencies on them), CMake will complain about the LLVM
libraries not being in export set when attempting to generate the
install export file for clang. This is reasonable behavior on CMake's
part, and the right thing is for LLVM's build system to explicitly use
PRIVATE dependencies for executables.
Unfortunately, CMake doesn't allow you to mix and match the keyword and
non-keyword target_link_libraries signatures for a single target; i.e.,
if a single call to target_link_libraries for a particular target uses
one of the INTERFACE, PRIVATE, or PUBLIC keywords, all other calls must
also be updated to use those keywords. This means we must do this change
in a single shot. I also fully expect to have missed some instances; I
tested by enabling all the projects in the monorepo (except dragonegg),
and configuring both with and without shared libraries, on both Darwin
and Linux, but I'm planning to rely on the buildbots for other
configurations (since it should be pretty easy to fix those).
Even after this change, we still have a lot of target_link_libraries
calls that don't specify a scope keyword, mostly for shared libraries.
I'm thinking about addressing those in a follow-up, but that's a
separate change IMO.
Differential Revision: https://reviews.llvm.org/D40823
llvm-svn: 319840
Summary:
Common parts are mostly FS related, so pulled out TestFS.h for the common stuff.
Deliberately resisted cleaning up much here, so this is pretty mechanical.
Reviewers: hokein
Subscribers: klimek, mgorny, ilya-biryukov, cfe-commits
Differential Revision: https://reviews.llvm.org/D40784
llvm-svn: 319741
Summary:
This will be used for rescoring code completion results based on partial
identifiers.
Short-term use:
- we want to limit the number of code completion results returned to
improve performance of global completion. The scorer will be used to
rerank the results to return when the user has applied a filter.
Long-term use case:
- ranking of completion results from in-memory index
- merging of completion results from multiple sources (merging usually
works best when done at the component-score level, rescoring the
fuzzy-match quality avoids different backends needing to have
comparable scores)
Reviewers: ilya-biryukov
Subscribers: cfe-commits, mgorny
Differential Revision: https://reviews.llvm.org/D40060
llvm-svn: 319557
Summary:
This form can be created with a nice clang-format-friendly literal syntax,
and gets escaping right. It knows how to call unparse() on our Protocol types.
All the places where we pass around JSON internally now use this type.
Object properties are sorted (stored as std::map) and so serialization is
canonicalized, with optional prettyprinting (triggered by a -pretty flag).
This makes the lit tests much nicer to read and somewhat nicer to debug.
(Unfortunately the completion tests use CHECK-DAG, which only has
line-granularity, so pretty-printing is disabled there. In future we
could make completion ordering deterministic, or switch to unittests).
Compared to the current approach, it has some efficiencies like avoiding copies
of string literals used as object keys, but is probably slower overall.
I think the code/test quality benefits are worth it.
This patch doesn't attempt to do anything about JSON *parsing*.
It takes direction from the proposal in this doc[1], but is limited in scope
and visibility, for now.
I am of half a mind just to use Expr as the target of a parser, and maybe do a
little string deduplication, but not bother with clever memory allocation.
That would be simple, and fast enough for clangd...
[1] https://docs.google.com/document/d/1OEF9IauWwNuSigZzvvbjc1cVS1uGHRyGTXaoy3DjqM4/edit
+cc d0k so he can tell me not to use std::map.
Reviewers: ioeric, malaperle
Subscribers: bkramer, ilya-biryukov, mgorny, klimek
Differential Revision: https://reviews.llvm.org/D39435
llvm-svn: 317486
Summary:
This lets you visualize clangd's activity on different threads over time,
and understand critical paths of requests and object lifetimes.
The data produced can be visualized in Chrome (at chrome://tracing), or
in a standalone copy of catapult (http://github.com/catapult-project/catapult)
This patch consists of:
- a command line flag "-trace" that causes clangd to emit JSON trace data
- an API (in Trace.h) allowing clangd code to easily add events to the stream
- several initial uses of this API to capture JSON-RPC requests, builds, logs
Example result: https://photos.app.goo.gl/12L9swaz5REGQ1rm1
Caveats:
- JSON serialization is ad-hoc (isn't it everywhere?) so the API is
limited to naming events rather than attaching arbitrary metadata.
I'd like to fix this (I think we could use a JSON-object abstraction).
- The recording is very naive: events are written immediately by
locking a mutex. Contention on the mutex might disturb performance.
- For now it just traces instants or spans on the current thread.
There are other things that make sense to show (cross-thread flows,
non-thread resources such as ASTs). But we have to start somewhere.
Reviewers: ioeric, ilya-biryukov
Subscribers: cfe-commits, mgorny
Differential Revision: https://reviews.llvm.org/D39086
llvm-svn: 317193
Summary:
Custom vfs::FileSystem is currently used for unit tests.
This revision depends on https://reviews.llvm.org/D33397.
Reviewers: bkramer, krasimir
Reviewed By: bkramer, krasimir
Subscribers: klimek, cfe-commits, mgorny
Differential Revision: https://reviews.llvm.org/D33416
llvm-svn: 303977