llvm-project

Commit Graph

Author	SHA1	Message	Date
Sam McCall	96f2489557	[clangd] Optionally use dex for the preamble parts of the dynamic index. Summary: Reuse the old -use-dex-index experiment flag for this. To avoid breaking the tests, make Dex deduplicate symbols, addressing an old FIXME. Reviewers: hokein Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D53288 llvm-svn: 344594	2018-10-16 08:53:52 +00:00
Sam McCall	bc8aee15a2	[clangd] Revert include path change in Dexp. NFC llvm-svn: 344533	2018-10-15 16:47:45 +00:00
Haojian Wu	397704ca40	[clangd] Add createIndex in dexp Summary: This would allow easily injecting our internal customization. Also updates the stale "symbol-collection-file" flag. Reviewers: sammccall Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D53292 llvm-svn: 344521	2018-10-15 15:12:40 +00:00
Haojian Wu	ddec850ceb	[clangd] dump xrefs information in dexp tool. Reviewers: sammccall Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D53019 llvm-svn: 344508	2018-10-15 12:32:49 +00:00
Kirill Bobyrev	4a5ff88fdb	[clangd] NFC: Migrate to LLVM STLExtras API where possible This patch improves readability by migrating `std::function(ForwardIt start, ForwardIt end, ...)` to LLVM's STLExtras range-based equivalent `llvm::function(RangeT &&Range, ...)`. Similar change in Clang: D52576. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D52650 llvm-svn: 343937	2018-10-07 14:49:41 +00:00
Sam McCall	50b89f0a9b	[clangd] Simplify Dex query tree logic and fix missing-posting-list bug Summary: The bug being fixed: when a posting list doesn't exist in the index, it was previously just dropped from the query rather than being treated as empty. Now that we have the FALSE iterator, we can use it instead. The query tree logic previously had a bunch of special cases to detect whether subtrees are empty. Now we just naively build the whole tree, and rely on the query optimizations to drop the trivial parts. Finally, there was a bug in trigram generation: the empty query would generate a single trigram "$$$" instead of no trigrams. This had no effect (there was no posting list, so the other bug cancelled it out). But we now have to fix this bug too. Reviewers: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D52796 llvm-svn: 343802	2018-10-04 17:18:55 +00:00
Sam McCall	aa728f1afa	[clangd] Dex: FALSE iterator, peephole optimizations, fix AND bug Summary: The FALSE iterator will be used in a followup patch to fix a logic bug in Dex (currently, tokens that don't have posting lists in the index are simply dropped from the query, changing semantics). It can usually be optimized away, so added the following optimizations: - simplify booleans inside AND/OR - replace effectively-empty AND/OR with booleans - flatten nested AND/ORs While working on this, found a bug in the AND iterator: its constructor sync() assumes that ReachedEnd is set if applicable, but the constructor never sets it. This crashes if a non-first iterator is nonempty. Reviewers: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D52789 llvm-svn: 343801	2018-10-04 17:18:49 +00:00
Sam McCall	2ec5a10db3	[clangd] Remove one-segment-skipping from Dex trigrams. Summary: Currently queries like "ab" can match identifiers like a_yellow_bee. The value of allowing this for exactly one segment but no more seems dubious. It costs ~3% of overall ram (~9% of posting list ram) and some quality. Reviewers: ilya-biryukov, ioeric Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D52885 llvm-svn: 343777	2018-10-04 14:08:11 +00:00
Sam McCall	b5bbfef6cd	[clangd] Dex: fix/simplify short-trigram generation Summary: 1) Instead of x$$ for a short-query trigram, just use x 2) Make rules more coherent: prefixes of length 1-2, and first char + next head 3) Fix Dex::fuzzyFind to mark results as incomplete, because short-trigram rules only yield a subset of results. Reviewers: ioeric Subscribers: ilya-biryukov, jkorous, mgrang, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D52808 llvm-svn: 343775	2018-10-04 14:01:55 +00:00
Sam McCall	87f69eaf4e	[clangd] Dex: FALSE iterator, peephole optimizations, fix AND bug Summary: The FALSE iterator will be used in a followup patch to fix a logic bug in Dex (currently, tokens that don't have posting lists in the index are simply dropped from the query, changing semantics). It can usually be optimized away, so added the following optimizations: - simplify booleans inside AND/OR - replace effectively-empty AND/OR with booleans - flatten nested AND/ORs While working on this, found a bug in the AND iterator: its constructor sync() assumes that ReachedEnd is set if applicable, but the constructor never sets it. This crashes if a non-first iterator is nonempty. Reviewers: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D52789 llvm-svn: 343774	2018-10-04 13:12:23 +00:00
Sam McCall	d9eae39800	[clangd] Support refs() in dex. Largely cloned from MemIndex. Reviewers: hokein Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D52726 llvm-svn: 343760	2018-10-04 09:16:12 +00:00
Sam McCall	a659d779f8	Reland r343589 "[clangd] Dex: add Corpus factory for iterators, rename, fold constant. NFC" This reverts commit r343610. llvm-svn: 343622	2018-10-02 19:59:23 +00:00
Reid Kleckner	2b5259afb3	Revert r343589 "[clangd] Dex: add Corpus factory for iterators, rename, fold constant. NFC" Declaring a field with the same name as a type causes GCC to error out: Dex.h:104:10: error: declaration of 'clang::clangd::dex::Corpus clang::clangd::dex::Dex::Corpus' [-fpermissive] Corpus Corpus; ^ Iterator.h:127:7: error: changes meaning of 'Corpus' from 'class clang::clangd::dex::Corpus' [-fpermissive] class Corpus { llvm-svn: 343610	2018-10-02 17:31:43 +00:00
Sam McCall	51be55d0ec	[clangd] Zap TODONEs llvm-svn: 343590	2018-10-02 13:51:43 +00:00
Sam McCall	a1e7385d5c	[clangd] Dex: add Corpus factory for iterators, rename, fold constant. NFC Summary: - Corpus avoids having to pass size to the true iterator, and (soon) any iterator that might optimize down to true. - Shorten names of factory functions now they're scoped to the Corpus. intersect() and unionOf() rather than createAnd() or createOr() as this seems to read better to me, and fits with other short names. Opinion wanted! - DEFAULT_BOOST_SCORE --> 1. This is a multiplier, don't obfuscate identity. - Simplify variadic templates in Iterator.h Reviewers: ioeric Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D52711 llvm-svn: 343589	2018-10-02 13:44:26 +00:00
Sam McCall	7402836042	[clangd] Dex iterator printer shows query structure, not iterator state. Summary: This makes it suitable for logging (which immediately found a bug, to be fixed in the next patch...) Reviewers: ioeric Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D52715 llvm-svn: 343580	2018-10-02 11:51:36 +00:00
Sam McCall	329fc143fd	[clangd] Query dex index using query-style trigrams, not identifier-style trigrams llvm-svn: 343453	2018-10-01 10:42:51 +00:00
Eric Liu	670c147d83	[clangd] Initial support for cross-namespace global code completion. Summary: When no scope qualifier is specified, allow completing index symbols from any scope and insert proper qualifiers automatically. This is still experimental and hidden behind a flag. Things missing: - Scope proximity based scoring. - FuzzyFind supports weighted scopes. Reviewers: sammccall Reviewed By: sammccall Subscribers: kbobyrev, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D52364 llvm-svn: 343248	2018-09-27 18:46:00 +00:00
Eric Liu	ee7fe93fa8	[clangd] Add more tracing to index queries. NFC Reviewers: sammccall Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D52611 llvm-svn: 343247	2018-09-27 18:23:23 +00:00
Kirill Bobyrev	ea4f20c6be	[clangd] Fix bugs with incorrect memory estimate report * With the current implementation, `sizeof(std::vector<Chunk>)` is added twice to the `Dex` memory estimate which is incorrect * `Dex` logs memory usage estimation before `BackingDataSize` is set and hence the log report excludes size of the external `SymbolSlab` which is coupled with `Dex` instance Reviewed By: ioeric Differential Revision: https://reviews.llvm.org/D52503 llvm-svn: 343117	2018-09-26 15:06:23 +00:00
Kirill Bobyrev	0cdf629394	[docs] Update PostingList string representation format Because `PostingList` objects are compressed, it is now impossible to see elements other than the current one and the documentation doesn't match implementation anymore. Reviewed By: ioeric Differential Revision: https://reviews.llvm.org/D52545 llvm-svn: 343116	2018-09-26 14:59:49 +00:00
Sam McCall	02d600d267	[clangd] Merge binary + YAML serialization behind a (mostly) common interface. Summary: Interface is in one file, implementation in two as they have little in common. A couple of ad-hoc YAML functions left exposed: - symbol -> YAML I expect to keep for tools like dexp - YAML -> symbol is used for the MR-style indexer, I think we can eliminate this (merge-on-the-fly, else use a different serialization) Reviewers: kbobyrev Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D52453 llvm-svn: 342999	2018-09-25 18:06:43 +00:00
Kirill Bobyrev	d041f8a9d0	[clangd] NFC: Simplify code, enforce LLVM Coding Standards For consistency, functional-style code pieces are replaced with their simple counterparts to improve readability. Also, file headers are fixed to comply with LLVM Coding Standards. `static` member of anonymous namespace is not marked `static` anymore, because it is redundant. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D52466 llvm-svn: 342974	2018-09-25 13:58:48 +00:00
Kirill Bobyrev	69e6388564	[clangd] Fix some buildbots after r342965 Some compilers fail to parse struct default member initializer. llvm-svn: 342970	2018-09-25 13:14:11 +00:00
Kirill Bobyrev	6c2f5bd0f1	[clangd] Implement VByte PostingList compression This patch implements Variable-length Byte compression of `PostingList`s, trading some performance for lower memory consumption. `PostingList` compression and decompression were extensively tested using a fuzzer for multiple hours and by running a significant number of realistic `FuzzyFindRequests`. AddressSanitizer and UndefinedBehaviorSanitizer were used to ensure correct behaviour. Performance evaluation was conducted with a recent LLVM symbol index (292k symbols) and a collection of user-recorded queries (7751 `FuzzyFindRequest` JSON dumps): memory consumption (posting lists only) went from 54.4 MB before to 23.5 MB after (-60%); time to process queries went from 7.70 s before to 9.4 s after (+25%). Reviewers: sammccall, ioeric Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D52300 llvm-svn: 342965	2018-09-25 11:54:51 +00:00
Kirill Bobyrev	94af0612e0	[clangd] Force Dex to respect symbol collector flags `Dex` should utilize `FuzzyFindRequest.RestrictForCodeCompletion` flags and omit symbols not meant for code completion when asked for it. The measurements below were conducted with `FuzzyFindRequest.RestrictForCodeCompletion` set to `true` (so that they are more realistic). Sadly, the average latency goes up; I suspect that is mostly because of the empty queries, where the number of posting lists is critical. Cumulative query latency (7000 `FuzzyFindRequest`s over the LLVM static index): 6182735043 ns before, 7202442053 ns after (+16%); whole index size: 81.24 MB before, 81.79 MB after (+0.6%). Out of 292252 symbols collected from the LLVM codebase, 136926 appear to be restricted for code completion. Reviewers: ioeric Differential Revision: https://reviews.llvm.org/D52357 llvm-svn: 342866	2018-09-24 08:45:18 +00:00
Sam McCall	46b5555844	[clangd] Fix error handling for SymbolID parsing (notably YAML and dexp) llvm-svn: 342505	2018-09-18 19:00:59 +00:00
Sam McCall	3bf9b6d920	[clangd] dexp tool uses llvm::cl to parse its flags. Summary: We can use cl::ResetCommandLineParser() to support different types of command-lines, as long as we're careful about option lifetimes. (I tried using subcommands, but the error messages were bad) I found a mostly-reasonable pattern to isolate the fiddly parts. Added -scope and -limit flags to the `find` command to demonstrate. (Note that scope support seems to be broken in dex?) Fixed symbol lookup to parse symbol IDs. Caveats: - with command help (e.g. `find -help`), you also get some spam about required arguments. This is a bug in llvm::cl, which prints these to errs() rather than the designated stream. Reviewers: kbobyrev Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D51989 llvm-svn: 342456	2018-09-18 09:49:57 +00:00
Kirill Bobyrev	249c5864cf	[clangd] Introduce PostingList interface This patch abstracts `PostingList` interface and reuses existing implementation. It will be used later to test different `PostingList` representations. No functionality change is introduced, this patch is mostly refactoring so that the following patches could focus on functionality while not being too hard to review. Reviewed By: sammccall, ioeric Differential Revision: https://reviews.llvm.org/D51982 llvm-svn: 342155	2018-09-13 17:11:03 +00:00
Kirill Bobyrev	bd72b08eb3	[clangd] Fix Dexp build %s/MaxCandidateCount/Limit/g after rL342138. llvm-svn: 342143	2018-09-13 15:35:55 +00:00
Kirill Bobyrev	e6dd0806c7	[clangd] Cleanup FuzzyFindRequest filtering limit semantics As discussed during D51860 review, it is better to use `llvm::Optional` here as it has clear semantics which reflect intended behavior. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D52028 llvm-svn: 342138	2018-09-13 14:27:03 +00:00
Kirill Bobyrev	d9f33b129c	[clangd] Don't create child AND and OR iterators with one posting list `AND( AND( Child ) ... )` -> `AND( Child ... )` `AND( OR( Child ) ... )` -> `AND( Child ... )` This simple optimization results in 5-6% performance improvement in the benchmark with 2000 serialized `FuzzyFindRequest`s. Reviewed By: ilya-biryukov Differential Revision: https://reviews.llvm.org/D52016 llvm-svn: 342124	2018-09-13 10:02:48 +00:00
Heejin Ahn	386d272387	[clangd] Add missing clangBasic target_link_libraries Without this, builds with `-DSHARED_LIB=ON` fail. llvm-svn: 342037	2018-09-12 09:40:13 +00:00
Kirill Bobyrev	e1e19c7b75	[clangd] Implement a Proof-of-Concept tool for symbol index exploration Reviewed By: sammccall, ilya-biryukov Differential Revision: https://reviews.llvm.org/D51628 llvm-svn: 342025	2018-09-12 07:32:54 +00:00
Kirill Bobyrev	38a889c185	[clangd] Add symbol slab size to index memory consumption estimates Currently, `SymbolIndex::estimateMemoryUsage()` returns the "overhead" estimate, i.e. the estimate of the Index data structure excluding backing data (such as Symbol Slab and Reference Slab). This patch propagates information about paired data size where necessary. Reviewed By: ioeric, sammccall Differential Revision: https://reviews.llvm.org/D51539 llvm-svn: 341800	2018-09-10 11:46:07 +00:00
Kirill Bobyrev	5abe478a3d	[clangd] NFC: Rename DexIndex to Dex Also, cleanup some redundant includes. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D51774 llvm-svn: 341784	2018-09-10 08:23:53 +00:00
Kirill Bobyrev	59491a1fa9	[clangd] Make advanceTo() faster on Posting Lists If the current element is already beyond advanceTo()'s DocID, just return instead of doing binary search. This simple optimization improves performance by up to 6-7%. Reviewed By: ilya-biryukov Differential Revision: https://reviews.llvm.org/D51802 llvm-svn: 341781	2018-09-10 07:57:28 +00:00
Eric Liu	f76886859f	[clangd] Canonicalize include paths in clangd. Get rid of "../" and "../../". llvm-svn: 341645	2018-09-07 09:40:36 +00:00
Kirill Bobyrev	049b2d4345	[clangd] Fix Dex initialization This patch sets the URI schemes of Dex to SymbolCollector's default schemes in case callers pass an empty list of schemes. This was the case for the initialization in Clangd main and caused incorrect behavior. It also fixes a bug with a missing `continue;` after spotting an invalid URI scheme conversion. llvm-svn: 341552	2018-09-06 15:10:10 +00:00
Kirill Bobyrev	e4ee0213d4	[clangd] NFC: mark single-parameter constructors explicit Code health: prevent implicit conversions to user-defined types. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D51690 llvm-svn: 341543	2018-09-06 13:06:04 +00:00
Kirill Bobyrev	19a9461e5f	[clangd] Implement proximity path boosting for Dex This patch introduces `PathURI` Search Token kind and utilizes it to uprank symbols which are defined in files with small distance to the directory where the fuzzy find request is coming from (e.g. files user is editing). Reviewed By: ioeric Reviewers: ioeric, sammccall Differential Revision: https://reviews.llvm.org/D51481 llvm-svn: 341542	2018-09-06 12:54:43 +00:00
Sam McCall	b0138317d6	[clangd] SymbolOccurrences -> Refs and cleanup Summary: A few things that I noticed while merging the SwapIndex patch: - SymbolOccurrences and particularly SymbolOccurrenceSlab are unwieldy names, and these names appear a lot. Ref, RefSlab, etc. seem clear enough and read/format much better. - The asymmetry between SymbolSlab and RefSlab (build() vs freeze()) is confusing and irritating, and doesn't even save much code. Avoiding RefSlab::Builder was my idea, but it was a bad one; add it. - DenseMap<SymbolID, ArrayRef<Ref>> seems like a reasonable compromise for constructing MemIndex - and means far fewer wasted allocations than the current DenseMap<SymbolID, vector<Ref*>> for FileIndex, and none for slabs. - RefSlab::find() is not actually used for anything, so we can throw away the DenseMap and keep the representation much more compact. - A few naming/consistency fixes: e.g. Slabs,Refs -> Symbols,Refs. Reviewers: ioeric Subscribers: ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D51605 llvm-svn: 341368	2018-09-04 14:39:56 +00:00
Sam McCall	9c7624e14b	[clangd] Factor out the data-swapping functionality from MemIndex/DexIndex. Summary: This is now handled by a wrapper class SwapIndex, so MemIndex/DexIndex can be immutable and focus on their job. Old and busted: I have a MemIndex, which holds a shared_ptr<vector<Symbol>>, which keeps the symbol slab alive. I update by calling build(shared_ptr<vector<Symbol>>). New hotness: I have a SwapIndex, which holds a unique_ptr<SymbolIndex>, which holds a MemIndex, which holds a shared_ptr<void>, which keeps backing data alive. I update by building a new MemIndex and calling SwapIndex::reset(). Reviewers: kbobyrev, ioeric Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D51422 llvm-svn: 341318	2018-09-03 14:37:43 +00:00
Fangrui Song	399943bc76	[clangd] Fix many typos. NFC llvm-svn: 341273	2018-09-01 07:47:03 +00:00
Kirill Bobyrev	493b1627ca	[NFC] Cleanup Dex * Use consistent assertion messages in iterators implementations * Silence a bunch of clang-tidy warnings: use `emplace_back` instead of `push_back` where possible, make sure arguments have the same name in header and implementation file, use for loop over ranges where possible Reviewed by: ioeric Differential Revision: https://reviews.llvm.org/D51528 llvm-svn: 341190	2018-08-31 09:17:02 +00:00
Kirill Bobyrev	a2f146fd9c	[clangd] Remove UB introduced in rL341057 llvm-svn: 341066	2018-08-30 13:30:34 +00:00
Kirill Bobyrev	38bdac5db8	[clangd] Implement iterator cost This patch introduces an iterator cost concept to improve the performance of Dex query iterators (mainly the AND iterator). Benchmarks show that the queries become ~10% faster. Before: DexAdHocQueries 5883074 ns time, 5883018 ns CPU, 117 iterations; DexRealQ 959904457 ns time, 959898507 ns CPU, 1 iteration. After: DexAdHocQueries 5238403 ns time, 5238361 ns CPU, 130 iterations; DexRealQ 873275207 ns time, 873269453 ns CPU, 1 iteration. Reviewed by: sammccall Differential Revision: https://reviews.llvm.org/D51310 llvm-svn: 341057	2018-08-30 11:23:58 +00:00
Kirill Bobyrev	b217ddb1bb	[clangd] Use TRUE iterator instead of complete posting list Stop using the `$$$` (empty) trigram and generating a posting list with all items. Since the TRUE iterator is already implemented and correctly inserted when there are no real trigram posting lists, this is a valid transformation. Benchmarks show that this simple change gives a ~30% speedup on a dataset of real completion queries. Before: DexAdHocQueries 5640321 ns time, 5640265 ns CPU, 120 iterations; DexRealQ 939835603 ns time, 939830296 ns CPU, 1 iteration. After: DexAdHocQueries 3452014 ns time, 3451987 ns CPU, 203 iterations; DexRealQ 667455912 ns time, 667455750 ns CPU, 1 iteration. Reviewed by: ilya-biryukov Differential Revision: https://reviews.llvm.org/D51287 llvm-svn: 340729	2018-08-27 09:47:50 +00:00
Kirill Bobyrev	a98961bc84	[clangd] Implement LIMIT iterator This patch introduces the LIMIT iterator, which is very important for improving the quality of search queries. LIMIT iterators can be applied on top of BOOST iterators to prevent query results from being populated with a huge number of low-quality symbols. Reviewed by: sammccall Differential Revision: https://reviews.llvm.org/D51029 llvm-svn: 340605	2018-08-24 11:25:43 +00:00
Kirill Bobyrev	fc89001cec	[clangd] Log memory usage of DexIndex and MemIndex This patch prints information about built index size estimation to verbose logs. This is useful for optimizing memory usage of DexIndex and comparisons with MemIndex. Reviewed by: sammccall Differential Revision: https://reviews.llvm.org/D51154 llvm-svn: 340601	2018-08-24 09:12:54 +00:00
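Several Dex commits above (r343453, D52808, D52885, D52796) concern how a fuzzy-find query is split into trigram tokens. A minimal sketch of query-side tokenization follows, under simplifying assumptions: only letters and digits are kept and lowercased, a query of three or more characters yields each run of three consecutive characters, a one- or two-character query is used as a single short token, and an empty query yields no tokens. The function name `queryTrigrams` is hypothetical; this is not the Dex implementation, and identifier-side rules such as segment heads are omitted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified query tokenizer; NOT the Dex implementation.
// Assumption: only letters and digits matter, and matching is case-insensitive.
std::vector<std::string> queryTrigrams(const std::string &Query) {
  std::string Normalized;
  for (unsigned char C : Query)
    if (std::isalnum(C))
      Normalized.push_back(static_cast<char>(std::tolower(C)));

  // Empty query: no tokens at all (rather than a catch-all "$$$" token).
  if (Normalized.empty())
    return {};

  // Short query: use the 1-2 character string itself instead of padding it.
  if (Normalized.size() < 3)
    return {Normalized};

  // Longer query: every run of three consecutive characters, deduplicated
  // and returned in sorted order.
  std::set<std::string> Unique;
  for (std::size_t I = 0; I + 3 <= Normalized.size(); ++I)
    Unique.insert(Normalized.substr(I, 3));
  return std::vector<std::string>(Unique.begin(), Unique.end());
}
```

With this sketch, `queryTrigrams("abcd")` yields `abc` and `bcd`, while `queryTrigrams("a_b")` collapses to the single short token `ab`.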

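D52789 lists three peephole rules for the Dex query tree (simplify booleans inside AND/OR, replace effectively-empty AND/OR with booleans, flatten nested AND/ORs), and D52016 drops single-child AND/OR nodes. The toy C++17 sketch below applies those rules while building a query tree. The `Query` type and the `makeLeaf`, `makeBool`, `makeBoolean` and `toString` helpers are hypothetical stand-ins, not Dex's iterator API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy query tree; Dex's real iterators also hold posting lists
// and scores. Requires C++17 (std::vector of an incomplete element type).
struct Query {
  enum Kind { True, False, Leaf, And, Or };
  Kind K;
  std::string Token;           // Used only by Leaf nodes.
  std::vector<Query> Children; // Used only by And/Or nodes.
};

Query makeLeaf(std::string Token) { return Query{Query::Leaf, std::move(Token), {}}; }
Query makeBool(bool Value) { return Query{Value ? Query::True : Query::False, "", {}}; }

// Builds an AND or OR node, applying the peephole rules at construction time:
// drop the identity element, short-circuit on the absorbing element, flatten
// nested nodes of the same kind, and collapse 0- or 1-child nodes.
Query makeBoolean(Query::Kind Op, std::vector<Query> Children) {
  assert((Op == Query::And || Op == Query::Or) && "not a boolean operator");
  const Query::Kind Identity = Op == Query::And ? Query::True : Query::False;
  const Query::Kind Absorbing = Op == Query::And ? Query::False : Query::True;
  std::vector<Query> Kept;
  for (Query &Child : Children) {
    if (Child.K == Identity)
      continue; // AND(x, TRUE) == x, OR(x, FALSE) == x.
    if (Child.K == Absorbing)
      return makeBool(Absorbing == Query::True); // AND(x, FALSE) == FALSE, etc.
    if (Child.K == Op) { // AND(AND(a, b), c) == AND(a, b, c).
      for (Query &Grandchild : Child.Children)
        Kept.push_back(std::move(Grandchild));
      continue;
    }
    Kept.push_back(std::move(Child));
  }
  if (Kept.empty()) // Effectively empty: AND() is TRUE, OR() is FALSE.
    return makeBool(Op == Query::And);
  if (Kept.size() == 1) // AND(x) == OR(x) == x.
    return std::move(Kept.front());
  return Query{Op, "", std::move(Kept)};
}

// Prints the query structure, e.g. "(& a b c)".
std::string toString(const Query &Q) {
  switch (Q.K) {
  case Query::True: return "TRUE";
  case Query::False: return "FALSE";
  case Query::Leaf: return Q.Token;
  default: break;
  }
  std::string Out = Q.K == Query::And ? "(& " : "(| ";
  for (std::size_t I = 0; I < Q.Children.size(); ++I)
    Out += (I ? " " : "") + toString(Q.Children[I]);
  return Out + ")";
}
```

For example, modelling a token that has no posting list as FALSE (the approach described in D52796) makes `makeBoolean(Query::And, {makeLeaf("abc"), makeBool(false)})` collapse to `FALSE`, rather than silently dropping the token and widening the query.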
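D51802 adds a fast path to advanceTo(): if the current element is already at or beyond the target DocID, return without doing a binary search. A sketch over an uncompressed, sorted `std::vector` is shown below, together with a two-list leapfrog intersection of the kind an AND iterator performs. `PostingListIterator`, `intersect` and the `DocID` alias are illustrative assumptions, not the clangd classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using DocID = std::uint32_t;

// Hypothetical iterator over a sorted, uncompressed posting list.
class PostingListIterator {
public:
  explicit PostingListIterator(const std::vector<DocID> &Documents)
      : Documents(Documents) {}

  bool reachedEnd() const { return Index == Documents.size(); }

  DocID peek() const {
    assert(!reachedEnd() && "peek() at the end of the posting list");
    return Documents[Index];
  }

  void advance() {
    assert(!reachedEnd() && "advance() at the end of the posting list");
    ++Index;
  }

  // Moves to the first element >= ID, or to the end if there is none.
  void advanceTo(DocID ID) {
    assert(!reachedEnd() && "advanceTo() at the end of the posting list");
    if (peek() >= ID)
      return; // Fast path: already there, skip the binary search.
    auto It = std::lower_bound(Documents.begin() + Index, Documents.end(), ID);
    Index = static_cast<std::size_t>(It - Documents.begin());
  }

private:
  const std::vector<DocID> &Documents; // Must outlive the iterator.
  std::size_t Index = 0;
};

// Leapfrog intersection of two posting lists, in the style of an AND
// iterator: whichever side is behind jumps forward with advanceTo().
std::vector<DocID> intersect(const std::vector<DocID> &A,
                             const std::vector<DocID> &B) {
  std::vector<DocID> Result;
  PostingListIterator Left(A), Right(B);
  while (!Left.reachedEnd() && !Right.reachedEnd()) {
    DocID L = Left.peek(), R = Right.peek();
    if (L == R) {
      Result.push_back(L);
      Left.advance();
      Right.advance();
    } else if (L < R) {
      Left.advanceTo(R);
    } else {
      Right.advanceTo(L);
    }
  }
  return Result;
}
```

In this two-list loop advanceTo() is only called on the lagging side, so the fast path never fires here; it pays off in a k-way AND that advances every child to a common target, some of which may already be there.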

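D52300 replaces plain posting lists with Variable-length Byte (VByte) compression. A minimal sketch of the technique follows, assuming 32-bit document IDs stored as gaps between consecutive sorted IDs, with each gap written in 7-bit groups, least significant group first, and the high bit of a byte marking that more bytes follow. The helper names `encodePostingList` and `decodePostingList` are hypothetical. The log also mentions a `std::vector<Chunk>` (D52503), suggesting the real implementation splits lists into chunks; this sketch ignores that.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: delta-encode a sorted list of document IDs and write
// each gap as a VByte varint (7 payload bits per byte, high bit set on every
// byte except the last one of a value).
std::vector<std::uint8_t> encodePostingList(const std::vector<std::uint32_t> &DocIDs) {
  std::vector<std::uint8_t> Bytes;
  std::uint32_t Previous = 0;
  for (std::uint32_t ID : DocIDs) {
    assert(ID >= Previous && "posting lists must be sorted");
    std::uint32_t Delta = ID - Previous;
    Previous = ID;
    // Emit low-order 7-bit groups first; set the continuation bit while more remain.
    while (Delta >= 0x80) {
      Bytes.push_back(static_cast<std::uint8_t>((Delta & 0x7F) | 0x80));
      Delta >>= 7;
    }
    Bytes.push_back(static_cast<std::uint8_t>(Delta));
  }
  return Bytes;
}

// Inverse of encodePostingList: rebuild absolute IDs from the gap stream.
std::vector<std::uint32_t> decodePostingList(const std::vector<std::uint8_t> &Bytes) {
  std::vector<std::uint32_t> DocIDs;
  std::uint32_t Previous = 0, Delta = 0;
  unsigned Shift = 0;
  for (std::uint8_t Byte : Bytes) {
    Delta |= static_cast<std::uint32_t>(Byte & 0x7F) << Shift;
    if (Byte & 0x80) {
      Shift += 7; // More 7-bit groups follow for this gap.
      continue;
    }
    Previous += Delta; // Gap complete: turn it back into an absolute ID.
    DocIDs.push_back(Previous);
    Delta = 0;
    Shift = 0;
  }
  return DocIDs;
}
```

Sorted IDs tend to have small gaps, so most of them fit in one byte; the trade-off is that decoding is sequential, which is in line with the slower query processing reported in D52300.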
65 Commits