llvm-project

Commit Graph

Author	SHA1	Message	Date
Harald van Dijk	7907c46fe6	Make clangd CompletionModel not depend on directory layout. The current code accounts for two possible layouts, but there is at least a third supported layout: clang-tools-extra may also be checked out as clang/tools/extra with the releases, which was not yet handled. Rather than treating that as a special case, use the location of CompletionModel.cmake to handle all three cases. This should address the problems that prompted D96787 and the problems that prompted the proposed revert D100625. Reviewed By: usaxena95 Differential Revision: https://reviews.llvm.org/D101851	2021-05-05 19:25:34 +01:00
serge-sans-paille	f51ab18716	Make clangd CompletionModel usable even with non-standard (but supported) layout llvm supports specifying a non-standard layout where each project lies in its own place. Do not assume a fixed layout and use the appropriate cmake variable instead. Differential Revision: https://reviews.llvm.org/D96787	2021-03-22 10:05:25 +01:00
Utkarsh Saxena	bf935a034b	[clangd] Make categorical features 64 bit in DecisionForest Model. CodeCompletionContext::Kind has 36 Kinds. The completion model used to support categorical features with a cardinality of at most 32. As a result, clangd tests were failing under ASan due to overflow. This patch makes the completion model support categorical features with a cardinality of up to 64 by storing ENUM Features as uint64_t instead of uint32_t. Verified that this fixes the ASan failures. Latency: 6.7ms (old) VS 6.8ms (new) per 1000 predictions. Differential Revision: https://reviews.llvm.org/D97770	2021-03-02 16:22:30 +01:00
Sam McCall	7d1b499cae	Revert "[clangd] Extract symbol-scope logic out of Quality, add tests. NFC" On second thought, this can't properly be reused for highlighting. Consider this example, which Quality wants to consider function-scope, but highlighting must consider class-scope: void foo() { class X { int ^y; }; }	2021-01-29 14:59:16 +01:00
Sam McCall	d0817b5f18	[clangd] Extract symbol-scope logic out of Quality, add tests. NFC This prepares for reuse from the semantic highlighting code. There's a bit of yak-shaving here: - when the enum is moved into the clangd namespace, promote it to a scoped enum. This means teaching the decision forest infrastructure to deal with scoped enums. - AccessibleScope isn't quite the right name: e.g. public class members are treated as accessible, but still have class scope. So rename to SymbolScope. - Rename some QualitySignals members to avoid name conflicts. (the string) SymbolScope -> Scope (the enum) Scope -> ScopeKind	2021-01-29 14:44:28 +01:00
Utkarsh Saxena	d5047d762f	[clangd] Update CC Ranking model with better sampling. A better sampling strategy was used to generate the dataset for this model. New signals introduced in this model: - NumNameInContext: Number of words in the context that match the name of the candidate. - FractionNameInContext: Fraction of the words in context matching the name of the candidate. We remove the signal `IsForbidden` from the model and down-rank forbidden signals aggressively. Differential Revision: https://reviews.llvm.org/D94697	2021-01-15 18:13:24 +01:00
Aaron Ballman	45e0f65162	Add a floating-point suffix to silence warnings; NFC This silences about 6000 warnings about truncating from double to float with Visual Studio.	2020-11-04 10:09:51 -05:00
Utkarsh Saxena	a0a6fd435c	[clangd] New CC Ranking Model to fix bad inference due to overflow. Unreachable file distances are represented as `std::numeric_limits<unsigned>::max()`. The previous dataset recorded the signals as `signed int` capturing this default value as `-1`. A new dataset was regenerated and a new model is trained that interprets this unreachable as the intended value. Distribution of `SymbolScopeDistance` (value: normalised frequency): 0: 46.6184, 4294967295: 29.5342, 6: 14.5666, 4: 6.4433, 2: 1.4534, 8: 0.5760, 10: 0.3581, .... Distribution of `FileProximityDistance` (value: normalised frequency): 4294967295: 39.9378, 12: 5.1997, 14: 4.9828, 15: 4.4221, 16: 4.3820, 13: 4.2765, 17: 3.8957, 11: 3.6387, 19: 3.4799, 18: 3.4076, .... Differential Revision: https://reviews.llvm.org/D89035	2020-10-08 15:30:00 +02:00
Utkarsh Saxena	45698ac005	[clangd] Split DecisionForest Evaluate() into one func per tree. This allows MSAN to instrument this function. The previous version is not instrumentable due to its sheer volume. Differential Revision: https://reviews.llvm.org/D88536	2020-10-01 18:07:23 +02:00
Utkarsh Saxena	a9f63d22fa	[clangd] Disable msan instrumentation for generated Evaluate(). MSAN build times out for generated DecisionForest inference runtime. A solution worth trying is splitting the function into 300 smaller functions and then re-enable msan. For now we are disabling instrumentation for the generated function. Differential Revision: https://reviews.llvm.org/D88495	2020-09-29 17:44:10 +02:00
Utkarsh Saxena	b5f7e9e26c	[clangd] Add a trained DecisionForest for code completion. Replaces the dummy CodeCompletion model with a trained DecisionForest model. The features.json needs to be manually curated specifying the features to be used. This is a one-time cost and does not change if the model changes until we decide to add/remove features. Differential Revision: https://reviews.llvm.org/D88071	2020-09-28 18:35:10 +02:00
Utkarsh Saxena	985deba931	Revert "Temporarily Revert "[clangd] Add Random Forest runtime for code completion."" We intend to replace heuristics based code completion ranking with a Decision Forest Model. This patch introduces a format for representing the model and an inference runtime that is code-generated at build time. - Forest.json contains all the trees as an array of trees. - Features.json describes the features to be used. - Codegen file takes the above two files and generates CompletionModel containing Feature struct and corresponding Evaluate function. The Evaluate function maps a feature to a real number describing the relevance of this candidate. - The codegen is part of build system and these files are generated at build time. - Proposes a way to test the generated runtime using a test model. - Replicates the model structure in unittests. - unittest tests both the test model (for correct tree traversal) and the real model (for sanity). This reverts commit `549e55b3d5`.	2020-09-19 10:54:04 +02:00
Eric Christopher	549e55b3d5	Temporarily Revert "[clangd] Add Random Forest runtime for code completion." as a header doesn't appear to have made it into the commit. This reverts commit `9b6765e784` and followup	2020-09-18 14:47:43 -07:00
Nico Weber	807777913e	CompletionModelCodegen: Remove unused import The unused import is 3.4+, so it also breaks py2.7 compat. But this is easy to fix :)	2020-09-18 16:24:58 -04:00
Nico Weber	0ea2a57274	clangd: Make CompletionModelCodegen.py py2.7 compatible LLVM still supports Python 2.7, so unbreak bots that still run that. In a separate commit so that this is easy to revert once we drop support :)	2020-09-18 15:26:58 -04:00
Utkarsh Saxena	9b6765e784	[clangd] Add Random Forest runtime for code completion. Summary: [WIP] - Proposes a json format for representing Random Forest model. - Proposes a way to test the generated runtime using a test model. TODO: - Add generated source code snippet for easier review. - Fix unused label warning. - Figure out required using declarations for CATEGORICAL columns from Features.json. - Necessary Google3 internal modifications for blaze before landing. - Add documentation for format of the model. - Document more. Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D83814	2020-09-18 19:25:56 +02:00
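
Commits 985deba931, 45698ac005 and bf935a034b above describe a forest stored as an array of trees, an Evaluate function split into one function per tree, and categorical features widened from 32 to 64 bits. The following is a minimal sketch of that idea in Python, not the generated C++ runtime. The node schema (`kind`, `feature`, `threshold`, `mask`, `then`, `else`, `score`), the feature names, and the assumption that a categorical split is stored as a bitmask of accepted categories are all hypothetical and chosen only for illustration. The `width` parameter models the integer width of the stored feature: in C++, shifting past the width is undefined behavior, which the sketch approximates by dropping the bit.

```python
# Hypothetical sketch: the node schema and feature names are assumptions
# made for illustration, not the actual Forest.json / Features.json format.


def member_of(category: int, mask: int, width: int) -> bool:
    """Test whether `category` is in the set encoded by `mask`,
    emulating an unsigned integer that is `width` bits wide."""
    bit = (1 << category) & ((1 << width) - 1)  # bits past `width` are dropped
    return (bit & mask) != 0


def evaluate_tree(node: dict, features: dict, width: int = 64) -> float:
    """Walk one tree from the root to a leaf and return the leaf score."""
    while "score" not in node:
        value = features[node["feature"]]
        if node["kind"] == "number":
            taken = value >= node["threshold"]
        else:  # "enum": the split holds a bitmask of accepted categories
            taken = member_of(value, node["mask"], width)
        node = node["then"] if taken else node["else"]
    return node["score"]


def evaluate_forest(trees: list, features: dict, width: int = 64) -> float:
    """Sum the per-tree scores, one evaluate_tree call per tree."""
    return sum(evaluate_tree(tree, features, width) for tree in trees)


# A toy forest with one numeric split and one categorical split on
# category 35, which does not fit into a 32-bit mask.
TOY_FOREST = [
    {"kind": "number", "feature": "NumReferences", "threshold": 10,
     "then": {"score": 1.5}, "else": {"score": -0.5}},
    {"kind": "enum", "feature": "ContextKind", "mask": 1 << 35,
     "then": {"score": 2.0}, "else": {"score": 0.0}},
]
```

With `width=64` the categorical split on category 35 is taken; with `width=32` the bit is lost and the same candidate silently falls into the `else` branch, which is the kind of failure a feature with 36 categories would run into under a 32-bit representation.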

16 Commits
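
Commit a0a6fd435c above attributes bad inference to the unreachable-distance sentinel `std::numeric_limits<unsigned>::max()` being recorded as a `signed int`. The sketch below illustrates that reinterpretation in Python, assuming a 32-bit `unsigned` (consistent with the value 4294967295 in the reported distributions). The helper names and the threshold of 20 are hypothetical and only serve to show how a split on the distance flips once the sentinel becomes `-1`.

```python
import struct

# Assumption: `unsigned` is 32 bits wide, so its maximum is 2**32 - 1.
UNREACHABLE = 2**32 - 1


def as_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a two's-complement int."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def as_unsigned32(value: int) -> int:
    """Reinterpret a (possibly negative) int as a 32-bit unsigned value."""
    return value & 0xFFFFFFFF


def is_far(distance: int, threshold: int = 20) -> bool:
    """Hypothetical numeric split: is the distance at least `threshold`?"""
    return distance >= threshold
```

Here `as_signed32(UNREACHABLE)` yields `-1`, so `is_far` is true for the sentinel as an unsigned value but false for the signed value seen during training. A model trained on the signed data would therefore treat unreachable files as the closest ones.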