Commit Graph

13 Commits

Author SHA1 Message Date
Sam McCall 7d1b499cae Revert "[clangd] Extract symbol-scope logic out of Quality, add tests. NFC"
On second thought, this can't properly be reused for highlighting.

Consider this example, which Quality wants to consider function-scope,
but highlighting must consider class-scope:

void foo() {
  class X {
    int ^y;
  };
}
2021-01-29 14:59:16 +01:00
Sam McCall d0817b5f18 [clangd] Extract symbol-scope logic out of Quality, add tests. NFC
This prepares for reuse from the semantic highlighting code.

There's a bit of yak-shaving here:
 - when the enum is moved into the clangd namespace, promote it to a
   scoped enum. This means teaching the decision forest infrastructure
   to deal with scoped enums.
 - AccessibleScope isn't quite the right name: e.g. public class members
   are treated as accessible, but still have class scope. So rename to
   SymbolScope.
 - Rename some QualitySignals members to avoid name conflicts.
   (the string) SymbolScope -> Scope
   (the enum) Scope -> ScopeKind
2021-01-29 14:44:28 +01:00
Utkarsh Saxena d5047d762f [clangd] Update CC Ranking model with better sampling.
A better sampling strategy was used to generate the dataset for this
model.
New signals introduced in this model:
- NumNameInContext: Number of words in the context that matches the name
of the candidate.
- FractionNameInContext: Fraction of the words in context matching the
name of the candidate.

We remove the signal `IsForbidden` from the model and down rank
forbidden signals aggresively.

Differential Revision: https://reviews.llvm.org/D94697
2021-01-15 18:13:24 +01:00
Aaron Ballman 45e0f65162 Add a floating-point suffix to silence warnings; NFC
This silences about 6000 warnings about truncating from double to float
with Visual Studio.
2020-11-04 10:09:51 -05:00
Utkarsh Saxena a0a6fd435c [clangd] New CC Ranking Model to fix bad inference due to overflow.
Unreachable file distances are represented as
`std::numeric_limits<unsigned>::max()`.
The previous dataset recorded the signals as `signed int` capturing this default
value as `-1`.

A new dataset was regenerated and a new model is trained that
interprets this unreachable as the intended value.

Distribution of `SymbolScopeDistance`:
Value         Normalised Frequency
0             46.6184
4294967295    29.5342
6             14.5666
4              6.4433
2              1.4534
8              0.5760
10             0.3581
....

Distribution of `FileProximityDistance`:
Value         Normalised Frequency
4294967295    39.9378
12             5.1997
14             4.9828
15             4.4221
16             4.3820
13             4.2765
17             3.8957
11             3.6387
19             3.4799
18             3.4076
....

Differential Revision: https://reviews.llvm.org/D89035
2020-10-08 15:30:00 +02:00
Utkarsh Saxena 45698ac005 [clangd] Split DecisionForest Evaluate() into one func per tree.
This allows us MSAN to instrument this function. Previous version is not
instrumentable due to it shear volume.

Differential Revision: https://reviews.llvm.org/D88536
2020-10-01 18:07:23 +02:00
Utkarsh Saxena a9f63d22fa [clangd] Disable msan instrumentation for generated Evaluate().
MSAN build times out for generated DecisionForest inference runtime.

A solution worth trying is splitting the function into 300 smaller
functions and then re-enable msan.

For now we are disabling instrumentation for the generated function.

Differential Revision: https://reviews.llvm.org/D88495
2020-09-29 17:44:10 +02:00
Utkarsh Saxena b5f7e9e26c [clangd] Add a trained DecisionForest for code completion.
Replaces the dummy CodeCompletion model with a trained DecisionForest
model.
The features.json needs to be manually curated specifying the features
to be used. This is a one-time cost and does not change if the model
changes until we decide to add/remove features.

Differential Revision: https://reviews.llvm.org/D88071
2020-09-28 18:35:10 +02:00
Utkarsh Saxena 985deba931 Revert "Temporarily Revert "[clangd] Add Random Forest runtime for code completion.""
We intend to replace heuristics based code completion ranking with a Decision Forest Model.

This patch introduces a format for representing the model and an inference runtime that is code-generated at build time.
- Forest.json contains all the trees as an array of trees.
- Features.json describes the features to be used.
- Codegen file takes the above two files and generates CompletionModel containing Feature struct and corresponding Evaluate function.
   The Evaluate function maps a feature to a real number describing the relevance of this candidate.
- The codegen is part of build system and these files are generated at build time.
- Proposes a way to test the generated runtime using a test model.
  - Replicates the model structure in unittests.
  - unittest tests both the test model (for correct tree traversal) and the real model (for sanity).

This reverts commit 549e55b3d5.
2020-09-19 10:54:04 +02:00
Eric Christopher 549e55b3d5 Temporarily Revert "[clangd] Add Random Forest runtime for code completion."
as a header doesn't appear to have made it into the commit.

This reverts commit 9b6765e784 and followup
2020-09-18 14:47:43 -07:00
Nico Weber 807777913e CompletionModelCodegen: Remove unused import
The unused import is 3.4+, so it also breaks py2.7 compat.
But this is easy to fix :)
2020-09-18 16:24:58 -04:00
Nico Weber 0ea2a57274 clangd: Make ompletionModelCodegen.py tpy2.7 compatible
LLVM still supports Python 2.7, so unbreak bots that still run that.
In a separate commit so that this is easy to revert once we drop
support :)
2020-09-18 15:26:58 -04:00
Utkarsh Saxena 9b6765e784 [clangd] Add Random Forest runtime for code completion.
Summary:
[WIP]
- Proposes a json format for representing Random Forest model.
- Proposes a way to test the generated runtime using a test model.

TODO:
- Add generated source code snippet for easier review.
- Fix unused label warning.
- Figure out required using declarations for CATEGORICAL columns from Features.json.
- Necessary Google3 internal modifications for blaze before landing.
- Add documentation for format of the model.
- Document more.

Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D83814
2020-09-18 19:25:56 +02:00