Commit Graph

16 Commits

Author SHA1 Message Date
Harald van Dijk 7907c46fe6 Make clangd CompletionModel not depend on directory layout.
The current code accounts for two possible layouts, but there is at
least a third supported layout: clang-tools-extra may also be checked
out as clang/tools/extra with the releases, which was not yet handled.
Rather than treating that as a special case, use the location of
CompletionModel.cmake to handle all three cases. This should address the
problems that prompted D96787 and the problems that prompted the
proposed revert D100625.

Reviewed By: usaxena95

Differential Revision: https://reviews.llvm.org/D101851
2021-05-05 19:25:34 +01:00
serge-sans-paille f51ab18716 Make clangd CompletionModel usable even with non-standard (but supported) layout
llvm supports specifying a non-standard layout where each project lies in its
own place. Do not assume a fixed layout and use the appropriate cmake variable
instead.

Differential Revision: https://reviews.llvm.org/D96787
2021-03-22 10:05:25 +01:00
Utkarsh Saxena bf935a034b [clangd] Make categorical features 64 bit in DecisionForest Model.
CodeCompletionContext::Kind has 36 kinds. The completion model used to
support categorical features with a cardinality of at most 32.
Because of this, clangd tests were failing under ASan due to overflow.

This patch makes the completion model support categorical features with
a cardinality of up to 64 by storing ENUM features as uint64_t instead
of uint32_t.

Verified that this fixes the ASan failures.

Latency: 6.7ms (old) VS 6.8ms (new) per 1000 predictions.

Differential Revision: https://reviews.llvm.org/D97770
2021-03-02 16:22:30 +01:00
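
The cardinality limit described in bf935a034b can be illustrated with a small sketch. It assumes, as the message implies, that a categorical value is encoded as a single set bit in a fixed-width integer. The helper `one_hot` and the masking used to emulate fixed-width integers are illustrative and not taken from the patch.

```python
def one_hot(kind, width):
    """Encode enum value `kind` as a single set bit, truncated to `width` bits.

    Python integers are unbounded, so the mask emulates a fixed-width
    unsigned integer such as uint32_t or uint64_t.
    """
    mask = (1 << width) - 1
    return (1 << kind) & mask


NUM_KINDS = 36  # CodeCompletionContext::Kind has 36 kinds, per the commit message

# Kinds whose bit falls outside the integer encode to 0, i.e. they are lost.
lost_32 = [k for k in range(NUM_KINDS) if one_hot(k, 32) == 0]
lost_64 = [k for k in range(NUM_KINDS) if one_hot(k, 64) == 0]

print(lost_32)  # kinds 32..35 do not fit in 32 bits
print(lost_64)  # every kind fits in 64 bits
```

In C++, shifting by at least the width of the type is undefined behaviour rather than a clean truncation, so the real failure mode may differ from what this emulation shows.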
Sam McCall 7d1b499cae Revert "[clangd] Extract symbol-scope logic out of Quality, add tests. NFC"
On second thought, this can't properly be reused for highlighting.

Consider this example, which Quality wants to consider function-scope,
but highlighting must consider class-scope:

void foo() {
  class X {
    int ^y;
  };
}
2021-01-29 14:59:16 +01:00
Sam McCall d0817b5f18 [clangd] Extract symbol-scope logic out of Quality, add tests. NFC
This prepares for reuse from the semantic highlighting code.

There's a bit of yak-shaving here:
 - when the enum is moved into the clangd namespace, promote it to a
   scoped enum. This means teaching the decision forest infrastructure
   to deal with scoped enums.
 - AccessibleScope isn't quite the right name: e.g. public class members
   are treated as accessible, but still have class scope. So rename to
   SymbolScope.
 - Rename some QualitySignals members to avoid name conflicts.
   (the string) SymbolScope -> Scope
   (the enum) Scope -> ScopeKind
2021-01-29 14:44:28 +01:00
Utkarsh Saxena d5047d762f [clangd] Update CC Ranking model with better sampling.
A better sampling strategy was used to generate the dataset for this
model.
New signals introduced in this model:
- NumNameInContext: Number of words in the context that match the name
of the candidate.
- FractionNameInContext: Fraction of the words in context matching the
name of the candidate.

We remove the signal `IsForbidden` from the model and down-rank
forbidden signals aggressively.

Differential Revision: https://reviews.llvm.org/D94697
2021-01-15 18:13:24 +01:00
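
One possible reading of the two signals introduced in d5047d762f is sketched below. The tokenization (identifier-like words, compared case-insensitively) and the function name `name_in_context_signals` are assumptions made for illustration; the real feature extraction may split and match words differently.

```python
import re


def name_in_context_signals(candidate_name, context):
    """Return (NumNameInContext, FractionNameInContext) for one candidate.

    Assumption: context words are identifier-like tokens compared
    case-insensitively against the whole candidate name.
    """
    words = [w.lower() for w in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", context)]
    if not words:
        return 0, 0.0
    matches = sum(1 for w in words if w == candidate_name.lower())
    return matches, matches / len(words)


# Context words: int, size, vec, size
num, fraction = name_in_context_signals("size", "int size = vec.size();")
print(num, fraction)
```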
Aaron Ballman 45e0f65162 Add a floating-point suffix to silence warnings; NFC
This silences about 6000 warnings about truncating from double to float
with Visual Studio.
2020-11-04 10:09:51 -05:00
Utkarsh Saxena a0a6fd435c [clangd] New CC Ranking Model to fix bad inference due to overflow.
Unreachable file distances are represented as
`std::numeric_limits<unsigned>::max()`.
The previous dataset recorded the signals as `signed int`, capturing this default
value as `-1`.

A new dataset was generated and a new model was trained that
interprets this unreachable distance as the intended value.

Distribution of `SymbolScopeDistance`:
Value         Normalised Frequency
0             46.6184
4294967295    29.5342
6             14.5666
4              6.4433
2              1.4534
8              0.5760
10             0.3581
....

Distribution of `FileProximityDistance`:
Value         Normalised Frequency
4294967295    39.9378
12             5.1997
14             4.9828
15             4.4221
16             4.3820
13             4.2765
17             3.8957
11             3.6387
19             3.4799
18             3.4076
....

Differential Revision: https://reviews.llvm.org/D89035
2020-10-08 15:30:00 +02:00
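
The mismatch described in a0a6fd435c can be reproduced in a few lines. This is a sketch that assumes a 32-bit `unsigned` (consistent with the value 4294967295 in the distributions above); the threshold and the `goes_right` helper are hypothetical and only show how a single tree split can flip.

```python
import struct

# std::numeric_limits<unsigned>::max(), assuming a 32-bit unsigned
UNREACHABLE = 2**32 - 1


def as_signed_32(value):
    """Reinterpret the bits of a 32-bit unsigned value as a signed 32-bit int."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


def goes_right(distance, threshold):
    """A single decision-tree split of the form `distance >= threshold`."""
    return distance >= threshold


recorded = as_signed_32(UNREACHABLE)
print(recorded)  # the value the old dataset stored

THRESHOLD = 10  # hypothetical split learned by a tree
print(goes_right(UNREACHABLE, THRESHOLD))  # what inference sees at runtime
print(goes_right(recorded, THRESHOLD))     # what training saw
```

A model trained on the signed value learns that "unreachable" sits below every real distance, while at inference time the same signal sits above all of them, so the two take opposite branches.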
Utkarsh Saxena 45698ac005 [clangd] Split DecisionForest Evaluate() into one func per tree.
This allows MSAN to instrument this function. The previous version was
not instrumentable due to its sheer volume.

Differential Revision: https://reviews.llvm.org/D88536
2020-10-01 18:07:23 +02:00
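
The split described in 45698ac005 might look roughly like the following code generator sketch. The emitted names (`EvaluateTree0`, `Example`) and the placeholder tree bodies are hypothetical; the point is only that each tree becomes its own small C++ function and `Evaluate()` sums them.

```python
def gen_tree_function(tree_index, body_lines):
    """Emit C++ source for one tree as its own small function."""
    lines = ["float EvaluateTree%d(const Example &E) {" % tree_index]
    lines += ["  " + line for line in body_lines]
    lines.append("}")
    return "\n".join(lines)


def gen_evaluate(num_trees):
    """Emit an Evaluate() that only sums the per-tree results."""
    lines = ["float Evaluate(const Example &E) {", "  float Score = 0;"]
    lines += ["  Score += EvaluateTree%d(E);" % i for i in range(num_trees)]
    lines += ["  return Score;", "}"]
    return "\n".join(lines)


# Two placeholder trees; a real body would hold the nested comparisons.
bodies = [["return 0.5f;"], ["return -0.25f;"]]
source = "\n\n".join(
    [gen_tree_function(i, b) for i, b in enumerate(bodies)]
    + [gen_evaluate(len(bodies))]
)
print(source)
```

Keeping each function small bounds the amount of code a sanitizer has to instrument in any single function, which is the motivation the commit message gives.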
Utkarsh Saxena a9f63d22fa [clangd] Disable msan instrumentation for generated Evaluate().
The MSAN build times out for the generated DecisionForest inference runtime.

A solution worth trying is splitting the function into 300 smaller
functions and then re-enabling MSAN.

For now we are disabling instrumentation for the generated function.

Differential Revision: https://reviews.llvm.org/D88495
2020-09-29 17:44:10 +02:00
Utkarsh Saxena b5f7e9e26c [clangd] Add a trained DecisionForest for code completion.
Replaces the dummy CodeCompletion model with a trained DecisionForest
model.
features.json needs to be manually curated to specify the features
to be used. This is a one-time cost and does not change if the model
changes until we decide to add/remove features.

Differential Revision: https://reviews.llvm.org/D88071
2020-09-28 18:35:10 +02:00
Utkarsh Saxena 985deba931 Revert "Temporarily Revert "[clangd] Add Random Forest runtime for code completion.""
We intend to replace heuristics-based code completion ranking with a Decision Forest Model.

This patch introduces a format for representing the model and an inference runtime that is code-generated at build time.
- Forest.json contains all the trees as an array of trees.
- Features.json describes the features to be used.
- Codegen file takes the above two files and generates CompletionModel containing Feature struct and corresponding Evaluate function.
   The Evaluate function maps a feature to a real number describing the relevance of this candidate.
- The codegen is part of the build system and these files are generated at build time.
- Proposes a way to test the generated runtime using a test model.
  - Replicates the model structure in unittests.
  - unittest tests both the test model (for correct tree traversal) and the real model (for sanity).

This reverts commit 549e55b3d5.
2020-09-19 10:54:04 +02:00
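
To make the idea in 985deba931 concrete, here is a minimal sketch of evaluating a forest stored as a JSON array of trees. The node schema (`feature`, `threshold`, `then`, `else`, `score`) and the feature names are assumptions for illustration and are not claimed to match the actual Forest.json format; the real runtime is also generated C++ rather than an interpreter like this one.

```python
import json

# Hypothetical forest: an array of trees, each a nested dict of splits.
FOREST_JSON = """
[
  {"feature": "NumReferences", "threshold": 5,
   "then": {"score": 1.0}, "else": {"score": -1.0}},
  {"feature": "IsDeprecated", "threshold": 0.5,
   "then": {"score": -2.0}, "else": {"score": 0.5}}
]
"""


def evaluate_tree(node, example):
    """Walk one tree until a leaf (a node carrying a score) is reached."""
    while "score" not in node:
        taken = example[node["feature"]] >= node["threshold"]
        node = node["then"] if taken else node["else"]
    return node["score"]


def evaluate(forest, example):
    """Sum the leaf scores of all trees into one relevance score."""
    return sum(evaluate_tree(tree, example) for tree in forest)


forest = json.loads(FOREST_JSON)
example = {"NumReferences": 7, "IsDeprecated": 0}
print(evaluate(forest, example))
```

Generating the traversal as code at build time, as the commit describes, avoids parsing and walking JSON at runtime.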
Eric Christopher 549e55b3d5 Temporarily Revert "[clangd] Add Random Forest runtime for code completion."
as a header doesn't appear to have made it into the commit.

This reverts commit 9b6765e784 and followup
2020-09-18 14:47:43 -07:00
Nico Weber 807777913e CompletionModelCodegen: Remove unused import
The unused import is 3.4+, so it also breaks py2.7 compat.
But this is easy to fix :)
2020-09-18 16:24:58 -04:00
Nico Weber 0ea2a57274 clangd: Make CompletionModelCodegen.py py2.7 compatible
LLVM still supports Python 2.7, so unbreak bots that still run that.
In a separate commit so that this is easy to revert once we drop
support :)
2020-09-18 15:26:58 -04:00
Utkarsh Saxena 9b6765e784 [clangd] Add Random Forest runtime for code completion.
Summary:
[WIP]
- Proposes a json format for representing Random Forest model.
- Proposes a way to test the generated runtime using a test model.

TODO:
- Add generated source code snippet for easier review.
- Fix unused label warning.
- Figure out required using declarations for CATEGORICAL columns from Features.json.
- Necessary Google3 internal modifications for blaze before landing.
- Add documentation for format of the model.
- Document more.

Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D83814
2020-09-18 19:25:56 +02:00