On second thought, this can't properly be reused for highlighting.
Consider this example, which Quality wants to consider function-scope,
but highlighting must consider class-scope:
void foo() {
class X {
int ^y;
};
}
This prepares for reuse from the semantic highlighting code.
There's a bit of yak-shaving here:
- when the enum is moved into the clangd namespace, promote it to a
scoped enum. This means teaching the decision forest infrastructure
to deal with scoped enums.
- AccessibleScope isn't quite the right name: e.g. public class members
are treated as accessible, but still have class scope. So rename to
SymbolScope.
- Rename some QualitySignals members to avoid name conflicts.
(the string) SymbolScope -> Scope
(the enum) Scope -> ScopeKind
A better sampling strategy was used to generate the dataset for this
model.
New signals introduced in this model:
- NumNameInContext: Number of words in the context that matches the name
of the candidate.
- FractionNameInContext: Fraction of the words in context matching the
name of the candidate.
We remove the signal `IsForbidden` from the model and down rank
forbidden signals aggresively.
Differential Revision: https://reviews.llvm.org/D94697
Unreachable file distances are represented as
`std::numeric_limits<unsigned>::max()`.
The previous dataset recorded the signals as `signed int` capturing this default
value as `-1`.
A new dataset was regenerated and a new model is trained that
interprets this unreachable as the intended value.
Distribution of `SymbolScopeDistance`:
Value Normalised Frequency
0 46.6184
4294967295 29.5342
6 14.5666
4 6.4433
2 1.4534
8 0.5760
10 0.3581
....
Distribution of `FileProximityDistance`:
Value Normalised Frequency
4294967295 39.9378
12 5.1997
14 4.9828
15 4.4221
16 4.3820
13 4.2765
17 3.8957
11 3.6387
19 3.4799
18 3.4076
....
Differential Revision: https://reviews.llvm.org/D89035
This allows us MSAN to instrument this function. Previous version is not
instrumentable due to it shear volume.
Differential Revision: https://reviews.llvm.org/D88536
MSAN build times out for generated DecisionForest inference runtime.
A solution worth trying is splitting the function into 300 smaller
functions and then re-enable msan.
For now we are disabling instrumentation for the generated function.
Differential Revision: https://reviews.llvm.org/D88495
Replaces the dummy CodeCompletion model with a trained DecisionForest
model.
The features.json needs to be manually curated specifying the features
to be used. This is a one-time cost and does not change if the model
changes until we decide to add/remove features.
Differential Revision: https://reviews.llvm.org/D88071
We intend to replace heuristics based code completion ranking with a Decision Forest Model.
This patch introduces a format for representing the model and an inference runtime that is code-generated at build time.
- Forest.json contains all the trees as an array of trees.
- Features.json describes the features to be used.
- Codegen file takes the above two files and generates CompletionModel containing Feature struct and corresponding Evaluate function.
The Evaluate function maps a feature to a real number describing the relevance of this candidate.
- The codegen is part of build system and these files are generated at build time.
- Proposes a way to test the generated runtime using a test model.
- Replicates the model structure in unittests.
- unittest tests both the test model (for correct tree traversal) and the real model (for sanity).
This reverts commit 549e55b3d5.
Summary:
[WIP]
- Proposes a json format for representing Random Forest model.
- Proposes a way to test the generated runtime using a test model.
TODO:
- Add generated source code snippet for easier review.
- Fix unused label warning.
- Figure out required using declarations for CATEGORICAL columns from Features.json.
- Necessary Google3 internal modifications for blaze before landing.
- Add documentation for format of the model.
- Document more.
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D83814