CodeCompletionContext::Kind has 36 Kinds. The completion model used to
support categorical features of 32 cardinality.
Due to this clangd tests were failing asan tests due to overflow.
This patch makes the completion model support 64 cardinality of
categorical features by storing ENUM Features as uint64_t instead of
uint32_t.
Verified that this fixes the asan failures.
Latency: 6.7ms (old) VS 6.8ms (new) per 1000 predictions.
Differential Revision: https://reviews.llvm.org/D97770
On second thought, this can't properly be reused for highlighting.
Consider this example, which Quality wants to consider function-scope,
but highlighting must consider class-scope:
void foo() {
class X {
int ^y;
};
}
This prepares for reuse from the semantic highlighting code.
There's a bit of yak-shaving here:
- when the enum is moved into the clangd namespace, promote it to a
scoped enum. This means teaching the decision forest infrastructure
to deal with scoped enums.
- AccessibleScope isn't quite the right name: e.g. public class members
are treated as accessible, but still have class scope. So rename to
SymbolScope.
- Rename some QualitySignals members to avoid name conflicts.
(the string) SymbolScope -> Scope
(the enum) Scope -> ScopeKind
This allows us MSAN to instrument this function. Previous version is not
instrumentable due to it shear volume.
Differential Revision: https://reviews.llvm.org/D88536
MSAN build times out for generated DecisionForest inference runtime.
A solution worth trying is splitting the function into 300 smaller
functions and then re-enable msan.
For now we are disabling instrumentation for the generated function.
Differential Revision: https://reviews.llvm.org/D88495
We intend to replace heuristics based code completion ranking with a Decision Forest Model.
This patch introduces a format for representing the model and an inference runtime that is code-generated at build time.
- Forest.json contains all the trees as an array of trees.
- Features.json describes the features to be used.
- Codegen file takes the above two files and generates CompletionModel containing Feature struct and corresponding Evaluate function.
The Evaluate function maps a feature to a real number describing the relevance of this candidate.
- The codegen is part of build system and these files are generated at build time.
- Proposes a way to test the generated runtime using a test model.
- Replicates the model structure in unittests.
- unittest tests both the test model (for correct tree traversal) and the real model (for sanity).
This reverts commit 549e55b3d5.
Summary:
[WIP]
- Proposes a json format for representing Random Forest model.
- Proposes a way to test the generated runtime using a test model.
TODO:
- Add generated source code snippet for easier review.
- Fix unused label warning.
- Figure out required using declarations for CATEGORICAL columns from Features.json.
- Necessary Google3 internal modifications for blaze before landing.
- Add documentation for format of the model.
- Document more.
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D83814