2018-08-15 00:03:32 +08:00
|
|
|
//===--- Quality.h - Ranking alternatives for ambiguous queries --*- C++-*-===//
|
[clangd] Extract scoring/ranking logic, and shave yaks.
Summary:
Code completion scoring was embedded in CodeComplete.cpp, which is bad:
- awkward to test. The mechanisms (extracting info from index/sema) can be
unit-tested well, the policy (scoring) should be quantitatively measured.
Neither was easily possible, and debugging was hard.
The intermediate signal struct makes this easier.
- hard to reuse. This is a bug in workspaceSymbols: it just presents the
results in the index order, which is not sorted in practice, it needs to rank
them!
Also, index implementations care about scoring (both query-dependent and
independent) in order to truncate result lists appropriately.
The main yak shaved here is the build() function that had 3 variants across
unit tests is unified in TestTU.h (rather than adding a 4th variant).
Reviewers: ilya-biryukov
Subscribers: klimek, mgorny, ioeric, MaskRay, jkorous, mgrang, cfe-commits
Differential Revision: https://reviews.llvm.org/D46524
llvm-svn: 332378
2018-05-16 01:43:27 +08:00
|
|
|
//
|
2019-01-19 16:50:56 +08:00
|
|
|
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
|
|
|
// See https://llvm.org/LICENSE.txt for license information.
|
|
|
|
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
//
|
2018-08-15 00:03:32 +08:00
|
|
|
//===----------------------------------------------------------------------===//
|
///
|
|
|
|
/// Some operations such as code completion produce a set of candidates.
|
|
|
|
/// Usually the user can choose between them, but we should put the best options
|
|
|
|
/// at the top (they're easier to select, and more likely to be seen).
|
|
|
|
///
|
|
|
|
/// This file defines building blocks for ranking candidates.
|
|
|
|
/// It's used by the features directly and also in the implementation of
|
|
|
|
/// indexes, as indexes also need to heuristically limit their results.
|
|
|
|
///
|
|
|
|
/// The facilities here are:
|
|
|
|
/// - retrieving scoring signals from e.g. indexes, AST, CodeCompletionString
|
|
|
|
/// These are structured in a way that they can be debugged, and are fairly
|
|
|
|
/// consistent regardless of the source.
|
|
|
|
/// - compute scores from scoring signals. These are suitable for sorting.
|
|
|
|
/// - sorting utilities like the TopN container.
|
|
|
|
/// These could be split up further to isolate dependencies if we care.
|
|
|
|
///
|
2018-08-15 00:03:32 +08:00
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
|
#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_QUALITY_H
|
|
|
|
#define LLVM_CLANG_TOOLS_EXTRA_CLANGD_QUALITY_H
|
2018-08-15 00:03:32 +08:00
|
|
|
|
2018-11-26 23:38:01 +08:00
|
|
|
#include "ExpectedTypes.h"
|
2018-10-17 19:19:02 +08:00
|
|
|
#include "FileDistance.h"
|
2021-01-10 23:32:00 +08:00
|
|
|
#include "TUScheduler.h"
|
2018-07-23 18:56:37 +08:00
|
|
|
#include "clang/Sema/CodeCompleteConsumer.h"
|
2018-06-15 16:58:12 +08:00
|
|
|
#include "llvm/ADT/ArrayRef.h"
|
#include "llvm/ADT/StringRef.h"
|
2019-05-06 18:25:10 +08:00
|
|
|
#include "llvm/ADT/StringSet.h"
|
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>
|
2018-08-15 00:03:32 +08:00
|
|
|
|
namespace llvm {
|
|
|
|
class raw_ostream;
|
2019-05-06 18:25:10 +08:00
|
|
|
} // namespace llvm
|
2018-08-15 00:03:32 +08:00
|
|
|
|
namespace clang {
|
|
|
|
class CodeCompletionResult;
|
2018-08-15 00:03:32 +08:00
|
|
|
|
namespace clangd {
|
2018-08-15 00:03:32 +08:00
|
|
|
|
struct Symbol;
|
2018-07-03 16:09:29 +08:00
|
|
|
class URIDistance;
|
|
|
|
|
// Signal structs are designed to be aggregated from zero or more sources.
|
|
|
|
// A default instance has neutral signals, and sources are merged into it.
|
|
|
|
// They can be dumped for debugging, and evaluated into a score (see
// evaluateHeuristics()).
|
|
|
|
|
|
|
|
/// Attributes of a symbol that affect how much we like it.
|
|
|
|
struct SymbolQualitySignals {
|
|
|
|
bool Deprecated = false;
|
2018-06-08 17:36:34 +08:00
|
|
|
bool ReservedName = false; // __foo, _Foo are usually implementation details.
|
|
|
|
// FIXME: make these findable once user types _.
|
2018-10-18 20:23:05 +08:00
|
|
|
bool ImplementationDetail = false;
|
unsigned References = 0;
|
|
|
|
|
2018-06-06 16:53:36 +08:00
|
|
|
enum SymbolCategory {
|
2018-06-14 21:42:21 +08:00
|
|
|
Unknown = 0,
|
2018-06-06 16:53:36 +08:00
|
|
|
Variable,
|
|
|
|
Macro,
|
|
|
|
Type,
|
|
|
|
Function,
|
2018-07-24 16:51:52 +08:00
|
|
|
Constructor,
|
2018-11-30 19:17:15 +08:00
|
|
|
Destructor,
|
2018-06-06 16:53:36 +08:00
|
|
|
Namespace,
|
2018-06-14 21:42:21 +08:00
|
|
|
Keyword,
|
2018-11-30 19:17:15 +08:00
|
|
|
Operator,
|
2018-06-06 16:53:36 +08:00
|
|
|
} Category = Unknown;
|
|
|
|
|
void merge(const CodeCompletionResult &SemaCCResult);
|
|
|
|
void merge(const Symbol &IndexResult);
|
|
|
|
|
|
|
|
// Condense these signals down to a single number, higher is better.
|
2020-09-29 01:19:51 +08:00
|
|
|
float evaluateHeuristics() const;
|
};
|
|
|
|
llvm::raw_ostream &operator<<(llvm::raw_ostream &,
|
|
|
|
const SymbolQualitySignals &);
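The aggregation pattern described above (a neutral default instance that sources are merged into) can be sketched as follows. This is a minimal illustration only: `SemaSource`, `IndexSource`, and `QualitySketch` are hypothetical stand-ins, not clangd types, and the merge rules shown (OR for flags, max for counts) are assumptions rather than the actual implementation in Quality.cpp.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified stand-ins for the two kinds of source.
struct SemaSource {
  bool Deprecated;
};
struct IndexSource {
  bool Deprecated;
  unsigned References;
};

// Simplified analogue of a signals struct: a default instance is neutral.
struct QualitySketch {
  bool Deprecated = false;
  unsigned References = 0;

  // Assumed rule: a flag is set if any merged source sets it.
  void merge(const SemaSource &S) { Deprecated |= S.Deprecated; }

  // Assumed rule: keep the largest reference count seen so far.
  void merge(const IndexSource &I) {
    Deprecated |= I.Deprecated;
    References = std::max(References, I.References);
  }
};
```

Because each merge only ever adds information to the instance, merging zero sources leaves the neutral defaults in place, and under these assumed rules the order of merging does not matter.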
|
|
|
|
|
|
|
|
/// Attributes of a symbol-query pair that affect how much we like it.
|
|
|
|
struct SymbolRelevanceSignals {
|
2019-05-06 18:25:10 +08:00
|
|
|
/// The name of the symbol (for ContextWords). Must be explicitly assigned.
|
|
|
|
llvm::StringRef Name;
|
2018-06-06 20:38:37 +08:00
|
|
|
/// 0-1+ fuzzy-match score for unqualified name. Must be explicitly assigned.
|
float NameMatch = 1;
|
2019-05-06 18:25:10 +08:00
|
|
|
/// Lowercase words relevant to the context (e.g. near the completion point).
|
|
|
|
llvm::StringSet<>* ContextWords = nullptr;
|
bool Forbidden = false; // Unavailable (e.g. const) or inaccessible (private).
|
2018-08-08 16:59:29 +08:00
|
|
|
/// Whether fix-its need to be applied for this completion.
|
|
|
|
bool NeedsFixIts = false;
|
2018-10-24 21:45:17 +08:00
|
|
|
bool InBaseClass = false; // A member from base class of the accessed class.
|
2018-06-15 16:58:12 +08:00
|
|
|
|
2018-07-03 16:09:29 +08:00
|
|
|
URIDistance *FileProximityMatch = nullptr;
|
2018-10-17 19:19:02 +08:00
|
|
|
/// These are used to calculate proximity between the index symbol and the
|
2018-06-15 16:58:12 +08:00
|
|
|
/// query.
|
|
|
|
llvm::StringRef SymbolURI;
|
|
|
|
/// FIXME: unify with index proximity score - signals should be
|
|
|
|
/// source-independent.
|
2018-10-17 19:19:02 +08:00
|
|
|
/// Proximity between best declaration and the query. [0-1], 1 is closest.
|
|
|
|
float SemaFileProximityScore = 0;
|
|
|
|
|
|
|
|
// Scope proximity is only considered (both index and sema) when this is set.
|
|
|
|
ScopeDistance *ScopeProximityMatch = nullptr;
|
2021-01-29 21:38:43 +08:00
|
|
|
llvm::Optional<llvm::StringRef> Scope;
|
2018-10-17 19:19:02 +08:00
|
|
|
// A symbol from sema should be accessible from the current scope.
|
|
|
|
bool SemaSaysInScope = false;
|
|
2021-01-29 21:38:43 +08:00
|
|
|
SymbolScope ScopeKind = SymbolScope::GlobalScope;
|
2018-06-06 00:30:25 +08:00
|
|
|
|
|
|
|
enum QueryType {
|
|
|
|
CodeComplete,
|
|
|
|
Generic,
|
|
|
|
} Query = Generic;
|
|
|
|
|
2018-07-23 18:56:37 +08:00
|
|
|
CodeCompletionContext::Kind Context = CodeCompletionContext::CCC_Other;
|
|
|
|
|
|
|
|
// Whether symbol is an instance member of a class.
|
|
|
|
bool IsInstanceMember = false;
|
|
|
|
|
2018-11-26 23:38:01 +08:00
|
|
|
// Whether clang provided a preferred type in the completion context.
|
|
|
|
bool HadContextType = false;
|
|
|
|
// Whether a source completion item or a symbol had type information.
|
|
|
|
bool HadSymbolType = false;
|
|
|
|
// Whether the item matches the type expected in the completion context.
|
|
|
|
bool TypeMatchesPreferred = false;
|
|
|
|
|
[clangd] Use Decision Forest to score code completions.
By default clangd will score a code completion item using heuristics model.
Scoring can be done by Decision Forest model by passing `--ranking_model=decision_forest` to
clangd.
Features omitted from the model:
- `NameMatch` is excluded because the final score must be multiplicative in `NameMatch` to allow rescoring by the editor.
- `NeedsFixIts` is excluded because the generating dataset that needs 'fixits' is non-trivial.
There are multiple ways (heuristics) to combine the above two features with the prediction of the DF:
- `NeedsFixIts` is used as is with a penalty of `0.5`.
Various alternatives of combining NameMatch `N` and Decision forest Prediction `P`
- N * scale(P, 0, 1): Linearly scale the output of model to range [0, 1]
- N * a^P:
- More natural: Prediction of each Decision Tree can be considered as a multiplicative boost (like NameMatch)
- Ordering is independent of the absolute value of P. Order of two items is proportional to `a^{difference in model prediction score}`. Higher `a` gives higher weightage to model output as compared to NameMatch score.
Baseline MRR = 0.619
MRR for various combinations:
N * P = 0.6346, advantage%=2.5768
N * 1.1^P = 0.6600, advantage%=6.6853
N * **1.2**^P = 0.6669, advantage%=**7.8005**
N * **1.3**^P = 0.6668, advantage%=**7.7795**
N * **1.4**^P = 0.6659, advantage%=**7.6270**
N * 1.5^P = 0.6646, advantage%=7.4200
N * 1.6^P = 0.6636, advantage%=7.2671
N * 1.7^P = 0.6629, advantage%=7.1450
N * 2^P = 0.6612, advantage%=6.8673
N * 2.5^P = 0.6598, advantage%=6.6491
N * 3^P = 0.6590, advantage%=6.5242
N * scaled[0, 1] = 0.6465, advantage%=4.5054
Differential Revision: https://reviews.llvm.org/D88281
2020-09-22 13:56:08 +08:00
|
|
|
/// Length of the unqualified partial name of Symbol typed in
|
|
|
|
/// CompletionPrefix.
|
|
|
|
unsigned FilterLength = 0;
|
|
|
|
|
2021-01-10 23:32:00 +08:00
|
|
|
const ASTSignals *MainFileSignals = nullptr;
|
|
|
|
/// Number of references to the candidate in the main file.
|
|
|
|
unsigned MainFileRefs = 0;
|
|
|
|
/// Number of unique symbols in the main file which belong to the candidate's
|
|
|
|
/// namespace. This indicates how relevant the namespace is in the current
|
|
|
|
/// file.
|
|
|
|
unsigned ScopeRefsInFile = 0;
|
|
|
|
|
2020-09-23 20:37:07 +08:00
|
|
|
/// Set of derived signals computed by calculateDerivedSignals(). Must not be
|
|
|
|
/// set explicitly.
|
|
|
|
struct DerivedSignals {
|
|
|
|
/// Whether Name contains some word from context.
|
|
|
|
bool NameMatchesContext = false;
|
|
|
|
/// Min distance between SymbolURI and all the headers included by the TU.
|
|
|
|
unsigned FileProximityDistance = FileDistance::Unreachable;
|
|
|
|
/// Min distance between SymbolScope and all the available scopes.
|
|
|
|
unsigned ScopeProximityDistance = FileDistance::Unreachable;
|
|
|
|
};
|
|
|
|
|
|
|
|
DerivedSignals calculateDerivedSignals() const;
|
|
|
|
|
void merge(const CodeCompletionResult &SemaResult);
|
2018-06-06 00:30:25 +08:00
|
|
|
void merge(const Symbol &IndexResult);
|
2021-01-10 23:32:00 +08:00
|
|
|
void computeASTSignals(const CodeCompletionResult &SemaResult);
|
|
|
|
|
// Condense these signals down to a single number, higher is better.
|
2020-09-29 01:19:51 +08:00
|
|
|
float evaluateHeuristics() const;
|
};
|
|
|
|
llvm::raw_ostream &operator<<(llvm::raw_ostream &,
|
|
|
|
const SymbolRelevanceSignals &);
|
|
|
|
|
|
|
|
/// Combine symbol quality and relevance into a single score.
|
|
|
|
float evaluateSymbolAndRelevance(float SymbolQuality, float SymbolRelevance);
|
|
|
|
|
2020-10-23 16:19:53 +08:00
|
|
|
/// Same semantics as CodeComplete::Score. Quality score and Relevance score
|
|
|
|
/// have been removed since DecisionForest cannot assign individual scores to
|
|
|
|
/// Quality and Relevance signals.
|
|
|
|
struct DecisionForestScores {
|
|
|
|
float Total = 0.f;
|
|
|
|
float ExcludingName = 0.f;
|
|
|
|
};
|
|
|
|
|
|
|
|
DecisionForestScores
|
|
|
|
evaluateDecisionForest(const SymbolQualitySignals &Quality,
|
|
|
|
const SymbolRelevanceSignals &Relevance, float Base);
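The Decision Forest commit message quoted earlier describes combining the name-match score `N` with the model prediction `P` as `N * a^P`. The sketch below illustrates that formula only. `combineNameAndPrediction` is a hypothetical helper, and it assumes the `Base` parameter of `evaluateDecisionForest` plays the role of `a`. The real function also has to account for signals such as `NeedsFixIts`, which this sketch ignores.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: combine the name-match score N with the forest
// prediction P as N * Base^P. Not part of the clangd API.
inline float combineNameAndPrediction(float NameMatch, float Prediction,
                                      float Base) {
  return NameMatch * std::pow(Base, Prediction);
}
```

With this form, the ratio between two candidates with equal `NameMatch` depends only on the difference of their predictions: for `Base = 2`, predictions of 3 and 1 differ by a factor of `2^(3-1) = 4`. The result also stays multiplicative in `NameMatch`, which the commit message gives as the requirement for rescoring by the editor.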
|
|
|
|
|
/// TopN<T> is a lossy container that preserves only the "best" N elements.
template <typename T, typename Compare = std::greater<T>> class TopN {
public:
  using value_type = T;
  TopN(size_t N, Compare Greater = Compare())
      : N(N), Greater(std::move(Greater)) {}

  // Adds a candidate to the set.
  // Returns true if a candidate was dropped to get back under N.
  bool push(value_type &&V) {
    bool Dropped = false;
    if (Heap.size() >= N) {
      Dropped = true;
      if (N > 0 && Greater(V, Heap.front())) {
        std::pop_heap(Heap.begin(), Heap.end(), Greater);
        Heap.back() = std::move(V);
        std::push_heap(Heap.begin(), Heap.end(), Greater);
      }
    } else {
      Heap.push_back(std::move(V));
      std::push_heap(Heap.begin(), Heap.end(), Greater);
    }
    assert(Heap.size() <= N);
    assert(std::is_heap(Heap.begin(), Heap.end(), Greater));
    return Dropped;
  }

  // Returns candidates from best to worst.
  std::vector<value_type> items() && {
    std::sort_heap(Heap.begin(), Heap.end(), Greater);
    assert(Heap.size() <= N);
    return std::move(Heap);
  }

private:
  const size_t N;
  std::vector<value_type> Heap; // Min-heap, comparator is Greater.
  Compare Greater;
};
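
// Usage sketch (not part of clangd): the block below repeats TopN so that it
// compiles on its own, then adds a hypothetical helper, bestScores(), that
// keeps the N highest values from a list. The helper name and the float
// payload are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Standalone copy of the TopN container declared above.
template <typename T, typename Compare = std::greater<T>> class TopN {
public:
  using value_type = T;
  TopN(std::size_t N, Compare Greater = Compare())
      : N(N), Greater(std::move(Greater)) {}

  // Adds a candidate; returns true if some candidate had to be dropped.
  bool push(value_type &&V) {
    bool Dropped = false;
    if (Heap.size() >= N) {
      Dropped = true;
      // Heap.front() is the worst retained element; replace it if V is better.
      if (N > 0 && Greater(V, Heap.front())) {
        std::pop_heap(Heap.begin(), Heap.end(), Greater);
        Heap.back() = std::move(V);
        std::push_heap(Heap.begin(), Heap.end(), Greater);
      }
    } else {
      Heap.push_back(std::move(V));
      std::push_heap(Heap.begin(), Heap.end(), Greater);
    }
    assert(Heap.size() <= N);
    return Dropped;
  }

  // Returns candidates from best to worst; consumes the container.
  std::vector<value_type> items() && {
    std::sort_heap(Heap.begin(), Heap.end(), Greater);
    return std::move(Heap);
  }

private:
  const std::size_t N;
  std::vector<value_type> Heap; // Min-heap, comparator is Greater.
  Compare Greater;
};

// Hypothetical helper: keep only the N highest scores, best first.
std::vector<float> bestScores(const std::vector<float> &Scores,
                              std::size_t N) {
  TopN<float> Top(N);
  for (float S : Scores)
    Top.push(float(S)); // push() takes an rvalue, so pass a temporary copy.
  return std::move(Top).items();
}
```

// Because items() is rvalue-qualified, the container has to be moved from
// (std::move(Top).items()) and should not be reused afterwards. With N == 0
// every push() reports a drop and items() returns an empty vector.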

/// Returns a string that sorts in the same order as (-Score, Tiebreak), for
/// LSP. (The highest score compares smallest so it sorts at the top).
std::string sortText(float Score, llvm::StringRef Tiebreak = "");
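
// The definition of sortText() is not shown in this header. The sketch below
// is one possible encoding, assuming IEEE 754 single-precision floats: it maps
// -Score onto an unsigned integer with the same ordering, hex-encodes it with
// a fixed width, and appends the tiebreak. The names encodeFloatSketch and
// sortTextSketch are hypothetical, and std::string stands in for
// llvm::StringRef so the block compiles without LLVM.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <limits>
#include <string>

// Maps a float to a uint32_t such that A < B implies encode(A) < encode(B).
// Assumption: float is a 32-bit IEEE 754 value.
static std::uint32_t encodeFloatSketch(float F) {
  static_assert(std::numeric_limits<float>::is_iec559, "needs IEEE 754");
  static_assert(sizeof(float) == sizeof(std::uint32_t), "needs 32-bit float");
  constexpr std::uint32_t TopBit = 0x80000000u;

  std::uint32_t U;
  std::memcpy(&U, &F, sizeof(U)); // Read the raw bits of the float.
  // IEEE 754 floats compare like sign-magnitude integers.
  if (U & TopBit)    // Negative float:
    return 0 - U;    // map onto the low half, with the order reversed.
  return U + TopBit; // Non-negative float: map onto the high half.
}

// Sketch of a sort key ordered like (-Score, Tiebreak).
std::string sortTextSketch(float Score, const std::string &Tiebreak = "") {
  char Hex[9]; // 8 hex digits plus the terminating NUL.
  std::snprintf(Hex, sizeof(Hex), "%08x",
                static_cast<unsigned>(encodeFloatSketch(-Score)));
  return std::string(Hex) + Tiebreak;
}
```

// The fixed-width hex prefix means that plain lexicographic comparison of the
// resulting strings orders items by descending score first, and by the
// tiebreak text only when the scores are equal.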

struct SignatureQualitySignals {
  uint32_t NumberOfParameters = 0;
  uint32_t NumberOfOptionalParameters = 0;
  CodeCompleteConsumer::OverloadCandidate::CandidateKind Kind =
      CodeCompleteConsumer::OverloadCandidate::CandidateKind::CK_Function;
};
llvm::raw_ostream &operator<<(llvm::raw_ostream &,
                              const SignatureQualitySignals &);

} // namespace clangd
} // namespace clang

#endif // LLVM_CLANG_TOOLS_EXTRA_CLANGD_QUALITY_H