Commit Graph

30 Commits

Author SHA1 Message Date
Utkarsh Saxena 275716d6db [clangd] Derive new signals in CC from ASTSignals.
This patch only introduces new signals but does not use their value
in scoring a CC candidate. Usage of these signals in CC ranking in both
heiristics and ML model will be introduced in later patches.

Differential Revision: https://reviews.llvm.org/D94473
2021-01-18 17:37:27 +01:00
Utkarsh Saxena 7df80a1204 [clangd] Add support for multiple DecisionForest model experiments.
With every incremental change, one needs to check-in new model upstream.
This also significantly increases the size of the git repo with every
new model.
Testing and comparing the old and previous model is also not possible as
we run only a single model at any point.

One solution is to have a "staging" decision forest which can be
injected into clangd without pushing it to upstream. Compare the
performance of the staging model with the live model. After a couple of
enhancements have been done to staging model, we can then replace the
live model upstream with the staging model. This reduces upstream churn
and also allows us to compare models with current baseline model.

This is done by having a callback in CodeCompleteOptions which is called
only when we want to use a decision forest ranking model. This allows us
to inject different completion model internally.

Differential Revision: https://reviews.llvm.org/D90014
2020-10-29 19:49:40 +01:00
Utkarsh Saxena 9b1666f3ce [clangd] Rename evaluate() to evaluateHeuristics()
Since we have 2 scoring functions (heuristics and decision forest),
renaming the existing evaluate() function to be more descriptive of the
Heuristics being evaluated in it.

Differential Revision: https://reviews.llvm.org/D88431
2020-09-28 20:05:01 +02:00
Utkarsh Saxena a8b55b6939 [clangd] Use Decision Forest to score code completions.
By default clangd will score a code completion item using heuristics model.

Scoring can be done by Decision Forest model by passing `--ranking_model=decision_forest` to
clangd.

Features omitted from the model:
- `NameMatch` is excluded because the final score must be multiplicative in `NameMatch` to allow rescoring by the editor.
- `NeedsFixIts` is excluded because the generating dataset that needs 'fixits' is non-trivial.

There are multiple ways (heuristics) to combine the above two features with the prediction of the DF:
- `NeedsFixIts` is used as is with a penalty of `0.5`.

Various alternatives of combining NameMatch `N` and Decision forest Prediction `P`
- N * scale(P, 0, 1): Linearly scale the output of model to range [0, 1]
- N * a^P:
  - More natural: Prediction of each Decision Tree can be considered as a multiplicative boost (like NameMatch)
  - Ordering is independent of the absolute value of P. Order of two items is proportional to `a^{difference in model prediction score}`. Higher `a` gives higher weightage to model output as compared to NameMatch score.

Baseline MRR = 0.619
MRR for various combinations:
N * P = 0.6346, advantage%=2.5768
N * 1.1^P = 0.6600, advantage%=6.6853
N * **1.2**^P = 0.6669, advantage%=**7.8005**
N * **1.3**^P = 0.6668, advantage%=**7.7795**
N * **1.4**^P = 0.6659, advantage%=**7.6270**
N * 1.5^P = 0.6646, advantage%=7.4200
N * 1.6^P = 0.6636, advantage%=7.2671
N * 1.7^P = 0.6629, advantage%=7.1450
N * 2^P = 0.6612, advantage%=6.8673
N * 2.5^P = 0.6598, advantage%=6.6491
N * 3^P = 0.6590, advantage%=6.5242
N * scaled[0, 1] = 0.6465, advantage%=4.5054

Differential Revision: https://reviews.llvm.org/D88281
2020-09-28 18:59:29 +02:00
Utkarsh Saxena 158af0d3d1 [clangd] Refactor code completion signal's utility properties.
Current implementation of heuristic-based scoring function also contains
computation of derived signals (e.g. whether name contains a word from
context, computing file distances, scope distances.)
This is an attempt to separate out the logic for computation of derived
signals from the scoring function.
This will allow us to have a clean API for scoring functions that will
take only concrete code completion signals as input.

Differential Revision: https://reviews.llvm.org/D88146
2020-09-23 16:12:18 +02:00
Ilya Biryukov 54eeb3f40a [clangd] Remove unused signature help quality signal. NFC
ContainsActiveParameter is not used anywhere, set incorrectly (see the
removed FIXME) and has no unit tests.
Removing it to simplify the code.

llvm-svn: 362686
2019-06-06 08:32:25 +00:00
Sam McCall 9fb22b2c86 [clangd] Boost code completion results that were named in the last few lines.
Summary:
The hope is this will catch a few patterns with repetition:

  SomeClass* S = ^SomeClass::Create()

  int getFrobnicator() { return ^frobnicator_; }

  // discard the factory, it's no longer valid.
  ^MyFactory.reset();

Without triggering antipatterns too often:

  return Point(x.first, x.^second);

I'm going to gather some data on whether this turns out to be a win overall.

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, jfb, kadircet, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D61537

llvm-svn: 360030
2019-05-06 10:25:10 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Eric Liu 5ac37f495a [clangd] Penalize destructor and overloaded operators in code completion.
Reviewers: hokein

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D55061

llvm-svn: 347983
2018-11-30 11:17:15 +00:00
Ilya Biryukov 647da3e8a5 [clangd] Add type boosting in code completion
Reviewers: sammccall, ioeric

Reviewed By: sammccall

Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D52276

llvm-svn: 347562
2018-11-26 15:38:01 +00:00
Eric Liu 52a11b5662 [clangd] Downrank members from base class
Reviewers: sammccall, ilya-biryukov

Reviewed By: sammccall

Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D53638

llvm-svn: 345140
2018-10-24 13:45:17 +00:00
Eric Liu 4859738cfe [clangd] Names that are not spelled in source code are reserved.
Summary:
These are often not expected to be used directly e.g.
```
TEST_F(Fixture, X) {
  ^  // "Fixture_X_Test" expanded in the macro should be down ranked.
}
```

Only doing this for sema for now, as such symbols are mostly coming from sema
e.g. gtest macros expanded in the main file. We could also add a similar field
for the index symbol.

Reviewers: sammccall

Reviewed By: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D53374

llvm-svn: 344736
2018-10-18 12:23:05 +00:00
Eric Liu 3fac4ef1fd [clangd] Support scope proximity in code completion.
Summary:
This should make all-scope completion more usable. Scope proximity for
indexes will be added in followup patch.

Reviewers: sammccall

Reviewed By: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D53131

llvm-svn: 344688
2018-10-17 11:19:02 +00:00
Kirill Bobyrev 8e35f1e7cb NFC: Enforce good formatting across multiple clang-tools-extra files
This patch improves readability of multiple files in clang-tools-extra
and enforces LLVM Coding Guidelines.

Reviewed by: ioeric

Differential Revision: https://reviews.llvm.org/D50707

llvm-svn: 339687
2018-08-14 16:03:32 +00:00
Kadir Cetinkaya e486e37a09 [clangd] Introduce scoring mechanism for SignatureInformations.
Reviewers: ilya-biryukov

Reviewed By: ilya-biryukov

Subscribers: mgrang, ioeric, MaskRay, jkorous, arphaman, cfe-commits

Differential Revision: https://reviews.llvm.org/D50555

llvm-svn: 339547
2018-08-13 08:40:05 +00:00
Kadir Cetinkaya 2f84d91131 Added functionality to suggest FixIts for conversion of '->' to '.' and vice versa.
Summary: Added functionality to suggest FixIts for conversion of '->' to '.' and vice versa.

Reviewers: ilya-biryukov

Reviewed By: ilya-biryukov

Subscribers: yvvan, ioeric, jkorous, arphaman, cfe-commits, kadircet

Differential Revision: https://reviews.llvm.org/D50193

llvm-svn: 339224
2018-08-08 08:59:29 +00:00
Eric Liu d7de81172e [clangd] Tune down quality score for class constructors so that it's ranked after class types.
Summary:
Currently, class constructors have the same score as the class types, and they
are often ranked before class types. This is often not desireable and can
be annoying when snippet is enabled and constructor signatures are added.

Metrics:

```
==================================================================================================
                                        OVERALL
==================================================================================================
  Total measurements: 111117 (+0)
  All measurements:
	MRR: 64.06 (+0.20)	Top-5: 75.73% (+0.14%)	Top-100: 93.71% (+0.01%)
  Full identifiers:
	MRR: 98.25 (+0.55)	Top-5: 99.04% (+0.03%)	Top-100: 99.16% (+0.00%)
  Filter length 0-5:
	MRR:      15.23 (+0.02)		50.50 (-0.02)		65.04 (+0.11)		70.75 (+0.19)		74.37 (+0.25)		79.43 (+0.32)
	Top-5:    40.90% (+0.03%)		74.52% (+0.03%)		87.23% (+0.15%)		91.68% (+0.08%)		93.68% (+0.14%)		95.87% (+0.12%)
	Top-100:  68.21% (+0.02%)		96.28% (+0.07%)		98.43% (+0.00%)		98.72% (+0.00%)		98.74% (+0.01%)		98.81% (+0.00%)
==================================================================================================
                                        DEFAULT
==================================================================================================
  Total measurements: 57535 (+0)
  All measurements:
	MRR: 58.07 (+0.37)	Top-5: 69.94% (+0.26%)	Top-100: 90.14% (+0.03%)
  Full identifiers:
	MRR: 97.13 (+1.05)	Top-5: 98.14% (+0.06%)	Top-100: 98.34% (+0.00%)
  Filter length 0-5:
	MRR:      13.91 (+0.00)		38.53 (+0.01)		55.58 (+0.21)		63.63 (+0.30)		69.23 (+0.47)		72.87 (+0.60)
	Top-5:    24.99% (+0.00%)		62.70% (+0.06%)		82.80% (+0.30%)		88.66% (+0.16%)		92.02% (+0.27%)		93.53% (+0.21%)
	Top-100:  51.56% (+0.05%)		93.19% (+0.13%)		97.30% (+0.00%)		97.81% (+0.00%)		97.85% (+0.01%)		97.79% (+0.00%)
```

Remark:
- The full-id completions have +1.05 MRR improvement.
- There is no noticeable impact on EXPLICIT_MEMBER_ACCESS and WANT_LOCAL.

Reviewers: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, cfe-commits

Differential Revision: https://reviews.llvm.org/D49667

llvm-svn: 337816
2018-07-24 08:51:52 +00:00
Eric Liu 5d2a807f25 [clangd] Penalize non-instance members when accessed via class instances.
Summary:
The following are metrics for explicit member access completions. There is no
noticeable impact on other completion types.

Before:
EXPLICIT_MEMBER_ACCESS
  Total measurements: 24382
  All measurements: MRR: 62.27	Top10: 80.21%	Top-100: 94.48%
  Full identifiers: MRR: 98.81	Top10: 99.89%	Top-100: 99.95%
  0-5 filter len:
	MRR:  13.25	46.31	62.47	67.77	70.40	81.91
	Top-10:  29%	74%	84%	91%	91%	97%
	Top-100:  67%	99%	99%	99%	99%	100%

After:
EXPLICIT_MEMBER_ACCESS
  Total measurements: 24382
  All measurements: MRR: 63.18	Top10: 80.58%	Top-100: 95.07%
  Full identifiers: MRR: 98.79	Top10: 99.89%	Top-100: 99.95%
  0-5 filter len:
	MRR:  13.84	48.39	63.55	68.83	71.28	82.64
	Top-10:  30%	75%	84%	91%	91%	97%
	Top-100:  70%	99%	99%	99%	99%	100%

* Top-N: wanted result is found in the first N completion results.
* MRR: Mean reciprocal rank.

Remark: the change seems to have minor positive impact. Although the improvement
is relatively small, down-ranking non-instance members in instance member access
should reduce noise in the completion results.

Reviewers: sammccall

Reviewed By: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, cfe-commits

Differential Revision: https://reviews.llvm.org/D49543

llvm-svn: 337681
2018-07-23 10:56:37 +00:00
Sam McCall 3f0243fdaf [clangd] Incorporate transitive #includes into code complete proximity scoring.
Summary:
We now compute a distance from the main file to the symbol header, which
is a weighted count of:
 - some number of #include traversals from source file --> included file
 - some number of FS traversals from file --> parent directory
 - some number of FS traversals from parent directory --> child file/dir
This calculation is performed in the appropriate URI scheme.

This means we'll get some proximity boost from header files in main-file
contexts, even when these are in different directory trees.

This extended file proximity model is not yet incorporated in the index
interface/implementation.

Reviewers: ioeric

Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, cfe-commits

Differential Revision: https://reviews.llvm.org/D48441

llvm-svn: 336177
2018-07-03 08:09:29 +00:00
Eric Liu 09c3c37b72 [clangd] Boost completion score according to file proximity.
Summary:
Also move unittest: URI scheme to TestFS so that it can be shared by
different tests.

Reviewers: sammccall

Reviewed By: sammccall

Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits

Differential Revision: https://reviews.llvm.org/D47935

llvm-svn: 334810
2018-06-15 08:58:12 +00:00
Sam McCall c3b5bad723 [clangd] Boost keyword completions.
Summary: These have few signals other than being keywords, so the boost is high.

Reviewers: ilya-biryukov

Subscribers: ioeric, MaskRay, jkorous, cfe-commits

Differential Revision: https://reviews.llvm.org/D48083

llvm-svn: 334711
2018-06-14 13:42:21 +00:00
Sam McCall e018b36cea [clangd] Downrank symbols with reserved names (score *= 0.1)
Reviewers: ilya-biryukov

Subscribers: klimek, ioeric, MaskRay, jkorous, cfe-commits

Differential Revision: https://reviews.llvm.org/D47707

llvm-svn: 334274
2018-06-08 09:36:34 +00:00
Sam McCall 4caa85129f [clangd] Code completion: drop explicit injected names/operators, ignore Sema priority
Summary:
Now we have most of Sema's code completion signals incorporated in Quality,
which will allow us to give consistent ranking to sema/index results.

Therefore we can/should stop using Sema priority as an explicit signal.
This fixes some issues like namespaces always having a terrible score.

The most important missing signals are:
 - Really dumb/rarely useful completions like:
    SomeStruct().^SomeStruct
    SomeStruct().^operator=
    SomeStruct().~SomeStruct()
   We already filter out destructors, this patch adds injected names and
   operators to that list.
 - type matching the expression context.
   Ilya has a plan to add this in a way that's compatible with indexes
   (design doc should be shared real soon now!)

Reviewers: ioeric

Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits

Differential Revision: https://reviews.llvm.org/D47871

llvm-svn: 334192
2018-06-07 12:49:17 +00:00
Sam McCall bc7cbb7895 [clangd] Boost fuzzy match score by 2x (so a maximum of 2) when the query is the full identifier name.
Summary: Fix a couple of bugs in tests an in Quality to keep tests passing.

Reviewers: ioeric

Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits

Differential Revision: https://reviews.llvm.org/D47815

llvm-svn: 334089
2018-06-06 12:38:37 +00:00
Sam McCall 4a3c69ba6e Adjust symbol score based on crude symbol type.
Summary: Numbers are guesses to be adjusted later.

Reviewers: ioeric

Subscribers: ilya-biryukov, MaskRay, jkorous, cfe-commits

Differential Revision: https://reviews.llvm.org/D47787

llvm-svn: 334074
2018-06-06 08:53:36 +00:00
Sam McCall d9b54f0025 [clangd] Boost code completion results that are narrowly scoped (local, members)
Summary:
This signal is considered a relevance rather than a quality signal because it's
dependent on the query (the fact that it's completion, and implicitly the query
context).

This is part of the effort to reduce reliance on Sema priority, so we can have
consistent ranking between Index and Sema results.

Reviewers: ioeric

Subscribers: klimek, ilya-biryukov, MaskRay, jkorous, cfe-commits

Differential Revision: https://reviews.llvm.org/D47762

llvm-svn: 334026
2018-06-05 16:30:25 +00:00
Sam McCall db41e1c6da [clangd] Test tweaks (consistency, shorter, doc). NFC
llvm-svn: 334014
2018-06-05 12:22:43 +00:00
Ilya Biryukov f029646fa3 [clangd] Boost scores for decls from current file in completion
Summary: This should, arguably, give better ranking.

Reviewers: ioeric, sammccall

Reviewed By: sammccall

Subscribers: mgorny, klimek, MaskRay, jkorous, mgrang, cfe-commits

Differential Revision: https://reviews.llvm.org/D46943

llvm-svn: 333906
2018-06-04 14:50:59 +00:00
Ilya Biryukov 8573defa2d [clangd] clang-format the source code. NFC
llvm-svn: 333537
2018-05-30 12:41:19 +00:00
Sam McCall c5707b6c36 [clangd] Extract scoring/ranking logic, and shave yaks.
Summary:
Code completion scoring was embedded in CodeComplete.cpp, which is bad:
 - awkward to test. The mechanisms (extracting info from index/sema) can be
   unit-tested well, the policy (scoring) should be quantitatively measured.
   Neither was easily possible, and debugging was hard.
   The intermediate signal struct makes this easier.
 - hard to reuse. This is a bug in workspaceSymbols: it just presents the
   results in the index order, which is not sorted in practice, it needs to rank
   them!
   Also, index implementations care about scoring (both query-dependent and
   independent) in order to truncate result lists appropriately.

The main yak shaved here is the build() function that had 3 variants across
unit tests is unified in TestTU.h (rather than adding a 4th variant).

Reviewers: ilya-biryukov

Subscribers: klimek, mgorny, ioeric, MaskRay, jkorous, mgrang, cfe-commits

Differential Revision: https://reviews.llvm.org/D46524

llvm-svn: 332378
2018-05-15 17:43:27 +00:00