[clangd] New CC Ranking Model to fix bad inference due to overflow.

Unreachable file distances are represented as
`std::numeric_limits<unsigned>::max()`.
The previous dataset recorded the signals as `signed int` capturing this default
value as `-1`.

A new dataset was regenerated and a new model is trained that
interprets this unreachable as the intended value.

Distribution of `SymbolScopeDistance`:
Value         Normalised Frequency
0             46.6184
4294967295    29.5342
6             14.5666
4              6.4433
2              1.4534
8              0.5760
10             0.3581
....

Distribution of `FileProximityDistance`:
Value         Normalised Frequency
4294967295    39.9378
12             5.1997
14             4.9828
15             4.4221
16             4.3820
13             4.2765
17             3.8957
11             3.6387
19             3.4799
18             3.4076
....

Differential Revision: https://reviews.llvm.org/D89035
This commit is contained in:
Utkarsh Saxena 2020-10-08 13:13:40 +02:00
parent 80bf29f00c
commit a0a6fd435c
1 changed files with 350482 additions and 359779 deletions

File diff suppressed because it is too large Load Diff